eclipse-jdt / eclipse.jdt.core

Eclipse Public License 2.0
158 stars 125 forks source link

Formatter does not allow normal paragraph formatting in Javadoc comments #991

Closed garretwilson closed 1 year ago

garretwilson commented 1 year ago

_This this ticket is being migrated from Bug 471920, which was opened almost a decade ago._

2015-07-06 09:50:50 EDT

Try as I might, I can't get the formatter to allow me a simple, normal HTML paragraph

formatting inside Javadoc comments:

/**
 * <p>This is foo.</p>
 * <p>This is bar.</p>
 */
public void fooBar() {
...

Whatever I do, Eclipse still changes this to the following, which is a complete waste of space and looks ugly:

/**
 * <p>
 * This is foo.
 * </p>
 * <p>
 * This is bar.
 * </p>
 */
public void fooBar() {
...

This behavior has been present in Eclipse 4.4/Luna and many versions before that.

garretwilson commented 1 year ago

Dani Megert 2015-07-23 10:04:41 EDT

Is this a regression to 4.4?

garretwilson commented 1 year ago

Garret Wilson 2015-07-23 12:00:09 EDT

This behavior has always been like this. It's always been irritating, but I only now got around to filing a bug because I was already filing issues for the new 4.5/Mars formatting bugs.

garretwilson commented 1 year ago

Garret Wilson 2015-08-13 09:08:36 EDT

Can this be addressed now that Bug 471918 is being addressed? It makes my Javadocs look ugly and take up a lot of space.

garretwilson commented 1 year ago

Jay Arthanareeswaran 2015-08-13 11:31:25 EDT

May be Mateusz can look at once he is done with the barrage of bugs with the new formatter. We may not be back porting this one, though.

Tentatively targeting 4.6.

garretwilson commented 1 year ago

Mateusz Matela 2016-01-05 10:31:02 EST

It's very easy to move

to the category that doesn't have a line break added after the opening tag and before the closing tag (easy, but tedious - 169 failed tests to change). But what should we do about backward compatibility? Is it okay if lines get joined in old code that has

in separate lines? Or should we treat

as a special case for backward compatibility and not remove existing line breaks?

I think both solutions will have loud opponents, it's hard to say which group will be larger, though.

Jay, Dani, what do you think?

iloveeclipse commented 1 year ago

@garretwilson : surely you don't want to be perceived as spammer? Please don’t copy paste bugzille comments, the old bugzilla will stay read only.

garretwilson commented 1 year ago

surely you don't want to be perceived as spammer?

I don't know how people will perceive me. Maybe bots will perceive it as such, just like they tried to close the old issue. Any human actually reading this issue will see that I'm merely migrating the issue along with its comment history, as the entire ticket system should have been migrated to begin with. I'm just helping out.

Please don’t copy paste bugzille comments, the old bugzilla will stay read only.

As you wish.

iloveeclipse commented 1 year ago

If you really want move this bug forward, please create a good summary based on various comments, summarizing what was the pros and cons and technical issues etc.

No one except you will really try to re-read all the comments and do this work, simply because the problem isn't considered to be important enough and few remaining committers are working on other, more important (for them) issues.

garretwilson commented 1 year ago

If you really want move this bug forward, please create a good summary based on various comments, summarizing what was the pros and cons and technical issues etc.

The description provides a full, concise, and complete description of the problem.

… the problem isn't considered to be important enough and few remaining committers are working on other, more important (for them) issues.

I guess you said it all. The momentum seems to be swinging away from Eclipse to VS Code anyway. (I say this as a huge Eclipse proponent over several decades.) When the latest Eclipse YAML text editor uses formatting that is illegal for YAML (eclipse/wildwebdeveloper#1132, duplicate of eclipse/wildwebdeveloper#372) with no way to change the configuration, I suppose no one will fix the Java formatter to make the documentation HTML look decent.

Thanks for setting me straight in priorities. I'll not waste further energy on it. Have a good weekend.

jjohnstn commented 1 year ago

@garretwilson Have you considered just turning off formatting in Javadoc comments? It seems you have your Javadoc comments set up exactly the way you want and don't really require formatting. If you go to Preferences->Java->Code Style->Formatter and set your own format profile based off of Eclipse built-in, you can edit and go to the Comments section and turn off: Enable Javadoc formatting.

garretwilson commented 1 year ago

Have you considered just turning off formatting in Javadoc comments?

No, why would I want to do that?

I only said that I like this:

<p>Hello, World!</p>

better than this:

<p>
Hello, World!
</p>

How does this one single formatting preference somehow imply that that I have no need for automatic formatting of all the dozens if not hundreds of other things that can be formatted in Javadoc documentation?

It seems you have your Javadoc comments set up exactly the way you want …

I said nothing at all to imply that. I only pointed out that the existing formatter in one area is changing the HTML in a way that hardly anyone (and perhaps no one at all!) prefers.

garretwilson commented 1 year ago

@jjohnstn I'm curious—did you read the comments of the original ticket?

jjohnstn commented 1 year ago

@garretwilson I must of missed it in 471920. I'm certainly not arguing against adding an option to do formatting the way you want, I just wondered if there was anything the current formatting in Javadoc comments was doing for you and if turning it off might ease your frustration.

garretwilson commented 1 year ago

I must of missed it in 471920.

Actually I was mostly just asking out of curiosity. But I will note that in that ticket others stressed the need for this improvement, not just for personal formatting preference, but for team consistency and compliance with conventions:

I would like to argue for raising the priority of this bug: Google Java Style Guide, much appreciated in the Java world AFAICS, mandates the requested style (https://google.github.io/styleguide/javaguide.html#s7.1.2-javadoc-paragraphs), namely, <p>Blah instead of <p>\nBlah.

Please add an option in Eclipse formatter to keep

and the next word glued together.

As an illustration of a practical annoyance, the lack of support for this in Eclipse formatting makes it necessary to manually modify Checkstyle’s Google style config file (https://checkstyle.sourceforge.io/google_style.html) to prevent it from checking the <p> tags.

This may look like a detail, but configuring the formatter is crucial when team-working; the Google Java Style Guide is a nice, coherent, well-thought; de-facto standard set of rules; checkstyle is a nice plug-in supporting it out of the box; so the only ingredient missing is this configuration option in Eclipse.

I suppose this is very easy to implement for people who know how to do this already; and I really can’t see a serious argument against this proposal.

garretwilson commented 1 year ago

… adding an option to do formatting the way you want …

Are you saying that you prefer the formatting the other way? Do you @jjohnstn personally prefer:

<p>
stuff here
</p>

instead of

<p>stuff here</p>

I would be surprised if there are many people in the entire world who prefer it the way it is now. As I explained in a comment to the original ticket, I think it's probably implemented the way it is now, not because anybody prefers it like it is, but because of a semantic mistake when modeling the formatter:

I believe the mistake comes from thinking (as Mateusz mentioned in Comment 5) that some HTML elements should be categorized as "always has line break after opening tag". And that's an easy assumption to make because it seems natural, especially because we associate block elements with line breaks, and when we think of container elements such as <ul>. In fact I made this mistake myself years (and years and years) ago when I wrote my first XML/HTML formatting code.

… (I then explained a more correct conceptualization and algorithm that yield more appropriate results.) …

In summary, trying to categorize elements as whether they inherently require a newline after the opening tag is the wrong approach (and is not how browsers format the resulting inline/block elements via the CSS either), and it will wind up with awkward situations like this bug where a simple <p>foo</p> paragraph looks ugly and unnatural with unwanted line breaks. Instead, the logic should consider the children of each element in order to determine whether beginning and ending tags should be separated from the content by newlines.

To illustrate this more correct (and likely simpler) algorithm, I presented my own implementation of an HTML formatter. The code has since moved; it is now available here on GitHub at XMLSerializer.java. For the most part the BaseHtmlFormatProfile.java sets up the elements considered "block elements" and such. (The definitions of which elements are considered "block elements" and other categorizations are in HTML.java.) The important thing to note is that the algorithm doesn't say "add a newline after the opening of a block element".

I also noted that the output of my algorithm is in line with most other formatters:

Moreover the output of this formatting logic corresponds extremely closely to that of the JavaScript-based js-beautify, as the maintainer and I have discussed at beautify-web/js-beautify#1732 and in other tickets. (Not that js-beautify does not follow my simplified formatting algorithm; I note it only as a baseline to compare the results of my algorithm.)

So to summarize:

So why hasn't this been fixed in a decade-ish? @iloveeclipse already said it:

… the problem isn't considered to be important enough and few remaining committers are working on other, more important (for them) issues.

And there you go. Let's stop wasting energy on something that will never get done (at least not before Eclipse dies because everyone is using VS Code which, as I pointed out, doesn't have this problem). Let's close the ticket and move on with our lives.

jjohnstn commented 1 year ago

@garretwilson No, I don't think this issue should be closed. To be honest, I don't personally care what it does to my paragraph tags as I'm more concerned with how the Javadoc looks to the end-user so technically you are correct that I don't prefer it works the way it does :) Anyway, I'll take a look at what it would take to implement an option.

trancexpress commented 1 year ago

Actually I was mostly just asking out of curiosity. But I will note that in that ticket others stressed the need for this improvement, not just for personal formatting preference, but for team consistency and compliance with conventions:

Keep in mind that JDT is open source, anyone who feels strongly about JDT functionality can try to contribute a change. This is especially true for requests that have stood open for 8+ years. There is no guarantee the change will be merged (maintainer resources on JDT are strained, sometimes requests are not approved, etc.) but its something extra you could try when you really want a feature or bug fix.

jjohnstn commented 1 year ago

@garretwilson Ok, I have a working patch. So, it turns out the html formatting only occurs if Format HTML tags is selected in the formatter prefs for Javadocs so I added a new sub-preference for the paragraph tags:

image

For the following file:

image

with the option turned on I get:

image

At the moment, the option will be false by default.

garretwilson commented 1 year ago

First of all, I'm surprised (shocked!) that someone is working on this, and I'm really grateful to @jjohnstn for his efforts. Really this was unexpected.

I want to make sure we're on the same page about what is being asked. From the screen shots it looks like this new setting has to do with whether line breaks are added between paragraph blocks. But that has nothing to do with this ticket; instead, this ticket relates to whether line breaks are added inside paragraph blocks.

Starting from the screen shots above, let me try to "adjust" them to reflect what was intended. The option would be sort of like this. (You can choose a shorter sentence in order to keep the dialog from growing too wide. I use a verbose sentence here for clarity.)

[X] Format HTML tags [X] Add line break after paragraph opening tag, and another line break before paragraph closing tag.

The line in bold is the new setting. It would be enabled by default, to allow opting in to the new functionality by turning off this new setting.

Let me show you how it would work. Let's assume that we start out with this code:

/**
 * <p>a line</p> <p>another line</p>
 */

Here is the current behavior (and with the hypothetical new setting on):

/**
 * <p>
 * a line
 * </p>
 * <p>
 * another line
 * </p>
 */

You see how ugly that is? In fact, even in your own example you didn't make the formatting this ugly anywhere. (Even though you claim you have no preference, I'll still bet a beer that in real life you never format paragraphs ugly like this.)

Here is the behavior we've been requesting (by turning off the hypothetical new setting I illustrated above):

/**
 * <p>a line</p>
 * <p>another line</p>
 */

Yes. That's all. Just normal formatting like everyone would normally do. (Is it starting to make sense?)

garretwilson commented 1 year ago

I want to add some extra details and explanation, but I'm adding it here as a separate comment. If you find this comment too confusing, just ignore it and make the one-off implementation for paragraph tags as I explained in my previous comment.

But if you want to do this all the right way you won't make it specific to paragraphs (<p>) but to any HTML element that is traditionally styled with default block CSS formatting. (You can look them up in the HTML/CSS specs, or you can simply look at the code snippets from my XML/HTML formatter which I referenced in my previous comments, as I've already spent hours looking them up myself from the original specifications.)

As I mentioned, this formatting problem likely stems from a semantic error in understanding how block formatting works. The formatter thinks that block elements should automatically have a newline after the opening tag, and another one before the closing tag. That's why the formatter also does this ugly formatting (also shown in your example above):

<pre>
<code>
public class FooBar {
  …
}
</code>
</pre>

But no, <pre> is a block element, and <code> is an inline element. There is absolutely no reason for that formatting. It's the same semantic mistake. If you fix this across the board for block elements, then the correct formatting would look like this:

<pre><code>
public class FooBar {
  …
}
</code></pre>

Sure, I'll be super happy with fixing this for <p>, but in a perfect world you won't just do a one-off special case for <p>, you'll fix this for all block elements. The setting might be:

[X] Format HTML tags [X] Add line break after block element opening tag, and another line break before block element closing tag.

The truly correct algorithm, as I summarized in the original ticket, determines whether a block element has newlines inside it is based upon the children of the block element. For example, if a block element contains other block elements, there will be newlines inside the block element—but the newlines will be added in order to separate the children, not because a block element always has newlines inside it. For example:

<div>
  <p>line one</p>
  <p>line two</p>
</div>

Here the <div> (a block element) has newlines inside it but they are because the <p> elements are block elements too and they need newlines around them. In other words, the rule is "put newlines around block elements", not "put newlines inside block elements", and here, with block elements inside block elements, the newlines just happened to be after the <div> opening tag and before the </div> closing tag. If there were no block elements inside the <div>, then this would just be formatted:

<div>just some text here</div>

Nonetheless if you skip the extra details in this comment and simply provide a way to remove the ugly newlines after the <p> opening tag and before the </p> closing tag, that will be a huge improvement even though that's a one-off, special-case implementation.

garretwilson commented 1 year ago

But no, <pre> is a block element, and <code> is an inline element. There is absolutely no reason for that formatting.

This might not have been the best example, because <pre> is a special element that says there should be no formatting at all inside it, so the formatter is wrong on various points to introduce a newline between <pre> and <code>. And I understand that formatting Java code inside <pre> blocks in Javadoc comments gets complicated, and I won't want to go there. So everything I said above is correct, just realize that there are other issues surrounding <pre> formatting; I don't want to get sidetracked by those.

jjohnstn commented 1 year ago

Hi @garretwilson So you want this to be all block element tags. That could be done in a similar manner to how I did it for

. The option should be false by default as I have done (e.g. Don't put block elements on separate lines) as this translates better for options not specified in an existing configuration (e.g. missing or not specified = false). It would help if you listed the expected block tags you expect to be handled here in this issue as it will definitely meet your expectations and will make it clearer to anyone following this issue.

garretwilson commented 1 year ago

So you want this to be all block element tags.

Sincerely I don't want to confuse things. You might want to just get it working for <p> tags first just to make sure we're on the same page.

It would help if you listed the expected block tags you expect to be handled here in this issue as it will definitely meet your expectations and will make it clearer to anyone following this issue.

If you really want to know (these are all in the code I referenced above), they include all of the following. (I'm going to just include the source code, because at the moment I'm the middle of a large writing task, and this would break my train of thought as well take a lot of grunt work for formatting.) The code API Javadocs documentation also includes links to the specifications so you can check my work, as well as show you where I go the information for reference.

For example ELEMENT_LI is a constant for the <li> element local name. You can ignore the things about namespaces, as I assume it doesn't apply here. My code works with the a W3C compliant DOM.

block elements themselves

    /**
     * Elements HTML5 and CSS 2.1 suggests should be rendered as CSS <code>display: block</code>.
     * @apiNote The majority of these elements follow the HTML 5 specification, but they also include obsolete elements not mentioned as block elements but
     *          indicated to be block elements by CSS 2.1 for HTML 4.
     * @see <a href="https://www.w3.org/TR/html52/rendering.html#the-page">HTML 5.2 § 10.3.2. The page</a>
     * @see <a href="https://www.w3.org/TR/html52/rendering.html#non-replaced-elements-flow-content">HTML 5.2 § 10.3.3. Flow content</a>
     * @see <a href="https://www.w3.org/TR/html52/rendering.html#sections-and-headings">HTML 5.2 § 10.3.7. Sections and headings</a>
     * @see <a href="https://www.w3.org/TR/html52/rendering.html#section-lists">HTML 5.2 § 10.3.8. Lists</a>
     * @see <a href="https://www.w3.org/TR/html52/rendering.html#the-fieldset-and-legend-elements">HTML 5.2 § 10.3.13. The fieldset and legend elements</a>
     * @see <a href="https://www.w3.org/TR/CSS2/sample.html">CSS 2.1 Appendix D. Default style sheet for HTML 4</a>
     */
    public static final Set<NsName> BLOCK_ELEMENTS = Stream.of(
            //the page
            ELEMENT_HTML, ELEMENT_BODY,
            //flow content
            ELEMENT_ADDRESS, ELEMENT_BLOCKQUOTE, ELEMENT_CENTER, ELEMENT_DIV, ELEMENT_FIGURE, ELEMENT_FIGCAPTION, ELEMENT_FOOTER, ELEMENT_FORM, ELEMENT_HEADER,
            ELEMENT_HR, ELEMENT_LEGEND, ELEMENT_LISTING, ELEMENT_MAIN, ELEMENT_P, ELEMENT_PLAINTEXT, ELEMENT_PRE, ELEMENT_XMP,
            //sections and headings
            ELEMENT_ARTICLE, ELEMENT_ASIDE, ELEMENT_H1, ELEMENT_H2, ELEMENT_H3, ELEMENT_H4, ELEMENT_H5, ELEMENT_H6, ELEMENT_HGROUP, ELEMENT_NAV, ELEMENT_SECTION,
            //lists
            ELEMENT_DIR, ELEMENT_DD, ELEMENT_DL, ELEMENT_DT, ELEMENT_OL, ELEMENT_UL,
            //fieldset and legend
            ELEMENT_FIELDSET,
            //obsolete elements
            ELEMENT_FRAME, ELEMENT_FRAMESET, ELEMENT_NOFRAMES, ELEMENT_CENTER, ELEMENT_MENU).map(elementName -> NsName.of(XHTML_NAMESPACE_URI_STRING, elementName))
            .collect(toUnmodifiableSet());

list elements

    /**
     * Elements HTML5 suggests should be rendered as CSS <code>display: list-item</code>.
     * @see <a href="https://www.w3.org/TR/html52/rendering.html#section-lists">HTML 5.2 § 10.3.8. Lists</a>
     * @see <a href="https://www.w3.org/TR/html52/rendering.html#the-details-element-rendering">HTML 5.2 § 10.5.3. The details and summary elements</a>
     */
    public static final Set<NsName> LIST_ITEM_ELEMENTS = Stream.of(
            //lists
            ELEMENT_LI,
            //details and summary
            ELEMENT_SUMMARY).map(elementName -> NsName.of(XHTML_NAMESPACE_URI_STRING, elementName)).collect(toUnmodifiableSet());

metadata content

    /**
     * Elements HTML5 considers <dfn>metadata content</dfn>.
     * @see <a href="https://www.w3.org/TR/html52/dom.html#metadata-content">HTML 5.2 § 3.2.4.2.1. Metadata content</a>
     */
    public static final Set<NsName> METADATA_CONTENT = Stream
            .of(ELEMENT_BASE, ELEMENT_LINK, ELEMENT_META, ELEMENT_NOSCRIPT, ELEMENT_SCRIPT, ELEMENT_STYLE, ELEMENT_TEMPLATE, ELEMENT_TITLE)
            .map(elementName -> NsName.of(XHTML_NAMESPACE_URI_STRING, elementName)).collect(toUnmodifiableSet());

other

And finally <head>.

I copied quickly, but I think I got them all. Not all of them apply, but I don't have time at the moment to weed them out. I'm sure you know which are relevant to you.

But just start with <p> and I'll be happy.

jjohnstn commented 1 year ago

Ok, here is the changed code:

    String formatCodeTags = "(pre)"; //$NON-NLS-1$
    String separateLineTags = "(nl|table|tr)"; //$NON-NLS-1$
    String breakBeforeTags = "(dd|dt|li|td|th|h1|h2|h3|h4|h5|h6|q)"; //$NON-NLS-1$
    String blockTags = "(p|dl|ul|ol|hr|dir)"; //$NON-NLS-1$
    String breakAfterTags = "(br)"; //$NON-NLS-1$
    String noFormatTags = "(code|tt)"; //$NON-NLS-1$
    String otherTags = "([\\S&&[^<>]]++)"; //$NON-NLS-1$

Elements in separateLineTags get put on a separate line. Elements in breakBeforeTags are already how you want them - they get a newline before but not after. Items I have put in blockTags were in separateLineTags but now are optionally treated as separateLineTags or breakBeforeTags based on the new option (I just added div which wasn't there before). I also have optional separate vs break-before behavior added to the method that handles the pre tag. All other tags are in otherTags and nothing is done with them.

jjohnstn commented 1 year ago

I'll post a patch shortly as I can modify if you have any other issues.

jjohnstn commented 1 year ago

Hi @garretwilson Everything is merged. It won't appear in the 4.28-I-builds until after 7:30 EDT tomorrow evening.

garretwilson commented 1 year ago

@jjohnstn it's amazing that after so much time someone went to work on this so quickly. Thank you so much. I'm not sure when I'll get to test this, but at the very latest of course I'll try it out at the next general release.

jjohnstn commented 1 year ago

Closing this as completed via #1002 @garretwilson If you find any problems in your testing, please open a new issue

garretwilson commented 1 year ago

I installed Eclipse EE 2023-06 RC1, and after limited testing this seems to be working!

🥳

It's so amazing after so much time that suddenly it formats correctly! I'm still in shock.

Thank you so much for attending to this @jjohnstn .

garretwilson commented 1 year ago

Uh, oh. I'm afraid this introduced another bug. 😢 There's still time to fix this before Eclipse 2023-06 is released. Should I file another ticket, or do you want to re-open this bug?

Unfortunately the fix just blindly says "don't add a newline before the end tag of a block element". But that's not the rule. The error here is that the logic is concentrating on what should go inside block elements. In other words, instead of saying "should <p> have a line break inside it?" the algorithm should ask "should this element have a newline before it?" That's why I keep referring to my original comment about the mistake in understanding the algorithm (the same mistake I originally made).

Here's what's happening now:

<p>For example for the file <code>path/to/example.foo.bar</code> the following would be returned in order:</p>
<ol>
<li><code>foo.bar</code></li>
<li><code>bar</code></li></ol>

Note that the ending </ol> should be on a new line, like this:

<p>For example for the file <code>path/to/example.foo.bar</code> the following would be returned in order:</p>
<ol>
<li><code>foo.bar</code></li>
<li><code>bar</code></li>
</ol>

This is not difficult; the problem is with the algorithm asking the wrong questions. The current algorithm says, "Oh, I have a <p> tag. Should I put a newline after it?" It also says, "Oh, I have a </p> tag. Should I put a newline before it?"

But that's not the right question to ask. A block element indicates that newlines should be before and after it, not inside it. Here is what the algorithm should be asking:

  1. I encountered a <p> opening tag. Should I have a a newline before it? It is a block element, so yes.
  2. I encountered a "For example …" text run. Should I have a newline before it? It is not a block element, so no.
  3. I encountered a </p> closing tag. Should I have a newline after it? It is a block element, so yes.
  4. I encountered a <ol> opening tag. Should I have a newline before it? It is a block element, so yes.
  5. I encountered a <li> opening tag. Should I have a newline before it? It is a block element, so yes.
  6. I encountered a <code> opening tag. Should I have a newline before it? It is not a block element, so no.
  7. I encountered a </li> closing tag. Should I add a newline after it? It is a block element, so yes.
  8. I encountered a </ol> closing tag. Should I add a newline after it? It is a block element, so yes.

Note that I never ask if I should add a newline after an opening tag, or if I should add a newline before a closing tag. If you instead check for adding newlines outside instead of inside block elements, then nested block elements take care of themselves.

(Note that you can get fancy and "upgrade" an inline element to a block element if the children are mixed, that is both inline and block elements, like text children intermingled with <div>. And that's what a browser does, implicitly wrapping such things in block flow elements. But I don't want to complicate things.)

In any case I can't live with <li><code>bar</code></li></ol> and I'll have to turn off this fix and stop using it for now. I'd rather things be at least consistent.

@jjohnstn can we reopen this ticket, or should I file a new ticket?

jjohnstn commented 1 year ago

@garretwilson Please open a new issue. It is way too late for 2023-06 which has already produced RC2 for JDT.