[23] markdown javadoc should detect code regions

stephan-herrmann commented 2 months ago

Follow-up after #2738:

In markdown two ways exist to mark a block of code:

indentation
"fenced" within a pair of ```

Parsers should recognize these regions and in particular avoid parsing contained @ prefixed words as tags.

See the second example in https://openjdk.org/jeps/467#JavaDoc-tags

stephan-herrmann commented 2 months ago

In order to correctly detect indented blocks, we need to keep track of relative amount of whitespace between /// and text.

According to https://openjdk.org/jeps/467#Syntactical-details individual lines of a markdown block are normalized to start at the start column of the least indented line.

According to those rules the following comment should start with a code region:

    ///     void m() {}
    ///
    /// Plain text

The javadoc tool, however, seems to have problems with code blocks at the beginning or end of a comment, the above is rendered as plain text (in contrast to github rendering). To achieve compatibility it might suffice if we detect increase of indentation without worrying about subsequent lines with less leading whitespace.

stephan-herrmann commented 2 months ago

The javadoc tool, however, seems to have problems with code blocks at the beginning or end of a comment, the above is rendered as plain text (in contrast to github rendering).

This was not quite correct, that example is rendered correctly by javadoc. Apparent bugs are:

javadoc doesn't accept a Java annotation (like @Override) as the first thing within a markdown comment, despite correct indentation
javadoc silently drops a trailing indented region

stephan-herrmann commented 2 months ago

I have local changes that ensure that significant whitespace is actually part of the DOM TextElements. With this and with specific parsing of ``` both kinds of code blocks will be parsed without interpreting @tags and with proper rendering on the markdown side.

Also locally fixed: detecting more than 3 /, to include any excess slashes in TextElements, too.

To somewhat decouple these changes from standard AbstractCommentParser, I created a little companion class that keeps track of markdown related state.

@jarthana fyi. With these local changes all new tests in https://github.com/eclipse-jdt/eclipse.jdt.ui/pull/1587 pass :)

stephan-herrmann commented 2 months ago

@jarthana another fyi: while preparing a PR for this issue, I saw my changes causing a few regressions in (disabled) ASTConverterMarkdownTest. While trying to avoid such regressions I'm making a few obvious changes in those tests. So for your next round on those tests you may want to wait just a little while, OK?

stephan-herrmann commented 2 months ago

For posterity:

Initially I planned to do all the markdown specific handling of whitespace and leading / at the DOM level, but then I realized that already the parser must analyse leading whitespace to distinguish verbatim code from tags:

/// Start
///
///    @param i this is a param tag
void m1(int i) {}

/// Start
///
///    @param // this is a Java annotation
///
///This line determines that 4 leading spaces suffice for a code block
void m2() {}

In both comments, @param starts after 4 blanks, but the comment of m1() has a minimal indentation of 1 space, so only 3 of those 4 blanks are significant, hence we don't have a code block. The last line in the comment of m2() determines that this comment has minimal indentation of 0 spaces, so all 4 blanks before @param are significant and thus signal a code block.

That last line could also consist of only four slashes (////) to the same effect, because the fourth of those slashes will be treated as the beginning of the comment text.

Without such whitespace analysis the parser could not know whether or not to parse a tag.

eclipse-jdt / eclipse.jdt.core

[23] markdown javadoc should detect code regions #2808

For posterity: