Empty line after indented code block not recognized as BLOCK_EMPTY, not its newline as TEXT_NL

DivineDominion commented 2 years ago

Given this MMD:

text

    code1
    code2

more

The token tree description (with literal newlines replaced with \n to help readability) is:

=====>
* (0) 0:31  'text\n\n    code1\n    code2\n\nmore'
    * (77) 0:5  'text\n'
        * (223) 0:4 'text'
        * (218) 4:1 '\n'
    * (60) 5:1  '\n'
        * (218) 5:1 '\n'
    * (52) 6:20 '    code1\n    code2\n'
        * (223) 10:5    'code1'
        * (218) 15:1    '\n'
        * (223) 20:5    'code2'
        * (218) 25:1    '\n'
    * (77) 27:4 'more'
        * (223) 27:4    'more'
<=====

Note the bottom-most 3 lines:

        * (218) 25:1    '\n'
    * (77) 27:4 'more'
        * (223) 27:4    'more'

The range 26:1 is essentially missing.
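
To make the gap concrete, here is a minimal sketch that checks whether a parent token's children cover its whole range. It is not part of MMD6: the `find_gaps` helper and the `(start, length)` tuples are hypothetical, with the numbers copied by hand from the token tree description above.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, length), as printed in the token tree description


def find_gaps(parent: Range, children: List[Range]) -> List[Range]:
    """Return the (start, length) ranges inside parent that no child covers."""
    gaps = []
    cursor = parent[0]
    for start, length in sorted(children):
        if start > cursor:
            # Nothing covers the characters between cursor and this child.
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    parent_end = parent[0] + parent[1]
    if cursor < parent_end:
        gaps.append((cursor, parent_end - cursor))
    return gaps


# Top-level blocks copied from the token tree above (hypothetical hand-made input).
root = (0, 31)
blocks = [(0, 5), (5, 1), (6, 20), (27, 4)]
print(find_gaps(root, blocks))  # [(26, 1)]
```

Running the same check on the children of the code block (52) reports `(6, 4)` and `(16, 4)`, the indentation characters, so the tree already leaves some characters uncovered elsewhere.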

For a token tree that describes the document correctly at character position 26, I'd expect a BLOCK_EMPTY + TEXT_NL:

=====>
* (0) 0:31  'text\n\n    code1\n    code2\n\nmore'
    * (77) 0:5  'text\n'
        * (223) 0:4 'text'
        * (218) 4:1 '\n'
    * (60) 5:1  '\n'
        * (218) 5:1 '\n'
    * (52) 6:20 '    code1\n    code2\n'
        * (223) 10:5    'code1'
        * (218) 15:1    '\n'
        * (223) 20:5    'code2'
        * (218) 25:1    '\n'
    * (60) 26:1 '\n'                                     <----------
        * (218) 26:1    '\n'                             <----------
    * (77) 27:4 'more'
        * (223) 27:4    'more'
<=====

@fletcher Would you agree that this is an issue worth tracking for the future?

DivineDominion commented 2 years ago

For context: after a fenced code block, the empty line is indeed tokenized as a BLOCK_EMPTY with a TEXT_NL token inside. So I guess the "smearing together" of indented code lines into a cohesive block across empty lines is too greedy at the bottom edge.

By "smearing together" I mean that these 3 lines become 1 cohesive block even though the middle line doesn't even have indentation characters (␣ denoting a space):

␣␣␣␣line 1 of code

␣␣␣␣line 3 of same block
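
As an illustration of the grouping I have in mind, here is a simplified sketch. It is not MMD6's parser: `group_lines` is a hypothetical helper, and treating every non-indented line as its own paragraph is a shortcut. Blank lines stay inside a code block only when another indented line follows. A trailing blank line becomes its own empty block.

```python
from typing import List, Tuple


def group_lines(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Group lines into ("code" | "empty" | "para", lines) blocks."""
    blocks = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("    "):
            # Scan over indented and blank lines, remembering the last indented one.
            j = i
            last_code = i
            while j < len(lines) and (lines[j].startswith("    ") or lines[j].strip() == ""):
                if lines[j].startswith("    "):
                    last_code = j
                j += 1
            # Interior blank lines are absorbed; trailing ones are left for the next pass.
            blocks.append(("code", lines[i:last_code + 1]))
            i = last_code + 1
        elif line.strip() == "":
            blocks.append(("empty", [line]))
            i += 1
        else:
            blocks.append(("para", [line]))
            i += 1
    return blocks


doc = ["text", "", "    code1", "    code2", "", "more"]
for kind, content in group_lines(doc):
    print(kind, content)
```

With this grouping, the sample document yields para, empty, code, empty, para, and the three-line example above still becomes a single code block.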

fletcher commented 2 years ago

I'm struggling to understand under what circumstances this would matter?

The token tree does not describe the given source text (that is "described" by the text itself); it is an abstract representation of the semantic meaning of the text. There are circumstances where a BLOCK_EMPTY has meaning (in the sense that it splits something else into two or more parts) and circumstances where it does not (an indented code block ends when there is a non-indented line, whether there is an intervening blank line or not). This means that there are circumstances where one or more empty lines are assumed, and times where they are explicitly noted -- and this often has to do with technical limitations of the parser, and sometimes just because I happened to code it differently. This includes the fact that I don't always explicitly waste the CPU cycles to remove a token that is no longer needed, and sometimes just mark it as an empty token and allow it to be freed later.

Either way, a BLOCK_EMPTY token results in NULL output, so its presence or absence in the token tree doesn't really matter, since the parsing has already been completed at this point.
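
A toy sketch of that point, under loud assumptions: this is not MMD6's exporter. `render` is a hypothetical function, the HTML shape is made up for illustration, and only the type numbers 60 (BLOCK_EMPTY), 52, and 77 are taken from the tree above; the names given to 52 and 77 are guesses.

```python
BLOCK_EMPTY = 60  # type number shown for the empty block in the issue
BLOCK_CODE = 52   # assumed label for the indented code block
BLOCK_PARA = 77   # assumed label for the paragraph-like blocks


def render(blocks):
    """Render (type, text) blocks to a toy HTML string; empty blocks emit nothing."""
    out = []
    for kind, text in blocks:
        if kind == BLOCK_EMPTY:
            continue  # no output, so its presence cannot change the result
        if kind == BLOCK_CODE:
            # Strip the four-space indentation from each code line.
            code = "\n".join(line[4:] for line in text.splitlines())
            out.append(f"<pre><code>{code}\n</code></pre>")
        else:
            out.append(f"<p>{text.strip()}</p>")
    return "\n\n".join(out)


# The tree as reported, and the same tree with the extra empty block at 26:1.
reported = [(77, "text\n"), (60, "\n"), (52, "    code1\n    code2\n"), (77, "more")]
with_empty = reported[:3] + [(60, "\n")] + reported[3:]
print(render(reported) == render(with_empty))  # True
```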

Perhaps you could explain why this came up?

DivineDominion commented 2 years ago

Thanks for the explanation! I wanted to report this because it looked odd that the token tree's description had gaps. If the code block had continued one line further down, I would have just thought that this is MMD6's way of interpreting the situation. The gap looked suspicious and I didn't notice any mention of intentional gaps in the headers or source files, so I figured you might not be aware of it.

So this is a "wontfix because intentional" situation :)

fletcher commented 2 years ago

Or maybe a "quasi-intentional" situation, and I will consider fixing it if there is a case where it actually matters.

:)

fletcher commented 2 years ago

(But thank you for noticing!!)

fletcher / MultiMarkdown-6

Empty line after indented code block not recognized as BLOCK_EMPTY, not its newline as TEXT_NL #232