Closed DivineDominion closed 2 years ago
For context: after a fenced code block, the empty line is indeed tokenized as a BLOCK_EMPTY with a TEXT_NL token inside. So I guess the "smearing together" of indented code lines into a cohesive block across empty lines is too greedy at the bottom edge.
By "smearing together" I mean that these 3 lines become 1 cohesive block even though the middle line doesn't even have indentation characters (␣ denoting a space):
␣␣␣␣line 1 of code

␣␣␣␣line 3 of same block
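To make the "smearing" guess concrete, here is a minimal Python sketch of a line grouper that merges indented lines across blank lines. It is not MMD6's actual implementation (which is written in C); `is_indented`, `group_indented_code`, and the `greedy_bottom` flag are hypothetical names used only to show the difference between swallowing trailing blank lines and stopping at the last indented line.

```python
def is_indented(line):
    """True for a non-blank line that starts with four spaces or a tab."""
    return line.startswith(("    ", "\t")) and line.strip() != ""


def group_indented_code(lines, greedy_bottom=True):
    """Return (start, end) line ranges (end exclusive) of indented code blocks.

    Blank lines between indented lines are merged into the block.
    With greedy_bottom=True, trailing blank lines are swallowed as well;
    with greedy_bottom=False, the block ends after its last indented line.
    """
    blocks = []
    i, n = 0, len(lines)
    while i < n:
        if not is_indented(lines[i]):
            i += 1
            continue
        start = last_code = i
        j = i + 1
        # Keep scanning while lines are indented or blank.
        while j < n and (is_indented(lines[j]) or lines[j].strip() == ""):
            if is_indented(lines[j]):
                last_code = j
            j += 1
        blocks.append((start, j if greedy_bottom else last_code + 1))
        i = j
    return blocks


sample = [
    "    line 1 of code",
    "",
    "    line 3 of same block",
    "",
    "A paragraph follows.",
]
print(group_indented_code(sample, greedy_bottom=True))
print(group_indented_code(sample, greedy_bottom=False))
```

In the greedy variant the blank line after the block belongs to the code block's range; in the non-greedy variant it is left over and would need its own token to be represented.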
I'm struggling to understand under what circumstances this would matter?
The token tree does not describe the given source text (that is "described" by the text itself); it is an abstract representation of the semantic meaning of the text. There are circumstances where a BLOCK_EMPTY has meaning (in the sense that it splits something else into two or more parts) and circumstances where it does not (an indented code block ends when there is a non-indented line, whether there is an intervening blank line or not). This means that there are circumstances where one or more empty lines are assumed, and times where they are explicitly noted. This often has to do with technical limitations of the parser, and sometimes it is just because I happened to code it differently. This includes the fact that I don't always spend the CPU cycles to explicitly remove a token that is no longer needed; sometimes I just mark it as an empty token and allow it to be freed later.
Either way, a BLOCK_EMPTY token results in NULL output, so its presence or absence in the token tree doesn't really matter, since the parsing has already been completed at this point.
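As a rough illustration of that point, the Python sketch below renders two token lists that differ only in whether a BLOCK_EMPTY is present. The `render` function, the `(kind, text)` tuples, and the HTML strings are assumptions made up for this example, as are the token names other than BLOCK_EMPTY; none of this is MMD6's real exporter.

```python
def render(tokens):
    """Render a flat list of (kind, text) tokens to HTML-like output.

    BLOCK_EMPTY produces no output at all, mirroring the idea that
    the token has no effect once parsing is finished.
    """
    out = []
    for kind, text in tokens:
        if kind == "BLOCK_EMPTY":
            continue  # nothing is emitted for an empty block
        if kind == "BLOCK_CODE_INDENTED":  # hypothetical name for this sketch
            out.append("<pre><code>" + text + "</code></pre>")
        elif kind == "BLOCK_PARA":  # hypothetical name for this sketch
            out.append("<p>" + text + "</p>")
    return "\n".join(out)


with_empty = [
    ("BLOCK_CODE_INDENTED", "line 1 of code"),
    ("BLOCK_EMPTY", "\n"),
    ("BLOCK_PARA", "A paragraph follows."),
]
without_empty = [tok for tok in with_empty if tok[0] != "BLOCK_EMPTY"]

# Both token lists produce identical output.
print(render(with_empty) == render(without_empty))
```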
Perhaps you could explain why this came up?
Thanks for the explanation! I wanted to report this because it looked odd that the token tree's description had gaps. If the code block had continued one line further down, I would have just thought that this is MMD6's way of interpreting the situation. The gap looked suspicious and I didn't notice any mention of intentional gaps in the headers or source files, so I figured you might not be aware of this either.
So this is a "wontfix because intentional" situation :)
Or maybe a "quasi-intentional" situation, and I will consider fixing it if there is a case where it actually matters.
:)
(But thank you for noticing!!)
Given this MMD:
The token tree description (with literal newlines replaced with \n to help readability) is:
Note the bottom-most 3 lines:
The range 26:1 is essentially missing.
For a token tree that describes the document correctly, I'd expect a BLOCK_EMPTY + TEXT_NL at character position 26:
@fletcher Would you agree that this is an issue worth tracking for the future?
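For anyone who wants to check a token tree for such holes mechanically, here is a minimal Python sketch. It assumes each top-level token can be reduced to a `(start, length)` pair in the same `start:length` notation as `26:1`. The function name `find_gaps` and the sample offsets are hypothetical and chosen only so that the missing range comes out as 26:1; they are not taken from the real token tree.

```python
def find_gaps(spans, total_length):
    """Return (start, length) ranges of the text not covered by any span.

    spans: iterable of (start, length) pairs, in any order.
    total_length: length of the source text in characters.
    """
    gaps = []
    cursor = 0  # first offset not yet covered
    for start, length in sorted(spans):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < total_length:
        gaps.append((cursor, total_length - cursor))
    return gaps


# Hypothetical offsets: one block covering 0..25, the next starting at 27.
spans = [(0, 26), (27, 10)]
for start, length in find_gaps(spans, total_length=37):
    print(f"{start}:{length}")
```

An empty result would mean the top-level tokens cover the whole source text without holes.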