Possibly undesired behavior uncovered by the spec

elibarzilay commented 8 months ago

I ran into a particular edge case that looks like a bug:

>+
>  foo

Renders as:

+ foo

with foo being in its own <p>.

Adding one space works as expected:

>+
>   foo

Renders as:

+ foo

And also dropping the > blockquote makes it work as expected:

+
  foo

Renders as:

+ foo

The last one in particular makes me think that this is undesired behavior.

I looked through the spec but didn't find anything about this being intentional. I ran into it when using markdown-it, and in there it seems that changing the 1 to 0 in the following:

if (contentStart >= max) {
  // trimming space in "-    \n  3" case, indent is 1 here
  indentAfterMarker = 1
} else {
  indentAfterMarker = offset - initial
}

fixes this. Running the tests from here on the resulting code shows no failures, which implies that there's no other tested cases that depend on the original behavior.

(I tried to look through the commonmark.js code (parseListMarker), to see if it can fixed in a similar way, but it's very different.)

jgm commented 8 months ago

Possibly related to (or the same as) #421 and #460 ?

elibarzilay commented 8 months ago

Possibly. In any case, it's not great that it's not in the spec in some way (also, I agree with sentiment in the other ones that adding a > prefix (with or without space) would be best behaved if it adds a blockquote tag around whatever's in it without that prefix...).

wooorm commented 7 months ago

What isn’t in the spec? What did you expect in the spec?

Your first example is explained because block quotes take 0 or 1 space after the >. So after the block quote, the input:

>+
>  foo

is sort of “equivalent” to:

+
 foo

leaving one space on the 1nd line, which is not enough for that line to be included in the list item. Both the “block quote space”, and the “how many spaces for list item content” are explained.

elibarzilay commented 7 months ago

I meant the spec as in what's in this repo --- there's no example (or test) for this behavior that I found. Possibly because this was something that could change.
IMO, having a > block grab a space on lines it can is unpredictable + confusing. I'd much rather see it grab a space or not throughout the whole block. (I think that this better fits the "writer's perspective" in @jgm's comment).
An example of this would be what goes through my head when I want to block-quote some text: I don't want to be aware of the parser. (Unlike some tool that might transform or construct md text, where it should be aware of such things, and add > instead of just >.) See below for a demonstration of this.
I semi-hope that the lack of tests for the current behavior are because it's questionable and might change.

Here's the demonstration of how the parser makes an unexpected visit into the writer's head is:

>```
>2. item two
>   - sublist
>```

which renreds as:

2. item two
  - sublist

The brokeness is that you need to be a "markdown expert" to know what's going on, and (again, IMO) the most important premise of markdown is that you don't need such experts.

BTW, the above is "kind of" in the spec (eg, this example), but since it uses a plain paragraph, the implications are not visible enough.

(And as another sidenote, this is even messier in its surprising result:

   >```
  > # Foo
   >bar
 >  baz
>```
> meh

)

jgm commented 7 months ago

I agree that something should change here; see discussion in the blockquote issues linked above.

wooorm commented 7 months ago

I meant the spec as in what's in this repo --- there's no example (or test) for this behavior that I found. Possibly because this was something that could change.

I semi-hope that the lack of tests for the current behavior are because it's questionable and might change.

Because it’s 2 things combined. The first part (one optional space) is described in the first paragraph of block quotes. The second is starting list items with a blank line, which is explained in case 3 of list items (right after example 277).

There are tons of things explained in the spec. To account for every combination of things is unmanagable.

IMO, having a > block grab a space on lines it can is unpredictable + confusing. I'd much rather see it grab a space or not throughout the whole block. (I think that this better fits the "writer's perspective" in @jgm's comment).

I disagree. All markdown parsers work like this so it’s pretty predicatable. Many things in markdown are optional and allowed to be messy. And this one is explained right in the first line in the spec.

Your proposal breaks for blank lines:

> Some text
>
> Some more text

which also happens a lot in code:

> ```js
> function something(value) {
>   const double = value * 2
>
>   return double
> }
> ```

If you account for blank lines, it would still break when people are a bit messy. And in my experience many markdown writers are a bit messy. Take this diff for the previous code:

 > ```js
 > function something(value) {
 >   const double = value * 2
 >
 >   return double
-> }
+>}
 > ```

you are proposing that users accidentally forgetting a space on one line, get a different result for many lines. I use code as an example because those spaces are visual. I think for lists and indented code it gets more complex.

An example of this would be what goes through my head when I want to block-quote some text: I don't want to be aware of the parser. (Unlike some tool that might transform or construct md text, where it should be aware of such things, and add > instead of just >.) See below for a demonstration of this.

The brokeness is that you need to be a "markdown expert" to know what's going on, and (again, IMO) the most important premise of markdown is that you don't need such experts.

Nobody wants to be aware of the parser but having rules in markdown doesn’t mean that suddenly people need to “know the parser”. We can have some rules. You are also suggesting another rule. Folks are messy. They typically write that space but sometimes they forget. They shouldn’t “know the parser” that now they’re punished for that and the whole thing renders differently? It sounds like you want to write > without a space? Why? That seems like an uncommon style to me?

(And as another sidenote, this is even messier in its surprising result:

When you write messy markdown, reglardless of this > and space, it’s always going to be surprising. There’s thousands of gotchas in markdown, rules where it’s trying its best for messy markdown but there are limitations to that. I think your proposal makes properly styled markdown (> and a space), when someone makes one little mistake, more surprising?

I agree that something should change here;

What do you think should happen? Glossing over the linked discussion, it seems to be more about improving the words in the spec in 2017?

jgm commented 7 months ago

I'm not sure what should happen (or it would have happened already).

But I do think it's bad that the spec makes it ambiguous what certain strings represent.

Note how the current reference parser interprets the spec in the cases noted in #421:

>1. > asdf
>   > sdfg

blockquote [list [blockquote], blockquote]

vs

> 1. > asdf
>    > sdfg

blockquote [list [blockquote]]

These shouldn't have different meanings, by the "principle of uniformity."

Maybe that's just a problem with the parser -- but the fact is, the spec doesn't unambiguously say how this is to be treated, for reasons explained in #460.

Requiring that the "block quote marker" either include a space or not include a space on every line of the block quote seems a step in the right direction. This runs into the problem you noted with blank lines, but we could easily say something like "include a space unless followed by blank line." I'm less concerned about the second problem you note (the writer who just accidentally omits the space on one line). I'd rather have an unambiguous spec than a forgiving parser, I suppose.

elibarzilay commented 7 months ago

(@jgm)

I'd rather have an unambiguous spec than a forgiving parser, I suppose.

Yes, that how I see it. In terms of what humans do, the current state can lead to a surprise if you add just a > without a space on multiple lines. Automated md-generating tools shouldn't do that, but humans are probably likely to do a multi-line cursor thing, hit > and be done with it. The reason that I don't consider the one-new-line-without-a-space messing things up is my estimate that this is less likely to happen.

(@wooorm)

I meant the spec as in what's in this repo [...]

Because it’s 2 things combined.

Please re-read the original description: I took a parser that passes all of the tests, changed it so that in the case I started with I get a more reasonable behavior, re-ran the tests and they passed again. This means that the change I did was not tested, it means nothing more or less than that. It's a very simple technical point, so I don't see why you're confrontational about it. The subjective point of how commonmark is specified or how the reference implementation(s) behave is a separate point.

Yes, existing parsers tend to follow the reference. The question is whether the reference should be refined. And yes, authors are messy: you show a (valid) example of an unintended result when you have one line without a space as a result of messy writing, I showed another example that is unintended because of leaving out a space on the first line. So they're both examples of having no space on one line affecting many lines. I agree with @jgm's sentiment that the grab-a-space-if-you-can is more of a forgiving parser thing, which makes things (more) ambiguous, and I prefer a more deterministic behavior overall.

It's all a game of tradeoffs, and the question is which one is less confusing overall, or rather which editing mistake is more common -- tweaking a line and forgetting the space, or adding a > without a space for a whole existing block. If you think that the latter is less likely, then that's fine --- for me, it seems more likely.

wooorm commented 7 months ago

This means that the change I did was not tested, it means nothing more or less than that.

There are many things untested indeed, something I would love to see change, but I believe @jgm prefers to leave things up for interpretation.

You want to change the spec. I do not think that should be done.

Yes, existing parsers tend to follow the reference.

You misrepresent. The space after > is optional as explained in the 1st line of block quotes. The spec is very clear with what is happening. You want to change that rule.

It's all a game of tradeoffs, [...] If you think that the latter is less likely, then that's fine --- for me, it seems more likely.

There are many markdown records that exist already that you want to break.

elibarzilay commented 7 months ago

What I wanted, is a clarification of what some unspecified corner case should be. Like I said:

it's not great that it's not in the spec in some way

which is a preference to making it more specified.
@jgm raised the point that this case is not clear, so I contributed my opinion about a possible change. I'm sure that he is aware of the possible breakage, and I participated as someone who ran into it.
You're very clearly on the side of changing no code, and you also argued that no cases should be added to the spec/tests. So you argue for no changes at all. That's a valid point of view, which you could have started with instead of dragging on with this confrontational style to the point of this accusation that I want to break stuff.

Please reconsider your interactions. You could start by resisting the urge to reply to every word I just write.

I'll make it easier for you by shutting up.

commonmark / commonmark-spec