gamburg / margin

Lightweight markup designed for an open mind
https://margin.love
MIT License

Ambiguity around annotation scopes #13

Closed vlmutolo closed 4 years ago

vlmutolo commented 4 years ago

Consider the following line.

[a[] b []c]

Is this one annotation with the contents a[] b []c? Or is it an item b with two annotations a[ and ]c? This ambiguity results from allowing square brackets inside annotations.

Another example:

[a [b]]

Is this

  1. an annotation with the contents a [b], or
  2. an item with contents ] and an annotation a [b?

In other words, which square bracket closes an annotation? The first one found, or the last one?

vlmutolo commented 4 years ago

I would argue that, for the simplicity of parsing, the first non-escaped ] should be considered the end of the annotation. This would keep the functionality of the following example from the website.

[I belong to Item B] Item B [I also belong to Item B]

However, it would break the functionality of this example.

[This entire sentence [including this text] represents a single annotation]

It would instead be parsed as an item with contents represents a single annotation] and an annotation with contents This entire sentence [including this text.

It seems like these examples are at odds with each other, and I would rather the former work than the latter, and the former is probably easier to parse.
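A minimal sketch of the "first non-escaped ] closes the annotation" rule, for comparison with the two examples above. The function name `split_first_close` is hypothetical, and three choices are assumptions rather than anything the spec states: a backslash escapes a bracket, an annotation that never closes is kept as literal item text, and runs of whitespace in the item are collapsed.

```python
from typing import List, Tuple


def split_first_close(line: str) -> Tuple[str, List[str]]:
    """Split a line into (item, annotations); the first unescaped ']' ends an annotation."""
    item: List[str] = []
    annotations: List[str] = []
    current = None  # None while outside an annotation, else a list of characters
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "[]":
            # Escaped bracket: keep the bracket as literal text, drop the backslash.
            (item if current is None else current).append(line[i + 1])
            i += 2
            continue
        if current is None and ch == "[":
            current = []  # open an annotation
        elif current is not None and ch == "]":
            annotations.append("".join(current))  # first ']' closes it
            current = None
        else:
            (item if current is None else current).append(ch)
        i += 1
    if current is not None:
        # Assumption: an annotation that never closes is treated as item text.
        item.append("[" + "".join(current))
    # Assumption: collapse whitespace left behind where annotations were removed.
    return " ".join("".join(item).split()), annotations


print(split_first_close("[I belong to Item B] Item B [I also belong to Item B]"))
# ('Item B', ['I belong to Item B', 'I also belong to Item B'])
print(split_first_close(
    "[This entire sentence [including this text] represents a single annotation]"
))
# ('represents a single annotation]', ['This entire sentence [including this text'])
```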

vlmutolo commented 4 years ago

Actually, I guess there's another option. We could parse the square brackets as though we allowed nesting, and then just use all the text in the "outermost" scope as the text of the annotation. Essentially, we would start with an annotation_nest_level of 0, increment it when we find [, and decrement it when we find ]. All the text that occurs while annotation_nest_level is nonzero is considered part of the annotation.

Here's a worked example, with the second row representing the annotation_nest_level.

this is [a [nested] annotation] and some text 
000000001112222222111111111111000000000000000

But we would only use the "nesting" for parsing. The actual annotation is a [nested] annotation.
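The level row in the worked example can be reproduced with a few lines of code. This is only a sketch: `nest_levels` is a hypothetical name, and it assumes that an opening bracket is labelled with the level after the increment and a closing bracket with the level after the decrement, which is how the row above reads.

```python
def nest_levels(line: str) -> str:
    """Return one digit per character giving the annotation nest level at that point."""
    level = 0
    digits = []
    for ch in line:
        if ch == "[":
            level += 1  # '[' is labelled with the level it opens
        elif ch == "]":
            level = max(0, level - 1)  # ']' is labelled with the level it returns to
        digits.append(str(level))
    return "".join(digits)


line = "this is [a [nested] annotation] and some text"
print(line)
print(nest_levels(line))
```

The digits only stay aligned with the text while the level is below 10, which is enough for illustration.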

This method would preserve both of the examples I referenced above. There's one downside I can see, though. Consider the following example.

[an [annotation] that goes on forever

This could reasonably either be parsed as

  1. an item with value [an that goes on forever] or
  2. a single annotation with value an [annotation] that goes on forever.

The "nested" parsing method would produce either an error because of an "unclosed annotation", or we could possibly define it to produce (2). I would prefer the error, though I understand why, in the spirit of flexibility and ease of use for non-technical people, the (2) interpretation might be better overall.

Though, we could probably specify that, should the annotation_nest_level be nonzero at the end of a line, the last closing bracket should be used as the end of the annotation. This could make the parser too complex, however.
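To gauge how much complexity that fallback adds, here is a rough sketch. The name `split_with_fallback` is hypothetical, and two behaviours are assumptions: an unclosed annotation with no closing bracket at all is kept as item text, and whitespace in the item is collapsed.

```python
from typing import List, Tuple


def split_with_fallback(line: str) -> Tuple[str, List[str]]:
    """Pseudo-nested split; an unclosed annotation ends at the last ']' on the line."""
    item: List[str] = []
    annotations: List[str] = []
    level = 0
    start = -1  # index of the '[' that opened the current outermost annotation
    for i, ch in enumerate(line):
        if ch == "[":
            if level == 0:
                start = i
            level += 1
        elif ch == "]" and level > 0:
            level -= 1
            if level == 0:
                annotations.append(line[start + 1:i])
        elif level == 0:
            item.append(ch)  # includes a stray ']' seen at level 0
    if level > 0:
        # Fallback: the outermost annotation never closed.
        last_close = line.rfind("]", start + 1)
        if last_close == -1:
            item.append(line[start:])  # assumption: no ']' at all, keep as item text
        else:
            annotations.append(line[start + 1:last_close])
            item.append(line[last_close + 1:])
    return " ".join("".join(item).split()), annotations


print(split_with_fallback("[an [annotation] that goes on forever"))
# ('that goes on forever', ['an [annotation'])
```

The extra cost is one backwards search per unclosed annotation, so the fallback stays fairly contained in this form.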

There's another, related edge case to consider when the user inputs more closing brackets than opening.

] an [annotation that] starts weird 

I think this is most neatly solved by never letting the annotation_nest_level drop below zero. In Rust lingo, use a saturating_sub with unsigned integers instead of letting the value drop to the negatives. This way, the above example would be parsed as an item with value ] an starts weird with an annotation of value annotation that. I think that's the best we can do for this case.
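A sketch of the pseudo-nested split with a counter that never drops below zero. Python has no unsigned integers, so the saturation is written as an explicit check; `split_pseudo_nested` is a hypothetical name. Two behaviours are assumptions made for illustration: an annotation still open at the end of the line runs to the end of the line (interpretation (2) above), and whitespace in the item is collapsed.

```python
from typing import List, Tuple


def split_pseudo_nested(line: str) -> Tuple[str, List[str]]:
    """Split a line into (item, annotations) using a saturating nest-level counter."""
    item: List[str] = []
    annotations: List[str] = []
    current: List[str] = []
    level = 0
    for ch in line:
        if ch == "[":
            if level > 0:
                current.append(ch)  # nested '[' is literal annotation text
            level += 1
        elif ch == "]":
            if level == 0:
                item.append(ch)  # stray ']': level saturates at zero, keep as item text
            else:
                level -= 1
                if level == 0:
                    annotations.append("".join(current))  # outermost scope closed
                    current = []
                else:
                    current.append(ch)  # nested ']' is literal annotation text
        elif level > 0:
            current.append(ch)
        else:
            item.append(ch)
    if level > 0:
        # Assumption: an unclosed annotation extends to the end of the line.
        annotations.append("".join(current))
    return " ".join("".join(item).split()), annotations


print(split_pseudo_nested("] an [annotation that] starts weird"))
# ('] an starts weird', ['annotation that'])
```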

Overall, I'm in favor of this pseudo-nested annotation approach.

gamburg commented 4 years ago

Love it. This pseudo-nested approach definitely seems more thinker-friendly.

> The "nested" parsing method would produce either an error because of an "unclosed annotation", or we could possibly define it to produce (2). I would prefer the error, though I understand why, in the spirit of flexibility and ease of use for non-technical people, the (2) interpretation might be better overall.

Agreed. I think (2) is more in the spirit of Margin. When there's a question between adding mental load to the thinker or the parser, we should probably default to making things more difficult for the parser.

> I think this is most neatly solved by never letting the annotation_nest_level drop below zero.

This seems like the correct solution.

All good stuff. Thank you for really thinking through this in a way that I did not.

@vlmutolo Would you like me to mark this issue as closed, since these conclusions conform to the current spec as is? Or would you like this issue to persist until this is fixed in Margin's example parser? I'm amenable either way, but maybe there should be a separate way to organize issues specific to getting the example parser to spec, since there is so much ground to be covered on that front. What do you think?

vlmutolo commented 4 years ago

I'd consider this issue closed. And yeah, there should be a better way to organize all these issues related to the spec, and they should probably be separate from the actual implementation in the parser.

vlmutolo commented 4 years ago

Maybe you could create an issue label "spec" for issues that discuss the margin specification.

mtsknn commented 4 years ago

I had been thinking for a few days of opening a similar issue, but looks like @vlmutolo was a week ahead of me. 👍😄

Like @vlmutolo in his second comment, I would have proposed that "for the simplicity of parsing, the first non-escaped ] should be considered the end of the annotation."

This way it would be easier to parse nested annotations for humans as well. Consider these two:

[This entire sentence [including [this] text [and this too]] represents a single annotation]

[This entire sentence [including [this\] text [and this too\]\] represents a single annotation]

In the first one, the user needs to count the opening and closing brackets. In the second one, it's easier to spot the closing bracket that actually ends the annotation.

(Annotations with this many levels of nesting are maybe not very likely to occur in the wild, but I think it's good to think about these as well.)

This example by @vlmutolo would also be unambiguous:

[an [annotation] that goes on forever

The annotation's value would be "an [annotation" and the item's value would be " that goes on forever"

Similarly:

  1. an [annotation] that goes] on forever
  2. an [annotation\] that goes] on forever

These would be parsed as:

  1. Annotation: "annotation"; item: "an that goes] on forever"
  2. Annotation: "annotation] that goes"; item: "an on forever"

mtsknn commented 4 years ago

Actually, having written a parser (mtsknn/margin-parser), I think that the last option proposed by @vlmutolo is fine after all. 🙂

vlmutolo commented 4 years ago

Yeah, it's probably not too hard to implement, and the behavior seems like it would be the least surprising out of all the options.