mojavelinux closed this issue 6 years ago
Technically, this still isn't right, because the following should not be parsed as strong text:
*foo*_
The rules for a constrained formatting mark are as follows:
There cannot be a word character (\p{Word}) immediately outside the formatting marks.
Underscore is a word character. It's not allowed for strong, but obviously it's allowed for emphasis. Therefore, we may need two different matchers.
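As a minimal sketch of these rules, assuming a regex-based matcher in Ruby (the new parser may not use regexes at all), the pattern below combines the "no space inside" condition mentioned later in the thread with "no word character outside" via \p{Word} lookarounds. The constrained_pattern helper and the STRONG and EMPHASIS constants are hypothetical names for illustration, not Asciidoctor's API.

```ruby
# Hypothetical helper (not Asciidoctor's API): builds a regex that matches a
# constrained formatting pair delimited by +mark+.
#   (?<!\p{Word})  no word character immediately before the opening mark
#   (\S|\S.*?\S)   no space immediately inside the marks
#   (?!\p{Word})   no word character immediately after the closing mark
def constrained_pattern(mark)
  m = Regexp.escape(mark)
  /(?<!\p{Word})#{m}(\S|\S.*?\S)#{m}(?!\p{Word})/
end

# One matcher per mark, so each can later get its own exceptions.
STRONG   = constrained_pattern("*")
EMPHASIS = constrained_pattern("_")

["*foo* bar", "*foo*_", "x*foo*", "_foo_ bar", "_foo_1"].each do |text|
  puts format("%-12s strong=%-5s emphasis=%s", text.inspect,
              STRONG.match?(text), EMPHASIS.match?(text))
end
```

Because "_" belongs to \p{Word}, the lookahead alone already rejects *foo*_ for strong. Keeping one pattern per mark leaves room to treat the underscore differently for emphasis if that turns out to be necessary.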
Ok, I thought it was a limitation of the current implementation that should be resolved with the new parser.
> Underscore is a word character. It's not allowed for strong, but obviously it's allowed for emphasis. Therefore, we may need two different matchers.
Indeed...
> I thought it was a limitation of the current implementation that should be resolved with the new parser.
I'd say this is a very logical definition of what constrained is. No space inside, no word character outside.
> I'd say this is a very logical definition of what constrained is. No space inside, no word character outside.
The definition is indeed very logical, but I find it odd that a number is not allowed. In fact, we are using the definition from the Ruby Regexp engine to define what a word character is:
/\p{Word}/ - A member of one of the following Unicode general category Letter, Mark, Number, Connector_Punctuation
https://ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Character+Properties
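To check which characters count under that definition, a quick Ruby sketch can test single characters against \p{Word}. The WORD_CHAR constant and word_char? helper are illustrative names, and the sample characters are arbitrary picks covering letters, a digit, the underscore, and a few punctuation marks.

```ruby
# Ruby's \p{Word} property: Letter, Mark, Number, Connector_Punctuation.
# \A and \z anchor the check so the whole string must be one such character.
WORD_CHAR = /\A\p{Word}\z/

def word_char?(char)
  WORD_CHAR.match?(char)
end

# "\u00e9" is a precomposed e with acute accent (a Letter).
["a", "\u00e9", "2", "_", ".", "*", "-", " "].each do |char|
  puts "#{char.inspect}: #{word_char?(char) ? 'word character' : 'boundary'}"
end
```

Under this definition the digit 2 and the underscore are both word characters, while ".", "*", "-", and the space are not.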
I don't know; maybe it's the most reasonable definition, but I find this rule a bit restrictive on this one use case. Anyway, I don't think we should change the behavior. We just need to make sure that we (the writers) share the same definition of a word character :nerd_face:
> The definition is indeed very logical but I find it odd that a number is not allowed.
But a number is a word character. Therefore, if the number is immediately adjacent to a formatting mark, that's not a word boundary. In other words:
*formula*1
It's completely logical that the formatting is not applied in this case because the * is in the middle of the "word". If we violated that rule, it would very likely break tons of AsciiDoc documents.
> I find this rule a bit restrictive on this one use case.
I think it would be much harder to explain that a number is a word boundary. Right now, you look at the sequence of characters, see that the * is sandwiched inside the "word", and conclude that you would need unconstrained marks. That's much easier to understand IMO.
It makes sense that punctuation is a word boundary, like in this case:
*fin*.
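Put another way, the boundary test only needs to look at one character on each outer side of the marks. A minimal Ruby sketch, assuming the positions of the opening and closing marks are already known (boundary_before?, boundary_after? and constrained_pair? are hypothetical helpers):

```ruby
WORD = /\p{Word}/

# A mark is on a word boundary when the character on its outer side is
# missing (start or end of text) or is not a \p{Word} character.
def boundary_before?(text, open_index)
  open_index.zero? || !text[open_index - 1].match?(WORD)
end

def boundary_after?(text, close_index)
  nxt = text[close_index + 1]
  nxt.nil? || !nxt.match?(WORD)
end

# The pair can only be constrained formatting if both outer sides are boundaries.
def constrained_pair?(text, open_index, close_index)
  boundary_before?(text, open_index) && boundary_after?(text, close_index)
end

puts constrained_pair?("*formula*1", 0, 8) # prints false: "1" is a Number
puts constrained_pair?("*fin*.", 0, 4)     # prints true: "." is punctuation
```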
I was reading 2 as a single character (i.e., not part of the "word" mc2), but it makes sense with *formula*1.
> It's completely logical that the formatting is not applied in this case because the * is in the middle of the "word". If we violated that rule, it would very likely break tons of AsciiDoc documents.
Absolutely.
> I think it would be much harder to explain that a number is a word boundary. Right now, you look at the sequence of characters, see that the * is sandwiched inside the "word", and conclude that you would need unconstrained marks. That's much easier to understand IMO.
I think it's easier to understand in terms of boundaries.
Quoted must be bounded by white space or commonly adjoining punctuation characters.
A constrained formatting mark may not be followed by any word character or an underscore.
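These two phrasings do not always agree. The Ruby sketch below compares them on the character that follows a closing mark. The ADJOINING_PUNCTUATION set is an assumption for illustration, since "commonly adjoining punctuation" is not enumerated here, and both helper names are hypothetical.

```ruby
# Rule 1: the closing mark must be followed by white space or by
# "commonly adjoining punctuation". This set is an assumed example.
ADJOINING_PUNCTUATION = ".,;:!?)]}\"'".chars.freeze

def bounded_by_whitelist?(char)
  char.nil? || char.match?(/\s/) || ADJOINING_PUNCTUATION.include?(char)
end

# Rule 2: the closing mark must not be followed by a word character or an
# underscore. With Ruby's \p{Word}, "_" (Connector_Punctuation) already
# matches, so the explicit underscore check changes nothing here.
def bounded_by_word_rule?(char)
  return true if char.nil?
  !(char.match?(/\p{Word}/) || char == "_")
end

# "\u2192" is the rightwards arrow symbol.
[".", " ", "_", "1", "\u2192"].each do |char|
  puts format("%-10s whitelist=%-5s word-rule=%s", char.inspect,
              bounded_by_whitelist?(char), bounded_by_word_rule?(char))
end
```

Both rules reject _ and 1 after the closing mark and accept "." and a space, but a symbol such as the arrow is rejected by the whitelist and accepted by the word-character rule.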