Open pepelsbey opened 9 years ago
Similar issue with underlines. Ex:
Blah blah blah blah blah blah foo bar fooooo oh hi
Everything in between those two "_"'s is highlighted as being italics, but it's correctly rendered such that they're not.
Work in progress...
I'm seeing the same issue (latest .deb version)
The syntax highlighting really doesn't like it if the *s are inline.
@edent: try the language-markdown package and let me know if that solves your problem.
@burodepeper that fixes the formatting - but for some reason the preview pane won't show up!
@edent You mean the Markdown preview? Could you perhaps create an issue with as many relevant details as possible? Then I'll have a look tomorrow.
This phenomenon is due to 17a9412. The commit message is:
Whitespace after opening and before closing an tag is invalid ( e.g. _ text_ or _text _ ).
Before opening and after closing tags the only character accepted is anything but a word or a digit ( 2*text* and d*text* are invalid but not $*text*$ ).
The second sentence is wrong in the context of the GFM spec. I have no idea why this commit was accepted.
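To make the effect of that rule concrete, here is a small sketch that ports the old italic 'begin' pattern (the removed line in the diff further down this thread) to Python's `re` module and runs it on the three samples from the commit message. The port is an assumption on my side: Atom uses Oniguruma, and Python only accepts fixed-width lookbehind, so the `^` alternative is moved out of the lookbehind. The helper name `opens_italic` is made up for illustration.

```python
import re

# Old italic 'begin' pattern from grammars/gfm.cson:
#   (?<=^|[^\w\d\*])\*(?!$|\*|\s)
# Ported for Python: "start of string, or preceded by a char that is
# neither a word char nor '*'" becomes an alternation around a
# fixed-width lookbehind. \d is already covered by \w.
OLD_ITALIC_BEGIN = re.compile(r"(?:^|(?<=[^\w*]))\*(?!$|\*|\s)")


def opens_italic(text):
    """Return True if the old 'begin' pattern matches anywhere in text."""
    return OLD_ITALIC_BEGIN.search(text) is not None


for sample in ("2*text*", "d*text*", "$*text*$"):
    print(sample, opens_italic(sample))
```

With this port, only `$*text*$` opens an italic region; `2*text*` and `d*text*` are rejected because the `*` directly follows a word character, which is exactly the restriction the commit message describes.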
Using language-markdown instead seems to resolve the problem. However, since the language-gfm package is a core package of Atom, it would be preferable to fix the problem within this package.
I have tried to fix this issue, however I faced several difficulties:
delimiter run, punctuation character, left(right)-flanking delimiter run.
And this is my attempt to fix the problem: (Note that the line numbers might differ from the current version, since I downloaded the file before some newer commits were made. I have checked that those commits are not related to this issue.)
diff --git a/grammars/gfm.cson b/grammars/gfm.cson
index 06759a1..67d95e6 100644
--- a/grammars/gfm.cson
+++ b/grammars/gfm.cson
@@ -16,8 +16,8 @@
'name': 'constant.character.escape.gfm'
}
{
- 'begin': '(?<=^|[^\\w\\d\\*])\\*\\*\\*(?!$|\\*|\\s)'
- 'end': '(?<!^|\\s)\\*\\*\\**\\*(?=$|[^\\w|\\d])'
+ 'begin': '(?<!\\*)\\*{3}(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*{3}(?!$|\\s|\\*)'
+ 'end': '(?<=\\w)(?<!\\*)\\*{3}(?!\\*)|(?<!^|\\s|\\*)\\*{3}(?!\\w|\\*)'
'name': 'markup.bold.italic.gfm'
'patterns': [
{
@@ -32,8 +32,8 @@
]
}
{
- 'begin': '(?<=^|[^\\w\\d_])___(?!$|_|\\s)'
- 'end': '(?<!^|\\s)___*_(?=$|[^\\w|\\d])'
+ 'begin': '(?<!\\w|_)___(?!$|\\s|_)'
+ 'end': '(?<!^|\\s|_)___(?!\\w|_)'
'name': 'markup.bold.italic.gfm'
'patterns': [
{
@@ -48,8 +48,8 @@
]
}
{
- 'begin': '(?<=^|[^\\w\\d\\*])\\*\\*(?!$|\\*|\\s)'
- 'end': '(?<!^|\\s)\\*\\**\\*(?=$|[^\\w|\\d])'
+ 'begin': '(?<!\\*)\\*\\*(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*\\*(?!$|\\s|\\*)'
+ 'end': '(?<=\\w)(?<!\\*)\\*\\*(?!\\*)|(?<!^|\\s|\\*)\\*\\*(?!\\w|\\*)'
'name': 'markup.bold.gfm'
'patterns': [
{
@@ -64,8 +64,8 @@
]
}
{
- 'begin': '(?<=^|[^\\w\\d_])__(?!$|_|\\s)'
- 'end': '(?<!^|\\s)__*_(?=$|[^\\w|\\d])'
+ 'begin': '(?<!\\w|_)__(?!$|\\s|_)'
+ 'end': '(?<!^|\\s|_)__(?!\\w|_)'
'name': 'markup.bold.gfm'
'patterns': [
{
@@ -80,8 +80,8 @@
]
}
{
- 'begin': '(?<=^|[^\\w\\d\\*])\\*(?!$|\\*|\\s)'
- 'end': '(?<!^|\\s)\\**\\*(?=$|[^\\w|\\d])'
+ 'begin': '(?<!\\*)\\*(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*(?!$|\\s|\\*)'
+ 'end': '(?<=\\w)(?<!\\*)\\*(?!\\*)|(?<!^|\\s|\\*)\\*(?!\\w|\\*)'
'name': 'markup.italic.gfm'
'patterns': [
{
@@ -96,8 +96,8 @@
]
}
{
- 'begin': '(?<=^|[^\\w\\d_\\{\\}])_(?!$|_|\\s)'
- 'end': '(?<!^|\\s)_*_(?=$|[^\\w|\\d])'
+ 'begin': '(?<!\\w|_)_(?!$|\\s|_)'
+ 'end': '(?<!^|\\s|_)_(?!\\w|_)'
'name': 'markup.italic.gfm'
'patterns': [
{
Though this change can fix the issue, it is not a good solution. I ignored characters that are neither Unicode whitespace, punctuation characters, nor regex word characters. I compacted the rules into poorly readable regular expressions. Not all of the spec's rules are covered, and I even reinterpreted some of them to fit them into regular expressions.
I hope this issue gets fixed soon. It is a real inconvenience for users.
EDIT: FYI, you can test your markdown syntax in the commonmark.js dingus.
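For readers unfamiliar with the spec terms mentioned above (delimiter run, punctuation character, left/right-flanking delimiter run), here is a rough Python sketch of how the CommonMark/GFM spec decides whether a run of `*` or `_` can open or close emphasis. It is a simplification, not the grammar's implementation: the function names are made up, the punctuation check (ASCII punctuation plus Unicode "P" categories) is only an approximation of the spec's definition, and start/end of line is modelled as an empty string.

```python
import string
import unicodedata


def is_whitespace(ch):
    # Start/end of line (modelled as "") counts as whitespace in the spec.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch):
    # Approximation: ASCII punctuation or any Unicode "P*" category.
    return ch != "" and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )


def flanking(text, start, end):
    """Return (left_flanking, right_flanking) for the run text[start:end]."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left = not is_whitespace(after) and (
        not is_punctuation(after)
        or is_whitespace(before)
        or is_punctuation(before)
    )
    right = not is_whitespace(before) and (
        not is_punctuation(before)
        or is_whitespace(after)
        or is_punctuation(after)
    )
    return left, right


def can_open(text, start, end):
    left, right = flanking(text, start, end)
    if text[start] == "*":
        return left
    # '_' is stricter: no intraword opening.
    before = text[start - 1] if start > 0 else ""
    return left and (not right or is_punctuation(before))


def can_close(text, start, end):
    left, right = flanking(text, start, end)
    if text[start] == "*":
        return right
    # '_' is stricter: no intraword closing.
    after = text[end] if end < len(text) else ""
    return right and (not left or is_punctuation(after))
```

Under these rules `can_open("2*text*", 1, 2)` is true, so `2*text*` is emphasis per the spec, while `can_open("foo_bar_", 3, 4)` is false, so an intraword underscore does not start italics. Note how the `*` and `_` rules differ, which is why the diff above uses different patterns for the two.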
> I have no idea why this commit was accepted.
Probably because the spec didn't exist in 2014 :wink:.
However, I do agree that this needs to be fixed, and that language-gfm is not in a very good state at the moment. I cannot give you an ETA when I personally will be able to investigate this issue given all the other language issues that are open.
So this is still not fixed after 2½ years. Of course a fix that ignores Unicode characters is not ideal, but it would certainly fix this annoying bug, which makes the highlighter kind of useless for 95% of all use cases. As much as I appreciate developer teams who strive for complete fixes, if such a basic issue stays open for such a long time, more damage (reputation, blocked user productivity) has already occurred than a quick and incomplete fix could ever have caused. So please apply the incomplete fix until you have figured out a way to fix this completely.
I'm encountering this bug, and I was curious what it might take to fix it.
My example:
* *note: this emphasized bullet has one **bold** word in it*
(only *note: this emphasized bullet has one **bold* is colored with italic formatting; **bold** is not bold)
I'm pretty sure the problem is that regex cannot be used to parse nested structures (asterisk regions within asterisk regions).
I'm not very familiar with Atom development, but after reading the language grammars docs, it sounds like the existing language-gfm grammar is a TextMate ("legacy") implementation. Has upgrading to tree-sitter been considered?
The use case “make the first letter of a word bold or italic” works fine both in the Markdown preview in Atom and on GitHub, just like in Sublime Text. But there’s a problem in the GFM language module that prevents proper rendering right in the editor:
Only the first letters of the first two words should be bold/italic, not the whole word and especially not the rest of the text after.
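As a rough illustration of why this happens, the sketch below ports the old bold 'begin'/'end' patterns and the ones proposed in the diff earlier in this thread to Python's `re` and applies them to a made-up sample, `**F**oo bar`. Several things here are assumptions: the patterns are rewritten for Python's fixed-width lookbehind, the helper `bold_span` is hypothetical, nested patterns are ignored, and an unclosed rule is assumed to run to the end of the text.

```python
import re

# Old patterns (removed lines in the diff), ported to Python's re.
OLD_BEGIN = re.compile(r"(?:^|(?<=[^\w*]))\*\*(?!$|\*|\s)")
OLD_END = re.compile(r"(?<!^)(?<!\s)\*\**\*(?=$|[^\w|\d])")

# Proposed patterns (added lines in the diff), ported the same way.
NEW_BEGIN = re.compile(r"(?<!\*)\*\*(?!\*)(?=\w)|(?<![\w*])\*\*(?!$|\s|\*)")
NEW_END = re.compile(
    r"(?<=\w)(?<!\*)\*\*(?!\*)|(?<!^)(?<![\s*])\*\*(?![\w*])"
)


def bold_span(begin, end, text):
    """Return the text a begin/end rule would cover, or None if it never opens.

    Assumption: if 'end' never matches, the rule runs to the end of the text.
    """
    opened = begin.search(text)
    if opened is None:
        return None
    closed = end.search(text, opened.end())
    stop = closed.end() if closed else len(text)
    return text[opened.start():stop]


sample = "**F**oo bar"
print(repr(bold_span(OLD_BEGIN, OLD_END, sample)))  # '**F**oo bar'
print(repr(bold_span(NEW_BEGIN, NEW_END, sample)))  # '**F**'
```

The old 'end' pattern insists that the closing `**` is followed by a non-word character, so it never closes before `oo` and the bold region swallows the rest of the text. The proposed pattern also accepts a closing `**` that directly follows a word character.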