Inline asterisks are not closed properly

pepelsbey commented 9 years ago

The use case “make the first letter of a word bold or italic” works fine both in Atom's Markdown preview and on GitHub, just as it does in Sublime Text. But there’s a problem in the GFM language module that prevents proper rendering right in the editor:

*f*oo *b*ar baz
**f**oo **b**ar baz

Atom editor
Atom Markdown preview (proper behaviour)
Sublime Text editor (proper behaviour)

Only the first letters of the first two words should be bold/italic, not the whole words and especially not the rest of the text after them.

ootz0rz commented 9 years ago

Similar issue with underscores. Example:

Blah blah blah blah blah blah foo bar fooooo oh hi

Everything in between those two "_"s is highlighted as italic in the editor, even though the preview correctly renders it as plain text.

burodepeper commented 9 years ago

Screenshot 2015-10-02 at 10:53:52

Work in progress...

edent commented 8 years ago

I'm seeing the same issue (latest .deb version)

boling

The syntax highlighting really doesn't like it if the *s are inline.

burodepeper commented 8 years ago

@edent: try the language-markdown package and let me know if that solves your problem

edent commented 8 years ago

@burodepeper that fixes the formatting - but for some reason the preview pane won't show up!

burodepeper commented 8 years ago

@edent You mean the Markdown preview? Could you perhaps create an issue with as many relevant details as possible? Then I'll have a look tomorrow.

queuedq commented 7 years ago

This phenomenon is due to 17a9412. The commit message is:

> Whitespace after opening and before closing an tag is invalid ( e.g. _ text_ or _text _ ).
>
> Before opening and after closing tags the only character accepted is anything but a word or a digit ( 2*text* and d*text* are invalid but not $*text*$ ).

The second rule contradicts the GFM spec, which allows `*` emphasis to open and close inside a word. I have no idea why this commit was accepted.
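To make the effect of that second rule concrete, here is a minimal sketch in Python that runs the italic `end` pattern from the current grammar against the example from the first comment. It assumes Python's `re` is close enough to Atom's regex engine for this case. Because `re` only accepts fixed-width lookbehinds, `(?<!^|\s)` is split into two equivalent assertions. `find_close` is a hypothetical helper, not part of the package.

```python
import re

# Italic 'end' pattern from grammars/gfm.cson, adapted for Python's re:
# (?<!^|\s) becomes (?<!^)(?<!\s), which asserts the same thing.
OLD_ITALIC_END = re.compile(r"(?<!^)(?<!\s)\**\*(?=$|[^\w|\d])")


def find_close(text, open_index):
    """Return the index of the first closing '*' after open_index, or None."""
    match = OLD_ITALIC_END.search(text, open_index + 1)
    return match.start() if match else None


# The closing '*' is followed by a word character, so it is never accepted
# and the italic scope runs on to the end of the text.
print(find_close("*f*oo *b*ar baz", 0))  # None

# A closer followed by whitespace is accepted as expected.
print(find_close("*foo* bar baz", 0))  # 4
```

The lookahead `(?=$|[^\w|\d])` is what rejects `*f*oo`: the closer is only valid at the end of the line or before a non-word character.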

Using language-markdown instead seems to resolve the problem. However, since language-gfm is a core Atom package, it would be preferable to fix the problem within this package.

I have tried to fix this issue, but I ran into several difficulties:

- The original spec is quite complicated to represent in regex.
  1. It introduces several new definitions: delimiter run, punctuation character, left-flanking and right-flanking delimiter run.
  2. Some rules, including the 9th rule, are difficult to implement.
- I am not used to Atom's grammar syntax. I don't know whether those rules can be implemented in some way other than plain regex matching.
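For readers unfamiliar with the spec's terminology, the flanking definitions and the basic open/close rules for `*` and `_` can be sketched roughly as follows. This is an illustration in Python, not something the grammar can use directly. The punctuation test is a simplification (ASCII punctuation plus Unicode `P*` categories), and all function names are made up for this example.

```python
import string
import unicodedata


def is_punct(ch):
    # Simplification: ASCII punctuation plus Unicode "P*" categories.
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def flanking(text, start, end):
    """Return (left_flanking, right_flanking) for the run text[start:end]."""
    # The spec treats the beginning and end of the line as whitespace.
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    left = not after.isspace() and (
        not is_punct(after) or before.isspace() or is_punct(before)
    )
    right = not before.isspace() and (
        not is_punct(before) or after.isspace() or is_punct(after)
    )
    return left, right


def can_open(text, start, end):
    left, right = flanking(text, start, end)
    if text[start] == "*":
        return left
    # '_' may not open emphasis inside a word.
    before = text[start - 1] if start > 0 else " "
    return left and (not right or is_punct(before))


def can_close(text, start, end):
    left, right = flanking(text, start, end)
    if text[start] == "*":
        return right
    # '_' may not close emphasis inside a word.
    after = text[end] if end < len(text) else " "
    return right and (not left or is_punct(after))


print(can_open("*f*oo", 0, 1), can_close("*f*oo", 2, 3))  # True True
print(can_open("foo_bar_baz", 3, 4), can_close("foo_bar_baz", 7, 8))  # False False
```

Under these rules the second `*` in `*f*oo` is allowed to close emphasis, while intraword underscores neither open nor close it. That matches what the previews render for both cases reported above.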

Here is my attempt to fix the problem. (Note that the line numbers might differ from the current version, since I downloaded the file before some newer commits were made. I have checked that those commits are not related to this issue.)

diff --git a/grammars/gfm.cson b/grammars/gfm.cson
index 06759a1..67d95e6 100644
--- a/grammars/gfm.cson
+++ b/grammars/gfm.cson
@@ -16,8 +16,8 @@
     'name': 'constant.character.escape.gfm'
   }
   {
-    'begin': '(?<=^|[^\\w\\d\\*])\\*\\*\\*(?!$|\\*|\\s)'
-    'end': '(?<!^|\\s)\\*\\*\\**\\*(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\*)\\*{3}(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*{3}(?!$|\\s|\\*)'
+    'end': '(?<=\\w)(?<!\\*)\\*{3}(?!\\*)|(?<!^|\\s|\\*)\\*{3}(?!\\w|\\*)'
     'name': 'markup.bold.italic.gfm'
     'patterns': [
       {
@@ -32,8 +32,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d_])___(?!$|_|\\s)'
-    'end': '(?<!^|\\s)___*_(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\w|_)___(?!$|\\s|_)'
+    'end': '(?<!^|\\s|_)___(?!\\w|_)'
     'name': 'markup.bold.italic.gfm'
     'patterns': [
       {
@@ -48,8 +48,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d\\*])\\*\\*(?!$|\\*|\\s)'
-    'end': '(?<!^|\\s)\\*\\**\\*(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\*)\\*\\*(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*\\*(?!$|\\s|\\*)'
+    'end': '(?<=\\w)(?<!\\*)\\*\\*(?!\\*)|(?<!^|\\s|\\*)\\*\\*(?!\\w|\\*)'
     'name': 'markup.bold.gfm'
     'patterns': [
       {
@@ -64,8 +64,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d_])__(?!$|_|\\s)'
-    'end': '(?<!^|\\s)__*_(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\w|_)__(?!$|\\s|_)'
+    'end': '(?<!^|\\s|_)__(?!\\w|_)'
     'name': 'markup.bold.gfm'
     'patterns': [
       {
@@ -80,8 +80,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d\\*])\\*(?!$|\\*|\\s)'
-    'end': '(?<!^|\\s)\\**\\*(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\*)\\*(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*(?!$|\\s|\\*)'
+    'end': '(?<=\\w)(?<!\\*)\\*(?!\\*)|(?<!^|\\s|\\*)\\*(?!\\w|\\*)'
     'name': 'markup.italic.gfm'
     'patterns': [
       {
@@ -96,8 +96,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d_\\{\\}])_(?!$|_|\\s)'
-    'end': '(?<!^|\\s)_*_(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\w|_)_(?!$|\\s|_)'
+    'end': '(?<!^|\\s|_)_(?!\\w|_)'
     'name': 'markup.italic.gfm'
     'patterns': [
       {

Although this change fixes the issue, it is far from a good solution. I ignored characters that are neither Unicode whitespace, punctuation characters, nor regex word characters. I compressed the rules into barely readable regular expressions. Not all of the spec's rules are covered, and I even reinterpreted some of them to make them expressible as regex.
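As a quick sanity check, the patched italic `end` pattern can be tried on the original example. This is a sketch in Python, assuming `re` behaves like Atom's regex engine here. `(?<!^|\s|\*)` is split into three fixed-width lookbehinds because `re` requires that, and `find_close` is a hypothetical helper.

```python
import re

# Patched italic 'end' pattern from the diff, adapted for Python's re.
NEW_ITALIC_END = re.compile(
    r"(?<=\w)(?<!\*)\*(?!\*)"            # closer directly after a word character
    r"|(?<!^)(?<!\s)(?<!\*)\*(?!\w|\*)"  # closer after punctuation, not before a word
)


def find_close(text, open_index):
    """Return the index of the first closing '*' after open_index, or None."""
    match = NEW_ITALIC_END.search(text, open_index + 1)
    return match.start() if match else None


text = "*f*oo *b*ar baz"
print(find_close(text, 0))  # 2
print(find_close(text, 6))  # 8
```

Both single-letter spans now close right after the letter, so the rest of the line stays unstyled.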

I hope this issue gets fixed soon. It is a real inconvenience for users.

EDIT: FYI, you can test your markdown syntax in the commonmark.js dingus.

winstliu commented 7 years ago

> I have no idea why this commit was accepted.

Probably because the spec didn't exist in 2014 :wink:.

However, I do agree that this needs to be fixed, and that language-gfm is not in a very good state at the moment. I cannot give you an ETA when I personally will be able to investigate this issue given all the other language issues that are open.

Neonit commented 6 years ago

So this is still not fixed after 2½ years. Of course a fix that ignores Unicode characters is not the best, but it would certainly fix this ~~annoyance~~ bug, which makes the highlighter kind of useless for 95% of all use cases. As much as I appreciate developer teams who strive for complete fixes: if such a basic issue stays open for this long, more damage (reputation, blocked user productivity) has already occurred than a quick and incomplete fix could ever have caused. So please apply the incomplete fix until you have figured out a way to fix this completely.

miller-time commented 5 years ago

I'm encountering this bug, and I was curious what it might take to fix it.

my example:

* *note: this emphasized bullet has one **bold** word in it*

(only *note: this emphasized bullet has one **bold* is colored with italicized formatting, **bold** is not bold)

I'm pretty sure the problem is that regex cannot be used to parse nested structures (asterisk regions within asterisk regions).
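One way to see the nesting problem is to run the current italic `end` pattern on that line in isolation. This is only a sketch in Python: the lookbehind is rewritten for `re`, and the real tokenizer also tries the rule's nested `patterns` before the end rule, so it is not a full simulation of the highlighter.

```python
import re

# Italic 'end' pattern from grammars/gfm.cson, adapted for Python's re.
OLD_ITALIC_END = re.compile(r"(?<!^)(?<!\s)\**\*(?=$|[^\w|\d])")

line = "* *note: this emphasized bullet has one **bold** word in it*"
open_index = line.index("*note")

# Look for the first place the italic scope would be allowed to end.
match = OLD_ITALIC_END.search(line, open_index + 1)
print(repr(match.group()))
print(line[open_index:match.end()])
```

Taken on its own, the pattern accepts the `**` after `bold` as the italic closer, because a regex has no notion of which opener a delimiter belongs to. Pairing delimiters correctly needs some kind of stack, which is what a real parser provides.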

I'm not very familiar with Atom development, but after reading the language grammars docs, it sounds like the existing language-gfm grammar is a TextMate ("legacy") implementation. Has upgrading to tree-sitter been considered?

atom / language-gfm

Inline asterisks are not closed properly #117