It was brought to the attention of GitHub in issue#144037 that a customer was seeing that syntax highlighting did not work for a specific language.
Here we see that the __ is overriding the syntax of the rest of the file by making everything after this bold. At first we thought this was a problem in the objective c language however, after more testing in both Atom and GitHub we see that is actually a problem with the markdown syntax highlighting.
The screenshot shows us that if we have an unclosed delimiter for bold or italic it will greedily find the closing delimiter and if it does not, it will scope the tokens that it see as bold or italic thus overriding the syntax highlighting.
Description of the Change
Prior to this change, the grammar file was a CSON file that had all the patterns to match in one list; making it very difficult to parse through and identify where the problem was.
During my investigation, I stumbled on two things
VSCode grammar, which also uses TextMate syntax does not have this issue
Another atom package for markdown, language-markdown, also did not have this issue.
They both used the grouped collection of patterns that I am introducing in the change. So my first change was to do just a cosmetic refactor. I separated the patterns into groups influenced by the groups that were used in the two packages/extensions that I mentioned above.
After doing that and making sure the tests still pass (unchanged) I then made changes to fix the behavior that was reported. In order to fix that behavior I introduced/replaced the patterns for bold and italic. These were mostly patterned off the language-markdown package since it was already supported in Atom 🎉
Testing
I was able to test both in Atom and GitHub
Atom
With the new grammar, I overrode the definition of the package and reloaded the editor.
GitHub
GitHub uses linguist to highlight the files that you see online. Linguist uses the language-gfm package for markdown parsing and syntax highlighting. There is an an app (Lighshow) that one can use to paste in the grammar for testing and it will highlight the block of text that you pass in.
We see the comparison of using the old grammar and new grammar
Markdown and GitHub Flavored Markdown
I wanted to make sure that we still correctly highlight basic markdown. So I made a quick test using Lightshow and used the examples found in Mastering Markdown
Body of the text I used
# This is an
tag
## This is an
tag
###### This is an
tag
*This text will be italic*
_This will also be italic_
**This text will be bold**
__This will also be bold__
***This will be bold italic***
___This is also bold italic___
**™ ™ ³**
__™ ™ ³__
is*italic*
not_italic_
**bold
and bold**
_You **can** combine them_
* Item 1
* Item 2
* Item 2a
* Item 2b
1. Item 1
1. Item 2
1. Item 3
1. Item 3a
1. Item 3b
![GitHub Logo](/images/logo.png)
Format: ![Alt Text](url)
http://github.com - automatic!
[GitHub](http://github.com)
As Kanye West said:
> We're living the future so
> the present is our past.
I think you should use an
`` element here instead.
```javascript
function fancyAlert(arg) {
if(arg) {
$.facebox({div:'#foo'})
}
}
```
function fancyAlert(arg) {
if(arg) {
$.facebox({div:'#foo'})
}
}
def foo():
if not bar:
return True
- [x] @mentions, #refs, [links](), **formatting**, and tags supported
- [x] list syntax required (any unordered or ordered list supported)
- [x] this is a complete item
- [ ] this is an incomplete item
First Header | Second Header
------------ | -------------
Content from cell 1 | Content from cell 2
Content in the first column | Content in the second column
16c999e8c71134401a78d4d46435517b2271d6ac
mojombo@16c999e8c71134401a78d4d46435517b2271d6ac
mojombo/github-flavored-markdown@16c999e8c71134401a78d4d46435517b2271d6ac
#1
mojombo#1
mojombo/github-flavored-markdown#1
Mention @nickborromeo
http://www.github.com/
~~this~~that
This a screen recording of the comparison, on the left is the new grammar and on the right is GitHub in production right now.
Alternate Designs
An alternative is to keep the CSON file and just stick in the new rules/patterns to fix the behavior. This would be the quicker option however, will make the file even less readable in my opinion.
Benefits
The grammar is now easier to follow. In editors that support folding, it is quiet easy now to navigate through the blocks since they are grouped and have an indicator of what the patterns in the group are for.
Possible Drawbacks
The tests that are not very exhaustive. I am not 100 percent sure that we have feature parity, at least for the ones that count. However, I think this out weighs the current behavior we have and we can always iterate on as issue come up (if any).
I would like to thank @maxbrunsfeld who paired with me while investigating this and help me with questions I had to better understand the processes and services used. 🙇
Context
It was brought to the attention of GitHub in issue#144037 that a customer was seeing that syntax highlighting did not work for a specific language.
Here we see that the
__
is overriding the syntax of the rest of the file by making everything after this bold. At first we thought this was a problem in theobjective c
language however, after more testing in both Atom and GitHub we see that is actually a problem with the markdown syntax highlighting.The screenshot shows us that if we have an unclosed delimiter for bold or italic it will greedily find the closing delimiter and if it does not, it will scope the tokens that it see as bold or italic thus overriding the syntax highlighting.
Description of the Change
Prior to this change, the grammar file was a
CSON
file that had all the patterns to match in one list; making it very difficult to parse through and identify where the problem was.During my investigation, I stumbled on two things
They both used the grouped collection of patterns that I am introducing in the change. So my first change was to do just a cosmetic refactor. I separated the patterns into groups influenced by the groups that were used in the two packages/extensions that I mentioned above.
After doing that and making sure the tests still pass (unchanged) I then made changes to fix the behavior that was reported. In order to fix that behavior I introduced/replaced the patterns for bold and italic. These were mostly patterned off the
language-markdown
package since it was already supported in Atom 🎉Testing
I was able to test both in Atom and GitHub
Atom
With the new grammar, I overrode the definition of the package and reloaded the editor.
GitHub
GitHub uses linguist to highlight the files that you see online. Linguist uses the
language-gfm
package for markdown parsing and syntax highlighting. There is an an app (Lighshow) that one can use to paste in the grammar for testing and it will highlight the block of text that you pass in.We see the comparison of using the old grammar and new grammar
Markdown and GitHub Flavored Markdown
I wanted to make sure that we still correctly highlight basic markdown. So I made a quick test using Lightshow and used the examples found in Mastering Markdown
Body of the text I used
# This is an
tag ## This is an
tag ###### This is an
tag *This text will be italic* _This will also be italic_ **This text will be bold** __This will also be bold__ ***This will be bold italic*** ___This is also bold italic___ **™ ™ ³** __™ ™ ³__ is*italic* not_italic_ **bold and bold** _You **can** combine them_ * Item 1 * Item 2 * Item 2a * Item 2b 1. Item 1 1. Item 2 1. Item 3 1. Item 3a 1. Item 3b ![GitHub Logo](/images/logo.png) Format: ![Alt Text](url) http://github.com - automatic! [GitHub](http://github.com) As Kanye West said: > We're living the future so > the present is our past. I think you should use an `` element here instead.
```javascript
function fancyAlert(arg) {
if(arg) {
$.facebox({div:'#foo'})
}
}
```
function fancyAlert(arg) {
if(arg) {
$.facebox({div:'#foo'})
}
}
def foo():
if not bar:
return True
- [x] @mentions, #refs, [links](), **formatting**, and
tagssupported - [x] list syntax required (any unordered or ordered list supported) - [x] this is a complete item - [ ] this is an incomplete item First Header | Second Header ------------ | ------------- Content from cell 1 | Content from cell 2 Content in the first column | Content in the second column 16c999e8c71134401a78d4d46435517b2271d6ac mojombo@16c999e8c71134401a78d4d46435517b2271d6ac mojombo/github-flavored-markdown@16c999e8c71134401a78d4d46435517b2271d6ac #1 mojombo#1 mojombo/github-flavored-markdown#1 Mention @nickborromeo http://www.github.com/ ~~this~~thatThis a screen recording of the comparison, on the left is the new grammar and on the right is GitHub in production right now.
Alternate Designs
An alternative is to keep the
CSON
file and just stick in the new rules/patterns to fix the behavior. This would be the quicker option however, will make the file even less readable in my opinion.Benefits
The grammar is now easier to follow. In editors that support folding, it is quiet easy now to navigate through the blocks since they are grouped and have an indicator of what the patterns in the group are for.
Possible Drawbacks
The tests that are not very exhaustive. I am not 100 percent sure that we have feature parity, at least for the ones that count. However, I think this out weighs the current behavior we have and we can always iterate on as issue come up (if any).
Applicable Issues
Acknowledgments
I would like to thank @maxbrunsfeld who paired with me while investigating this and help me with questions I had to better understand the processes and services used. 🙇