atom / language-gfm

GitHub Flavored Markdown in Atom
MIT License
101 stars 107 forks source link

Refactor grammar in order to improve readability and address a bug in syntax highlighting of Atom and GitHub #240

Closed nickborromeo closed 4 years ago

nickborromeo commented 4 years ago

Context

It was brought to the attention of GitHub in issue#144037 that a customer was seeing that syntax highlighting did not work for a specific language.

image

Here we see that the __ is overriding the syntax of the rest of the file by making everything after this bold. At first we thought this was a problem in the objective c language however, after more testing in both Atom and GitHub we see that is actually a problem with the markdown syntax highlighting.

image

The screenshot shows us that if we have an unclosed delimiter for bold or italic it will greedily find the closing delimiter and if it does not, it will scope the tokens that it see as bold or italic thus overriding the syntax highlighting.

Description of the Change

Prior to this change, the grammar file was a CSON file that had all the patterns to match in one list; making it very difficult to parse through and identify where the problem was.

During my investigation, I stumbled on two things

They both used the grouped collection of patterns that I am introducing in the change. So my first change was to do just a cosmetic refactor. I separated the patterns into groups influenced by the groups that were used in the two packages/extensions that I mentioned above.

After doing that and making sure the tests still pass (unchanged) I then made changes to fix the behavior that was reported. In order to fix that behavior I introduced/replaced the patterns for bold and italic. These were mostly patterned off the language-markdown package since it was already supported in Atom 🎉

Testing

I was able to test both in Atom and GitHub

Atom

With the new grammar, I overrode the definition of the package and reloaded the editor.

2020-06-02_17-28-52

GitHub

GitHub uses linguist to highlight the files that you see online. Linguist uses the language-gfm package for markdown parsing and syntax highlighting. There is an an app (Lighshow) that one can use to paste in the grammar for testing and it will highlight the block of text that you pass in.

We see the comparison of using the old grammar and new grammar

2020-06-02_17-34-17 2020-06-02_17-33-47

Markdown and GitHub Flavored Markdown

I wanted to make sure that we still correctly highlight basic markdown. So I made a quick test using Lightshow and used the examples found in Mastering Markdown

Body of the text I used
# This is an

tag ## This is an

tag ###### This is an

tag *This text will be italic* _This will also be italic_ **This text will be bold** __This will also be bold__ ***This will be bold italic*** ___This is also bold italic___ **™ ™ ³** __™ ™ ³__ is*italic* not_italic_ **bold and bold** _You **can** combine them_ * Item 1 * Item 2 * Item 2a * Item 2b 1. Item 1 1. Item 2 1. Item 3 1. Item 3a 1. Item 3b ![GitHub Logo](/images/logo.png) Format: ![Alt Text](url) http://github.com - automatic! [GitHub](http://github.com) As Kanye West said: > We're living the future so > the present is our past. I think you should use an `` element here instead. ```javascript function fancyAlert(arg) { if(arg) { $.facebox({div:'#foo'}) } } ``` function fancyAlert(arg) { if(arg) { $.facebox({div:'#foo'}) } } def foo(): if not bar: return True - [x] @mentions, #refs, [links](), **formatting**, and tags supported - [x] list syntax required (any unordered or ordered list supported) - [x] this is a complete item - [ ] this is an incomplete item First Header | Second Header ------------ | ------------- Content from cell 1 | Content from cell 2 Content in the first column | Content in the second column 16c999e8c71134401a78d4d46435517b2271d6ac mojombo@16c999e8c71134401a78d4d46435517b2271d6ac mojombo/github-flavored-markdown@16c999e8c71134401a78d4d46435517b2271d6ac #1 mojombo#1 mojombo/github-flavored-markdown#1 Mention @nickborromeo http://www.github.com/ ~~this~~that

This a screen recording of the comparison, on the left is the new grammar and on the right is GitHub in production right now.

2020-06-02_17-47-36 (1)

Alternate Designs

An alternative is to keep the CSON file and just stick in the new rules/patterns to fix the behavior. This would be the quicker option however, will make the file even less readable in my opinion.

Benefits

The grammar is now easier to follow. In editors that support folding, it is quiet easy now to navigate through the blocks since they are grouped and have an indicator of what the patterns in the group are for.

Possible Drawbacks

The tests that are not very exhaustive. I am not 100 percent sure that we have feature parity, at least for the ones that count. However, I think this out weighs the current behavior we have and we can always iterate on as issue come up (if any).

Applicable Issues

Acknowledgments

I would like to thank @maxbrunsfeld who paired with me while investigating this and help me with questions I had to better understand the processes and services used. 🙇

maxbrunsfeld commented 4 years ago

It looks like you've tested this very thoroughly, and this fixes a really bad bug. I think at this point we should 🚢 .