github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License
12.31k stars 4.26k forks source link

What do we need to do to fix Racket highlighting? #1717

Closed greghendershott closed 9 years ago

greghendershott commented 9 years ago

Few days ago, Racket code started getting highlighted wrong. Someone filed https://github.com/tmm1/pygments.rb/issues/141. As my comment there explains, I'm starting to figure out that the problem is that you're suddenly not using Pygments anymore. Correct?

Can you please help me understand what needs to happen, now, to fix this? Does someone (e.g. me) need to write a Racket language lexer for TextMate? (And maybe fix the Scheme one, too?)

Thanks in advance for pointing me in the right direction.

greghendershott commented 9 years ago

I found /grammars.yml and for example the Scheme part:

https://github.com/textmate/scheme.tmbundle:
- source.scheme

Which leads to "TextMate support for Scheme".

Which appears to be a half dozen plist files and one Ruby "shim" file.

Is that the idea?

qerub commented 9 years ago

AFAIK, there's support for TextMate, Sublime Text and Atom lexers in Linguist ATM.

greghendershott commented 9 years ago

Ah, thanks. So another option would be to create a grammar for https://github.com/atom/highlights like one of the examples listed here.

Quick glance, cson is more concise than plist.

qerub commented 9 years ago

It looks like one should add the Git repo for the Atom grammar directly to grammars.yml. (Just search for "atom" in that file.)

pchaigno commented 9 years ago

@greghendershott Couldn't https://github.com/follesoe/sublime-racket or https://github.com/christophevg/racket-tmbundle be used? (I guess we would have to clarify the licenses before with the authors).

greghendershott commented 9 years ago

@pchaigno Thank you very much for the leads on those.

I can't really evaluate without installing TextMate and SublimeText, which I can't do right now.

Quick glance, those don't seem nearly as comprehensive support as what David Corbett had done in Pygments: Line 458 through 1349.

Translating all that to Atom's model seems... likely to be error-prone and not a ton of fun.

Or maybe I'm being too pessimistic. Thank you again for pointing these out.

greghendershott commented 9 years ago

p.s. Probably using one of those would at least be an improvement over the status quo.

vmg commented 9 years ago

I've tested the grammar at https://github.com/christophevg/racket-tmbundle and it seems to me like it does a much better job at highlighting Racket. I'll deploy it tomorrow to production and see how that goes.

You're of course free to fork the Racket bundle and improve it as you see fit. I'm afraid nobody at GitHub works with Racket so we can't judge what proper highlighting looks like. But we'll of course pull your changes thanks to the magic of O P E N S O U R C E.

Thanks and sorry for the fuss!

dscorbett commented 9 years ago

This stress test file highlights what the TextMate bundle needs work on.

infininight commented 9 years ago

Was doing a little work on the racket grammar but I'm hitting some dead ends due to lack of a language spec, does anyone know if such thing exists?

dscorbett commented 9 years ago

http://docs.racket-lang.org/reference/reader.html

soegaard commented 9 years ago

@vmg 1) Is the TextMate lexer format the only Github supported lexer format?

vmg commented 9 years ago

@soegaard: That is correct. TM-compatible grammars are the only supported format moving forward (note that this also includes Sublime Text and Atom grammars).

soegaard commented 9 years ago

Is there a list of styles supported by GitHub?

soegaard commented 9 years ago

I am a bit worried about this comment:

Bear in mind, however, that because of the way the TextMate parser surveys your document, 
all regular expressions used in grammar match rules must apply to a single line at a time. A single
expression cannot embrace multiple lines.

from this very nice explanation of how to write lexers for TextMate: http://www.apeth.com/nonblog/stories/textmatebundle.html

vmg commented 9 years ago

Is there a list of styles supported by GitHub?

What do you mean by styles?

I am a bit worried about this comment:

The "one line rule" is not really a limitation in practice, as parsing context is maintained between lines. It's trivial to write parsers that have things like recursive S-expressions, block comments, multi-line strings, and so on. It just requires some time getting used to.

soegaard commented 9 years ago

This document lists a bunch of styles in use by TextMate bundles: http://www.apeth.com/nonblog/stories/textmatebundle.html

The list is in the paragraph titled "Standard Themes".:

comment
constant
constant.character.escape
constant.language
constant.numeric

declaration.section entity.name.section declaration.tag deco.folding entity.name.function entity.name.tag entity.name.type ...

greghendershott commented 9 years ago

But we'll of course pull your changes thanks to the magic of O P E N S O U R C E.

Pygments contains a huge amount of open source magic. I still don't understand why you're abandoning that. And, expecting people to re-invent the wheel. And, to do so as a fire-drill with no advance notice.

In open source, if you believe some disruptive change is worthwhile, it's often a good idea to explain why -- to inspire the people who will need to redo all the things. So far I'm not feeling that.

DavidJFelix commented 9 years ago

@greghendershott my inkling is NotRuby mixed with NIH

danluu commented 9 years ago

Is there any possibility of using pygments for languages that aren't well supported by the new highlighter and then migrating over language-by-language as each language gets better support?

This issue happens to be discussing Racket, but this causes problems for a lot of languages.

aroben commented 9 years ago

@greghendershott I'm sorry that Racket highlighting regressed with this change. We did our best to make sure that highlighting was as good or better than before in as many cases as we could, but we knew that some problems just wouldn't be found until we shipped the new highlighter. We're now working hard to fix the problems that have been reported.

Pygments contains a huge amount of open source magic. I still don't understand why you're abandoning that.

The primary motivation was speed. The new highlighter is a lot faster than Pygments, which you can see in this graph that shows how long it takes to highlight files on github.com:

(We shipped the new highlighter on 11/5, as the graph makes pretty clear.)

By using TextMate grammars we also get some nice features like highlighting SQL inside Ruby heredocs. But the main motivation was improving performance.

And, expecting people to re-invent the wheel.

There are lots of TextMate/Sublime Text/Atom language grammars out there that we can take advantage of. We have already added support for some languages that Pygments doesn't support. I hope there isn't much wheel-reinventing required.

And, to do so as a fire-drill with no advance notice.

We don't typically announce changes to github.com in advance. But I do wish we had had more documentation in place on the day we shipped. We've since been working to make it more obvious what grammars we're using (#1710) and improve our documentation around the new highlighter (#1722 #1730) so that problems like the ones you noticed are easier to diagnose and fix. We're not done with this yet; I'd like it to be a lot easier for people to try out new or modified grammars.

As for the original question posed in this issue: We're working on getting the new Racket grammar mentioned above deployed. If there are still problems after that the best thing to do is open issues on the grammar's repo and get them fixed there (or in a fork). We're more than happy to pull in improved grammars!

I hope that explains the current situation better.

aroben commented 9 years ago

The new Racket grammar has been deployed. I'm going to close this issue because it seems like the original problem has been fixed. Please feel free to open new issues if you find more problems (or open them on the Racket grammar's repo as described above, if that's appropriate).

soegaard commented 9 years ago

Note that Scheme files still highlights square brackets as an error.

https://github.com/akeep/scheme-to-c/blob/master/c.ss

2014-11-19 19:25 GMT+01:00 Adam Roben notifications@github.com:

The new Racket grammar has been deployed. I'm going to close this issue because it seems like the original problem has been fixed. Please feel free to open new issues if you find more problems (or open them on the Racket grammar's repo as described above, if that's appropriate).

— Reply to this email directly or view it on GitHub https://github.com/github/linguist/issues/1717#issuecomment-63687877.

Jens Axel Søgaard

aroben commented 9 years ago

@soegaard Could you either file an issue on https://github.com/textmate/scheme.tmbundle, or point us to a different Scheme grammar that works better? Thanks!

soegaard commented 9 years ago

Done!

2014-11-20 17:54 GMT+01:00 Adam Roben notifications@github.com:

@soegaard https://github.com/soegaard Could you either file an issue on https://github.com/textmate/scheme.tmbundle, or point us to a different Scheme grammar that works better? Thanks!

— Reply to this email directly or view it on GitHub https://github.com/github/linguist/issues/1717#issuecomment-63840291.

Jens Axel Søgaard