github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License

JSX Syntax highlighting broken? #3044

Closed cozos closed 5 years ago

cozos commented 8 years ago

Hey,

I was redirected here by support@github.com; it seems that JSX highlighting is broken on GitHub.

For example: https://gist.github.com/cozos/d6636a93cba29c77e3221f2ca30212b0

import React from 'react';

var ExampleApplication = React.createClass({
  render: function() {
    var elapsed = Math.round(this.props.elapsed  / 100);
    var seconds = elapsed / 10 + (elapsed % 10 ? '' : '.0' );
    var message =
      'React has been successfully running for ' + seconds + ' seconds.';

    return <p>{message}</p>;
  }
});

var start = new Date().getTime();

setInterval(function() {
  ReactDOM.render(
    <ExampleApplication elapsed={new Date().getTime() - start} />,
    document.getElementById('container')
  );
}, 50);

If it helps, I'm on the latest build of Google Chrome.

pchaigno commented 8 years ago

Linguist only selects the grammar to use for highlighting. For JSX, GitHub uses gandm/language-babel. You should open an issue there if there isn't already one. Sorry for the two redirects :S

cozos commented 8 years ago

👌

arfon commented 8 years ago

Closing as this is an upstream issue.

gandm commented 8 years ago

@arfon @pchaigno Author of language-babel here. I've just checked my grammar using the version on linguist - 2.25.1 - inside Atom and it appears to work so I don't know what is going on.

e.g. [image]

but on github

/***
  XMLHttpRequest example
 */

import 'js/web.jsx';
import 'console.jsx';

class _Main {
    static function main(args : string[]) : void {
        var xhr = new XMLHttpRequest();
        var path = dom.document.location.pathname;
        xhr.open("GET", path.replace(/\/[^\/]*$/, "") + "/hello.txt");
        xhr.onreadystatechange = function (e) {
            if (xhr.readyState == xhr.DONE) {
                _Main.update(xhr.responseText);
            }
        };
        xhr.onerror = function (e) {
            console.error("XHR error");
        };
        xhr.send();
    }

    static function update(text : string) : void {
        var output = dom.id("output");
        var textNode = dom.document.createTextNode(text);
        output.appendChild(textNode);
    }
}

// vim: set expandtab:

Alhadis commented 8 years ago

@gandm Not 100% sure if this is the issue, but your end pattern for #literal-module-import has an unbalanced closing bracket at the end. That's the pattern that's being matched at the first line.

Also, I couldn't help noticing you've escaped semicolons in your expressions (\\;). Any particular reason why?

gandm commented 8 years ago

@Alhadis The end pattern here looks fine to me. I don't see any unbalanced parens.

No reason for escaping the semicolons.

Alhadis commented 8 years ago

Look a bit closer:

Figure 1

gandm commented 8 years ago

I just ran it through RegexBuddy and it's OK.

gandm commented 8 years ago

\s*(?:(?:(\bfrom\b)?+\s++(('|\")([^\"']*)(\k<-3>)))|(?=\;|^\s*\b(if|switch|try|var|let|const|static|function|return|class|do|for|while|debugger|export|import|yield|type|declare|interface)\b|\)|}))

Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks

Created with RegexBuddy

pchaigno commented 8 years ago

@Alhadis Maybe you should try submitting a pull request (with a test on Lightshow) to @gandm's repository...?

gandm commented 8 years ago

What exactly are we trying to prove? Of course I would expect that end statement to match import "blah"

Alhadis commented 8 years ago

I said I wasn't sure if it was the cause of the problem. But I believe I've found an issue @gandm can investigate: there seems to be an issue somewhere in the end pattern:

Since the first import statement failed to find a matching end, it's essentially swallowed up the rest of the document.
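
The "swallowing" effect described above can be illustrated with a toy model of a TextMate-style begin/end rule. This is only a sketch under simplifying assumptions, not the engine GitHub or Atom actually uses: the function name `scopedLines` and the simplified `begin`/`end` patterns are hypothetical, and real grammars match nested patterns as well.

```javascript
// Toy model of a begin/end rule: once `begin` matches, every following line
// stays inside the rule's scope until `end` matches.
// Pass non-global regexes so no lastIndex state leaks between calls.
function scopedLines(lines, begin, end) {
  const inScope = [];
  let open = false;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (!open) {
      const b = begin.exec(line);
      if (!b) continue;
      open = true;
      inScope.push(i);
      // Look for the end on the remainder of the same line.
      if (end.test(line.slice(b.index + b[0].length))) open = false;
    } else {
      inScope.push(i);
      if (end.test(line)) open = false;
    }
  }
  return inScope;
}

const sample = ["import 'js/web.jsx';", "import 'console.jsx';", 'class _Main {'];

// An end pattern that works closes each import on its own line.
console.log(scopedLines(sample, /\bimport\b/, /(['"])[^'"]*\1/));

// An end pattern that can never match leaves the first import open,
// so every later line is treated as part of it.
console.log(scopedLines(sample, /\bimport\b/, /(?!)/));
```

In this model, an end pattern the engine cannot evaluate behaves like the second call: the rule opened by the first `import` never closes.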

gandm commented 8 years ago

It works in atom, so I suggest you remove it from linguist and find another package.


gandm commented 8 years ago

I've just run code that works in oniguruma but does nothing in lightshow. I use this type of back referencing a lot in the grammar. Maybe this is the issue?

Test Grammar

{
  "name": "Test capture group",
  "scopeName": "source.js.jsx",
  "patterns": [
    { 
       "name": "invalid.illegal.string.js",
       "match": "(('|\")([^\"']*)(\\k<-3>))" 
    }
  ]
}

against code

"hello world"

should highlight in reverse red.

Seems like backreferencing isn't working in whatever Regex engine is used by Lightshow but does in Oniguruma engine as used by Atom.
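
For readers unfamiliar with the notation, here is a rough sketch of what the test pattern is meant to match, written for JavaScript's `RegExp`, which has no relative backreferences. It assumes that `\k<-3>` resolves to the second capture group (the opening quote) under Oniguruma's counting rules, so the absolute form `\2` is swapped in; the helper name `matchQuoted` is hypothetical. As noted later in the thread, replacing the relative form with an ordinary backreference reportedly still did not match on Lightshow, so this shows the intent of the pattern and not a fix.

```javascript
// Oniguruma pattern from the test grammar: (('|")([^"']*)(\k<-3>))
// Assumption: \k<-3> points at group 2, the opening quote, so \2 is used here.
const quoted = /(('|")([^"']*)(\2))/;

// Returns the quote character and string body, or null if nothing matches.
function matchQuoted(text) {
  const m = quoted.exec(text);
  return m ? { quote: m[2], body: m[3] } : null;
}

console.log(matchQuoted('"hello world"')); // closing quote matches the opening one
console.log(matchQuoted('"hello\''));      // mismatched quotes: no match
```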

Alhadis commented 8 years ago

That was my theory, but it still wasn't matching when I replaced it with an ordinary backreference.

gandm commented 8 years ago

Yeah I think back referencing is broken in whatever engine is used in lightshow and I guess by extension in linguist. What engine is it?

Alhadis commented 8 years ago

I don't know, nobody outside GitHub does. The damn thing is closed source.

gandm commented 8 years ago

Well, I could fix this single issue by changing the end regex for imports, but the grammar uses this in other places where it isn't possible to change it. Therefore it may be prudent to look at other grammars that parse JSX but don't use back-referencing, and use one of these inside linguist.
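
One way to gauge how many places are affected is to scan the grammar's regex sources for the relative backreference notation. The helper below is a hypothetical sketch, not part of language-babel or Linguist; it only looks for the `\k<-N>` form discussed in this thread and would also flag an escaped backslash followed by `k<-N>`, so hits should be checked by hand.

```javascript
// Hypothetical helper: list relative backreferences (\k<-N>) in a regex source.
function findRelativeBackrefs(source) {
  const hits = [];
  const re = /\\k<-(\d+)>/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    // `distance` is the N in \k<-N>; `index` is where the backreference starts.
    hits.push({ index: m.index, distance: Number(m[1]) });
  }
  return hits;
}

// The pattern from the test grammar, written as a raw string.
const testPattern = String.raw`(('|")([^"']*)(\k<-3>))`;
console.log(findRelativeBackrefs(testPattern));
```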

arfon commented 8 years ago

@Alhadis is this the same issue as described in https://github.com/github/linguist/pull/2703 ?

Alhadis commented 8 years ago

Nope, it's a separate issue. Basically, Lightshow seems to be powered with a different regex engine that only supports a subset of Oniguruma's extensions.

I say "subset" because most TextMate grammars already use Oniguruma extensions in the form of (?x:...) or \\G (well, which were taken from Perl, at least).

arfon commented 8 years ago

> Nope, it's a separate issue. Basically, Lightshow seems to be powered with a different regex engine that only supports a subset of Oniguruma's extensions.

OK thanks.

gandm commented 8 years ago

I just noticed that @cozos, who raised the issue, appears to be using jsx to mean the statically typed OO language JSX and not the JSX extension to JavaScript as used by facebook's React. Maybe linguist has better support for this language using another suffix that he could use?

It makes no difference to the underlying issue, as React JSX would also get this problem due to the incompatibility between the linguist regex engine and Atom's Oniguruma.

cozos commented 8 years ago

@gandm I apologize, I actually meant the JSX extension to JavaScript used by Facebook's React. I was just looking for some examples of JSX on GitHub, and seeing that the syntax highlighting issue was also happening on the statically typed JSX, I carelessly assumed it was the React JSX.

I will edit the example to this:

https://gist.github.com/cozos/d6636a93cba29c77e3221f2ca30212b0

import React from 'react';

var ExampleApplication = React.createClass({
  render: function() {
    var elapsed = Math.round(this.props.elapsed  / 100);
    var seconds = elapsed / 10 + (elapsed % 10 ? '' : '.0' );
    var message =
      'React has been successfully running for ' + seconds + ' seconds.';

    return <p>{message}</p>;
  }
});

var start = new Date().getTime();

setInterval(function() {
  ReactDOM.render(
    <ExampleApplication elapsed={new Date().getTime() - start} />,
    document.getElementById('container')
  );
}, 50);

spalger commented 8 years ago

Happy to see this issue getting some discussion, but I'm not sure where it landed... Was this reopened because it's a bug linguist is going to fix?

cozos commented 8 years ago

@spalger Seems to be going into the backlog :P basically Linguist's regex engine can't handle language-babel's back referencing, which works in Oniguruma.

I would also like to mention to @arfon and @Alhadis that JSX syntax highlighting was working on GitHub probably around a month and a half ago, so perhaps there were some changes on linguist's end?

gandm commented 8 years ago

@spalger @cozos it would have worked about a month ago but then I refactored the import/export part of the grammar to use a certain back reference notation that oniguruma supports but linguist doesn't appear to. Actually, I've always used this form of back reference in other parts of the grammar that handled flow syntax but I guess no one has picked up on it not working as flow isn't widely used.

Maybe I could change the grammar to something that linguist can use, but I'm reluctant on two counts. First, it makes the code less readable and more unique for each regex; second, I believe linguist should be TextMate-compatible and it isn't.

cozos commented 8 years ago

Why doesn't linguist just use https://github.com/gandm/language-babel/releases/tag/v2.24.4 then? @gandm, 2.24.x is the pre back reference notation version right? Seems to be a good compromise.

lparry commented 8 years ago

Is there a workaround for this? Some sort of comment/pragma mark we can put in jsx files to get them to highlight correctly and ignore the faulty autodetected language for the files?

cozos commented 8 years ago

I made a PR to version lock language-babel to an earlier release: https://github.com/github/linguist/pull/3091

lparry commented 8 years ago

Still broken on github PRs :/

Is there another issue I should be tracking for that?

cozos commented 8 years ago

https://github.com/github/linguist/issues/3178

cozos commented 8 years ago

I guess we should close this one and continue discussion there? Up to the maintainers I guess.

tbillington commented 7 years ago

Still seeing broken highlighting in diffs for some sections of JSX. I'm a bit confused by the trail of issues and this seemed like a reasonable place to make a comment. Seems like the issue has gone a bit stale?

Apologies if this is the wrong place.

lewisblackwood commented 7 years ago

Is this resolved? It seems that highlighting only shows correctly for files with the .jsx extension - see example gist here.

I'm using Create-React-App, which supports but does not recommend the .jsx extension.

Edit: most recent discussion for the above is at #3677.

lildude commented 7 years ago

@lewisblackwood yup, once #3677 is merged, JSX files with a .js extension should start to be highlighted.

lewisblackwood commented 7 years ago

@lildude wonderful, thanks team!

lildude commented 7 years ago

If the syntax highlighting is then still broken, blame @Alhadis :trollface:

Seriously, we probably need to switch to using an externally maintained JSX syntax package, as I don't think it's right or feasible in the long run to rely on a package that isn't maintained by someone with intimate knowledge of the language. I realise this is easier said than done.

Alhadis commented 7 years ago

What are you talking about...? Who doesn't have an intimate knowledge of what?

lildude commented 7 years ago

Ooops, my mistake. :blush: For some reason I thought you were responsible for the fork at https://github.com/github-linguist/language-babel and were actively attempting to maintain it. I now see that's not the case as we're just using it to peg a version. Sorry about that. 🙇

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

jasongornall commented 6 years ago

bump this is still broken

pchaigno commented 6 years ago

@jasongornall As far as I can see, the bug reported in the original post is fixed. Do you have another case of incorrect highlighting? Maybe we should open a new issue to keep things clear?

rodgracas commented 5 years ago

@jasongornall Recently, I've also had this problem with JSX in a .js file committed to a GitHub repository. The code was highlighted in red and I thought it was a syntax error (a missing semicolon or some lint error), but no. I was using single quotes (') and I guess this broke the highlighting. I've removed them and now it's fine, with no red highlight.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

AndrewZamora commented 5 years ago

Unfortunately, I believe this is still an issue. A single apostrophe was the cause for me. This Stack Overflow post helped me figure it out. As you can see in the commit below, the JSX in my .js file is being highlighted in red even though there is no error. https://github.com/AndrewZamora/Whos-On-What/commit/8f45eda43706d22b65891c972c9c1b841b51cb01 [image]

When I changed the file extension to .jsx, the red highlighting went away, as you can see here and in the image below. [image]

I changed the file extension back to .js and removed an apostrophe from the <h2> tag, and the red highlighting went away, as you can see here and below. [image]

I hope this was helpful.

lildude commented 5 years ago

> Unfortunately, I believe this is still an issue. A single apostrophe was the cause for me. This stackoverflow post helped me figure it out. As you can see in the commit below the jsx in my js file is being highlighted in red even though there is no error.

This is going to be a problem with the grammar itself and not Linguist. You're using the .js extension so the https://github.com/atom/language-javascript grammar is being used. When you use .jsx, the https://github.com/github-linguist/babel-sublime grammar is used.
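
The mapping described above can be sketched as a simple lookup. This is a deliberately simplified illustration: the table and the function name `grammarFor` are hypothetical, and Linguist's real language detection involves more than the file extension.

```javascript
// Hypothetical lookup mirroring the extension-to-grammar mapping described above.
const grammarByExtension = {
  '.js': 'https://github.com/atom/language-javascript',
  '.jsx': 'https://github.com/github-linguist/babel-sublime',
};

// Returns the grammar repository for a filename, or null if the extension is unknown.
function grammarFor(filename) {
  const dot = filename.lastIndexOf('.');
  const ext = dot === -1 ? '' : filename.slice(dot);
  return grammarByExtension[ext] || null;
}

console.log(grammarFor('App.js'));
console.log(grammarFor('App.jsx'));
```

Under this model, the same JSX source is tokenized by two unrelated grammars depending only on its extension, which is why renaming the file changes the highlighting.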

I know @Alhadis has been working on a grammar to combine support for both of these into the same grammar though I'm not sure of the status.

Alhadis commented 5 years ago

Ditch Atom's grammar, and switch to Babel's until I can get something battle-ready. There'll be no visible discrepancies between JS and JSX files (caused by different scope choices in either grammar), and all valid JavaScript can be guaranteed to be valid JSX.

lildude commented 5 years ago

> Ditch Atom's grammar, and switch to Babel's until I can get something battle-ready. There'll be no visible discrepancies between JS and JSX files (caused by different scope choices in either grammar), and all valid JavaScript can be guaranteed to be valid JSX.

Hmm, this seems easier said than done. A metric 🚣‍♂️ load of other grammars expect to find source.js and they can't as soon as I switch the grammars resulting in the grammar checker reporting a ton of errors like:

Seems the grammar compiler isn't pulling in all of the files in the grammar so isn't finding the source.js scope defined in one of the files as we can see from the diff of grammars.yml:

diff --git a/grammars.yml b/grammars.yml
index 0ff1dcd9..73b18475 100755
--- a/grammars.yml
+++ b/grammars.yml
@@ -448,11 +448,6 @@ vendor/grammars/language-html:
 - text.html.basic
 vendor/grammars/language-inform7:
 - source.inform7
-vendor/grammars/language-javascript:
-- source.js
-- source.js.regexp
-- source.js.regexp.replacement
-- source.jsdoc
 vendor/grammars/language-jison:
 - source.jison
 - source.jisonlex

The removal of source.js.regexp also means the babel-sublime grammar itself can't be added without forcing:

$ script/add-grammar --replace language-javascript https://github.com/github-linguist/babel-sublime
Checking docker is installed and running
$ docker ps
Deregistering: vendor/grammars/language-javascript
$ git submodule deinit vendor/grammars/language-javascript
$ git rm -rf vendor/grammars/language-javascript
$ script/grammar-compiler update -f
Registering new submodule: vendor/grammars/babel-sublime
$ git submodule add -f https://github.com/github-linguist/babel-sublime vendor/grammars/babel-sublime
$ script/grammar-compiler add vendor/grammars/babel-sublime
  > latest: Pulling from linguist/grammar-compiler
  > Digest: sha256:159ff655c832de0e3ab3ea1366992606b5f604bcb64f8757b6c575d4a7628bc6
  > Status: Image is up to date for linguist/grammar-compiler:latest
  > The new grammar repository `vendor/grammars/babel-sublime` (from https://github.com/github-linguist/babel-sublime) contains 1 errors:
  >     - Missing include in grammar: `source.js.jsx` (in `JavaScript (Babel).tmLanguage`) attempts to include `source.regexp.js` but the scope cannot be found
  >
  > failed to compile the given grammar
$
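
The kind of check that produces the "Missing include" error above can be sketched as follows. This is a toy model with hypothetical, simplified grammar objects and a hypothetical function name (`missingIncludes`); it is not the grammar compiler's actual implementation, only an illustration of why removing one grammar can break others that include its scopes.

```javascript
// Toy include check: every scope a grammar includes must be provided by
// some grammar in the set.
function missingIncludes(grammars) {
  const known = new Set(grammars.map((g) => g.scopeName));
  const errors = [];
  for (const g of grammars) {
    for (const inc of g.includes) {
      if (!known.has(inc)) errors.push(`${g.scopeName} -> ${inc}`);
    }
  }
  return errors;
}

// Simplified stand-in for the situation in the log: the JSX grammar includes
// a regexp scope that no remaining grammar provides.
const afterRemoval = [
  { scopeName: 'source.js.jsx', includes: ['source.regexp.js'] },
];
console.log(missingIncludes(afterRemoval));
```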

Alhadis commented 5 years ago

Ah heck, that's bad. 😓 Didn't expect there'd be a circular dependency involved... that complicates everything.

@50Wliu, any idea if the Atom team would be averse to removing the rule which highlights unterminated strings as errors? Since Atom's grammars currently ship with both TextMate and Tree-sitter grammars (the latter of which Linguist ignores), I imagine the impact to users is minimal…

winstliu commented 5 years ago

@Alhadis that's probably fine. We don't actively maintain the TextMate variants anymore, but I think this is something we'll merge a PR for.