github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License
12.27k stars 4.25k forks source link

TSX files are identified as TypeScript rather than JSX+TS #4359

Closed rainboxx closed 5 years ago

rainboxx commented 5 years ago

Preliminary Steps

I confirm I have...

Problem Description

Files with the extension .tsx (and code blocks with language identifier tsx) on GitHub are shown as TypeScript instead of TypeScript+JSX.

URL of the affected repository:

https://github.com/rainboxx/tsx-jsx-repro/tree/master

Last modified on:

2018-12-21

Expected language:

JSX, although I'm not sure whether there needs a new language that combines TS+JSX

Detected language:

TypeScript

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

rainboxx commented 5 years ago

This issue is still relevant. The issue can still be seen here: https://github.com/rainboxx/tsx-jsx-repro/tree/master

lildude commented 5 years ago

Ooops, I'm not sure how we missed this until now.

Files with the extension .tsx (and code blocks with language identifier tsx) on GitHub are shown as TypeScript instead of TypeScript+JSX.

I don't think we want to be changing this. As I understand it, JSX can be used with a number of Javascript-like languages and ${FAVOURITE_LANGUAGE} + JSX isn't really a language. Adding all these as pseudo languages will cause a lot more false positives between the various languages that use JSX too.

rainboxx commented 5 years ago

Hi @lildude, .tsx is the "official" extension when using JSX with TypeScript, see https://www.typescriptlang.org/docs/handbook/jsx.html. I'm not aware of other usages, a quick look revealed that CoffeScript is using .cjsx or sticks to .coffee, see: https://coffeescript.org/#jsx.

There seems to be a "Team Sports Scheduling System XML Project File" with the suffix tsx which is likely less prominent that tsx as TS+JSX.

I think it's the same way "not a language" as JSX itself is not a language. From my (limited) perspective I'd treat this topic as that if there's support for JSX itself it should also be supported within all the other languages. What do you think?

lildude commented 5 years ago

.tsx is the "official" extension when using JSX with TypeScript, see https://www.typescriptlang.org/docs/handbook/jsx.html.

Hmmm, now I'm confused. What problem are you pointing our or what are you requesting in this issue?

From your OP I took it that you were asking for Typescript files with JSX within them be identified as "Typescript + JSX" instead of just "Typescript", and thus would warrant an equivalent for "Javascript + JSX" (perfectly valid AFAIK) and any other Javascript-like language which can or do use JSX.

As it currently stands, .tsx files are already classified as Typescript as can be seen in your example and is inline with the sentence above.

.jsx files are currently identified as JSX but this language is currently grouped under Javascript, hence the second file in your example is identified as "Javascript".

I think it's the same way "not a language" as JSX itself is not a language. From my (limited) perspective I'd treat this topic as that if there's support for JSX itself it should also be supported within all the other languages. What do you think?

I'm pretty sure this already happens in some form or another as it stands.

rainboxx commented 5 years ago

.tsx is the "official" extension when using JSX with TypeScript, see https://www.typescriptlang.org/docs/handbook/jsx.html.

Hmmm, now I'm confused. What problem are you pointing our or what are you requesting in this issue?

Sorry for being confusing πŸ˜‰. I wanted to point out that the suffix is official because I interpreted your reply as "we shouldn't give tsx too much attention because it's arbitrary".

Your resumΓ© is straight to the point, except that I would clarify that "Typescript files" have the suffix ts while "Typescript files with JSX within" have tsx.

I'm pretty sure this already happens in some form or another as it stands.

Maybe I'm getting you wrong (not a native english speaker), but the current rules are:

The TypeScript syntax is highlighted in tsx files works. JSX inside it doesn't break the highlighting but it's also not highlighted correctly. I find it confusing that the relationship between js and jsx is different than between ts and tsx and expected jsx and tsx to work respectively.

I assume I'm just repeating what you already know πŸ˜‰.

Are you saying you don't want to implement a special sub-language TS+JSX because you'd also have to implement CoffeeScript+JSX? About false-positives: I could think of false-positives in combination with any language that embeds JSX without a special suffix (eg. CoffeeScript with .coffee) but can't imagine a false-positive with TS+JSX when the .tsx suffix is used.

lildude commented 5 years ago

The TypeScript syntax is highlighted in tsx files works. JSX inside it doesn't break the highlighting but it's also not highlighted correctly.

This is going to be a problem with the syntax highlighting grammar and not Linguist, and is very similar to a long running issue we already have for JSX syntax highlighting - https://github.com/github/linguist/issues/3044

I find it confusing that the relationship between js and jsx is different than between ts and tsx and expected jsx and tsx to work respectively.

That is indeed confusing and it's not likely to stand for much longer. We're likely to roll .jsx into Javascript at some point as we do for .tsx right now as...

Facebook (more specifically, Dan Abramov) decrees that the .jsx extension is no longer relevant since the pre-Babel days. His reasoning is pretty solid.

facebook/create-react-app#87 (comment)

... from https://github.com/github/linguist/pull/3677#issuecomment-440303127 and then further acknowledged by @Alhadis at https://github.com/github/linguist/pull/3677#issuecomment-440536127:

I'm not sure why we're even bothering to differentiate between JS and JSX anymore, actually. Since it falls under the usage statistics of the former, we may as well roll it into the JavaScript entry and spare ourselves the hassle over semantics.

Since JSX is a superset of JavaScript, it should be possible for one grammar to accommodate both JS and sugary extensions like embedded XML and type annotations. Moreover, Linguist has no way of knowing if something like this is a JS or JSX file...

The same arguments are likely arise if we split .tsx out from Typescript. We're also rethinking the language groups idea too.

Are you saying you don't want to implement a special sub-language TS+JSX because you'd also have to implement CoffeeScript+JSX?

Yes, but also because it shouldn't be needed if we stop treating JSX as its own language and start treating it as a sub-set of Javascript that it apparently is.

ghost commented 5 years ago

I'm totally in favor of bringing .jsx into JavaScript πŸ‘.

However, could there still be some plans to still support JSX highlighting inside of TypeScript? JSX, according to https://facebook.github.io/jsx/, is an "XML-like syntax extension to ECMAScript" and therefore not exactly a sub-set of JavaScript. Because of this the "syntax extension" can also be used with TypeScript.

This is a very valid use-case when using React in combination with TypeScript and I would really love to see this syntax change happen.

Sorry, commented with the wrong user, comment should've been created by @rainboxx

lildude commented 5 years ago

However, could there still be some plans to still support JSX highlighting inside of TypeScript?

I don't see why not, however this would be up to the grammar maintainer to implement, not Linguist (we only do the language determination), or for another grammar to be sourced that has JSX support.

rainboxx commented 5 years ago

Ok, so the whole ticket here is basically not relevant to linguist? And I should ask the grammar maintainer for TypeScript... Right?

lildude commented 5 years ago

Ok, so the whole ticket here is basically not relevant to linguist? And I should ask the grammar maintainer for TypeScript... Right?

Yup, essentially, but you'll please to know I'm in the process of making a new release for Linguist and it includes an update to the TypeScript grammar which from my quick local testing takes your sample foo.tsx from:

screenshot 2019-02-06 at 13 58 32

... to ...

screenshot 2019-02-06 at 13 59 28

Whilst not as pretty as the sample jsx grammar, it appears to be an improvement. If you want further improvements, then the grammar itself is the place to take your requests.

rainboxx commented 5 years ago

Thanks for letting me know about the change!

I quickly checked the grammar repo and saw that this grammar is also in use within VS Code. That application detects JSX inside of .tsx files pretty well:

image

(I fixed the sample code on GitHub as well but it still shows as before.) VS Code just highlights through TypeScript that react and component Foo is missing.

Any idea how to investigate this further?

lildude commented 5 years ago

@rainboxx you've really piqued my interest in getting to the bottom of this πŸ˜„

I've been experimenting with the TypeScript grammar in Lightshow and I think the issue here is the TypeScript repo actually provides four grammars with the two we're interested in being TypeScript and TypeScriptReact. These grammars do not source/include each other and as such are treated as different languages.

GitHub is using the Typescript grammar to highlight your foo.tsx as Linguist is identifying the language as TypeScript whilst VSCode is using TypeScriptReact as it is identifying the language as "TypeScript React". You can see this in the bottom right of your VS Code window:

foo tsx - typescriptreact

If I force the language to TypeScript I see the same highlighting as coming in the next release of Linguist:

foo tsx - typescript

You can see this in action in Lightshow too:

Linguist, and thus GitHub, doesn't know about the "TypeScript React" language and thus doesn't use this grammar.

I think there are only two solutions to this:

  1. Modify the TypeScript grammar to source/include the TypeScriptReact rules (I doubt this would be accepted upstream), or
  2. Add a new "TypeScript React" or similarly named language to Linguist and associate .tsx with that language so we can use the TypeScriptReact grammar. This could be grouped with TypeScript as we currently do for JSX.

I'm reluctant to do the latter for my previously stated reservations but it might be the only option.

@Alhadis @pchaigno I'd appreciate your thoughts on this.

pchaigno commented 5 years ago
  1. Modify the TypeScript grammar to source/include the TypeScriptReact rules (I doubt this would be accepted upstream)

I might be missing some context here, but why would option 1 not be acceptable?

  1. Add a new "TypeScript React" or similarly named language to Linguist and associate .tsx with that language so we can use the TypeScriptReact grammar. This could be grouped with TypeScript as we currently do for JSX.

As far as I remember (and understand), the issue with JSX is not so much with .jsx files but with .js files which may contain JSX code as well. If, here, .tsx files always contain TypeScript+JSX code, then we should be fine.

lildude commented 5 years ago
  1. Modify the TypeScript grammar to source/include the TypeScriptReact rules (I doubt this would be accepted upstream)

I might be missing some context here, but why would option 1 not be acceptable?

πŸ€·β€β™‚οΈ just putting it out there. The two a currently separate, probably for maintainability, so the upstream team may not be too keen on merging the two.

pchaigno commented 5 years ago

Would there be any drawback from using the TypeScriptReact grammar to highlight all TypeScript code?

rainboxx commented 5 years ago

Any update on this?

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

rainboxx commented 5 years ago

@lildude @pchaigno It seems this has been forgotten. Could we pick this up again?

Alhadis commented 5 years ago

I've not worked with TypeScript before (unless editing an index.d.ts file counts :p), but I hope to remedy that by starting a TSX project once my current project's is finished.

Then, when my slack arse gets back to rewriting the JS grammar, I can consider integrating TypeScript syntax along with everything else, making it a single-grammar which we can use for JSX, TSX, plain TypeScript, plain JavaScript, and so forth.

Will see what I can do. =)

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

rainboxx commented 5 years ago

Since @stale marked this as Stale is there anything I can do?

Alhadis commented 5 years ago

@rainboxx I second @pchaigno's earlier question:

Would there be any drawback from using the TypeScriptReact grammar to highlight all TypeScript code?

If there's no harm in using a JSX-enabled TypeScript grammar to highlighting ordinary .ts files, then doing so will easily fix this.

rainboxx commented 5 years ago

@Alhadis sorry, I skipped this question, which I thought was directed to someone else but I could've answered this as well...

Actually there could be an issue with the angle-bracket type assertion syntax, which is not allowed when using JSX. It looks like being an opening JSX tag and would most likely break syntax highlighting. I'm sure many people are switching to the as-style but there's still old code and people not using JSX that favor the angle-bracket style.

Alhadis commented 5 years ago

Damn. I see what you mean... πŸ˜•

I suppose in the short-term, we can add TSX as a language, but it'll be grouped under TypeScript (and be classified as such in statistics bars). As long as Microsoft don't start phasing out the .tsx extension in favour of just .ts (like Facebook did with the .jsx extension), there shouldn't be any problems with classification.

I describe this as a short-term solution because the long-term desideratum is to have a single grammar which can intelligently highlight both JSX and ordinary JavaScript. Doing so will involve some creative hacks, but I know from experience it's certainly possible. I just need to set aside some time first...

rainboxx commented 5 years ago

@Alhadis sounds like a plan! If there's anything I can help, let me know. Thank you!

Alhadis commented 5 years ago

Send me money.

Nah, I kid. πŸ˜€ Fix submitted in #4511.

Note that it's currently blocked on #4470, as the PR's body will explain.