Closed Alhadis closed 4 years ago
Analogous to the SCSS-CSS argument, Pug (Jade) and other templating languages that are radically different from HTML should also be considered, as they currently fall in the HTML group.
Agreed. Personally, I think most (if not all) templating languages should be decoupled from their target output. There's a reason they're templating languages, after all... and it isn't "just HTML" if I open a Pug template in a browser and see a weird mix of half-empty tags and loops. 😉
I think if a parent language is unambiguous and well-specified (as HTML and CSS are), a child language should be either a subset or a hybrid of different languages. Conversely, Assembly and Shell are umbrella terms of sorts which cover numerous dialects and implementations, so having them as parent languages makes more sense, IMHO.
This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.
I think this is a pretty good list to start with and I doubt we'll be able to have a comprehensive list (we don't know and use all languages Linguist supports ourselves). To move forward, should we agree on a short guideline to decide if languages should be grouped in the future (so that we can better handle future cases we missed here)?
I think that comment by @Alhadis is a pretty good starting point:
> I think if a parent language is unambiguous and well-specified (as HTML and CSS are), a child language should be either a subset or a hybrid of different languages. Conversely, Assembly and Shell are umbrella terms of sorts which cover numerous dialects and implementations, so having them as parent languages makes more sense, IMHO.
@lildude What's your opinion on this?
I've pulled JSX from the list. For a start, it's not as clear-cut as TypeScript is (Flow typing and JSX tags both fall under the umbrella of "JSX", more or less). Plus the distinction itself is problematic for reasons I've explained here.
Whoops, lost this in my inbox at some point and was just reminded by @Alhadis in https://github.com/github/linguist/issues/4353.
> I think that comment by @Alhadis is a pretty good starting point:
> > I think if a parent language is unambiguous and well-specified (as HTML and CSS are), a child language should be either a subset or a hybrid of different languages. Conversely, Assembly and Shell are umbrella terms of sorts which cover numerous dialects and implementations, so having them as parent languages makes more sense, IMHO.
> @lildude What's your opinion on this?
Seems reasonable to me.
> Languages which should be degrouped
... as does this. Do it.
> ... as does this. Do it.
I'm gonna enjoy this...
Just an FYI: this might take a while because of conflicting colour proximities. 😅
> Just an FYI: this might take a while because of conflicting colour proximities.
We might be able to get rid of that soon 🤞 I had a chat with a colleague and your suggestion at https://github.com/github/linguist/pull/4331#issuecomment-443419513 may become a thing 🔜.
Holy shit. 😮 🎉 🎉 ❤️
Guys, I've pushed a WIP branch for the degrouped languages I'm familiar with, but I'll hold off from submitting a PR until some time has elapsed (or until the potential changes have been reified).
In the meantime, feel free to push any changes you think are missing or necessary. 👍
... of course, when pushing topic branches, it'd help if I actually had commits to go with them.
Remind me not to leave changes staged for several hours, because my crap memory will have me believing they've already been committed. 😁
@lildude I realised another reason why the colour-proximity thing is strangling us — Language authors gravitate toward vibrant colours when deciding their project's logo/branding/colour-scheme. So over time, more and more languages will be added to Linguist with clashing colours: bright red, dark blue, bright blue, purple, warm yellow, etc.
So the remaining "available colours" we can assign them will inevitably be sickly shades of pale green, washed-out red, white-ish pink, etc. The current constellation of colour choices is already proving this: when adding Asymptote, I noticed its official colour was `#FF0000` (bright red). That clashed with PostScript, Mercury, Red (the language, lol), Ruby, and several others which were likely "pushed" away from their official colour shades due to the colour-proximity requirements.
Having said that, there's no way I'm gonna add 12 uncoloured/grey languages that were degrouped from their parent languages, most of which have branding with vibrant, distinctive colour choices. Nor do I want to drop 12 grossly inaccurate colour-choices into the language bar to represent Less, SASS, etc.
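To make the "colour already taken" problem concrete, here is a minimal sketch of a proximity check. It is not Linguist's actual test: the Euclidean RGB distance, the threshold of 30, and the palette entries other than `#FF0000` are all assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hex_to_rgb(colour: str) -> Tuple[int, int, int]:
    """Convert '#RRGGBB' into an (r, g, b) tuple of integers."""
    value = colour.lstrip("#")
    return (int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16))


def distance(a: str, b: str) -> float:
    """Euclidean distance in RGB space (an assumed metric, not Linguist's)."""
    ra, ga, ba = hex_to_rgb(a)
    rb, gb, bb = hex_to_rgb(b)
    return ((ra - rb) ** 2 + (ga - gb) ** 2 + (ba - bb) ** 2) ** 0.5


def clashes(colours: Dict[str, str], threshold: float) -> List[Tuple[str, str]]:
    """Return every pair of languages whose colours are closer than threshold."""
    return [
        (x, y)
        for (x, cx), (y, cy) in combinations(colours.items(), 2)
        if distance(cx, cy) < threshold
    ]


# Hypothetical palette: only "#FF0000" comes from the discussion above.
PALETTE = {
    "NewLang": "#FF0000",
    "ExistingA": "#F50A0A",
    "ExistingB": "#0000FF",
}

print(clashes(PALETTE, threshold=30))
```

With any fixed threshold, each accepted colour removes a whole neighbourhood of nearby shades from the pool, which is why late arrivals end up with the leftovers.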
@Alhadis I hear you, and hopefully we can remove this once https://github.com/github/linguist/issues/4291#issuecomment-447378743 happens. It's on a team's radar, just need to see it come to fruition.
Especially for you @Alhadis 😘
This is the happiest day of my life, holy shit. 😀
What should we do about the colour-proximity check?
Would this mean existing languages will be able to get their official colour after the proximity changes now that there's a separator?
Yes!
Should we keep some semblance of color proximity detection though? If only to prevent all colors from becoming blue... I was thinking we could simply relax our proximity constraint?
> If only to prevent all colors from becoming blue...
That's a non-issue, and only likely to be noticed in repositories which contain multiple languages that incidentally use almost identical colours.
> repositories which contain multiple languages that incidentally use almost identical colours
Isn't this a kind of birthday paradox and therefore the probability of that happening is actually higher than one might expect? :p
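The birthday-paradox intuition can be checked with a quick calculation. This is only a sketch under strong assumptions: colours are treated as falling uniformly and independently into a fixed number of visually distinguishable "buckets", and the bucket counts below are made up for illustration.

```python
def clash_probability(languages: int, buckets: int) -> float:
    """Probability that at least two of `languages` share a colour bucket,
    assuming each language picks one of `buckets` uniformly at random."""
    if languages > buckets:
        return 1.0
    all_distinct = 1.0
    for i in range(languages):
        all_distinct *= (buckets - i) / buckets
    return 1.0 - all_distinct


# Hypothetical numbers: a repository with 5 languages and 50 or 200
# distinguishable colour buckets.
for buckets in (50, 200):
    print(buckets, round(clash_probability(5, buckets), 3))
```

Real colour choices are not uniform (bright reds and blues are over-represented, as noted earlier in the thread), so the true chance of a clash would be higher than this model suggests.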
@Alhadis Do you think we should remove the constraint on colors entirely?
Yes, I do. There's no good reason for policing colour choices anymore, and I can't see any reason why confusingly-similar/adjacent colours in a language bar could pose any sort of a problem.
Also, it enables us to restore colours to data/prose formats, which is relevant now that we have the `linguist-detectable` attribute. Users who override it are seeing grey bars.
Perhaps you're right and we should just get rid of it. It certainly would be nicer for contributors. I'm just a bit wary of making a change that will be hard to rollback, without knowing how language colors will be used in the future on github.com.
/cc @lildude What's your opinion? Any reason we shouldn't get rid of that color proximity test?
> without knowing how language colors will be used in the future on github.com.
That's up to GitHub's design team to decide and deal with, not us. 😉 Designers are used to working within constraints (the limitations imposed on the use of a company's logo are more burdensome than making random colours stand out).
It's also impossible to justify the colour-proximity check to future contributors who don't expect a new language's official colour to be "taken". And the logic for discerning similar colours was far from infallible to begin with...
@lildude Any official word from GitHub concerning the removal of the colour-proximity tests?
> @lildude Any official word from GitHub concerning the removal of the colour-proximity tests?
Ooops. Not yet as I forgot to open the issue 😊. Just opened an issue seeking feedback from our design team. Will let you know more when I do.
This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.
Hey y'all 👋 Thought I'd add my thoughts on this too.
After using Ruby `.erb`s in a number of non-HTML ways (like code generation), I wonder if we should consider reclassifying it? What options are available?
CC: @Alhadis @pchaigno
Could you post a code sample of what you mean? I can't fathom how HTML markup could be making code generation easier...
In any case, unusual use-cases of a language benefit from a `linguist-language` override for the affected files... =)
`.erb`s just seem like a means to template text (of any kind). There doesn't appear to be anything HTML-specific about them.
ERB filenames have a preceding filetype in their name. An HTML ERB would have the filename `name.html.erb`, while a Java ERB would have the filename `name.java.erb`, and a conf ERB would have the filename `name.conf.erb`.
It seems like we could infer the language type of an ERB file from its preceding filetype. How do you feel about this? 👍 👎
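A rough sketch of that inference, to show what is being proposed. The function name and the extension-to-language table are hypothetical, and nothing here reflects how Linguist actually classifies files.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical lookup table; a real one would need an entry per language.
INNER_EXTENSION_TO_LANGUAGE = {
    ".html": "HTML",
    ".java": "Java",
    ".conf": "Nginx",
}


def infer_erb_target(filename: str) -> Optional[str]:
    """Guess the templated language from the extension preceding '.erb'.

    Returns None when the file is not an ERB template, has no inner
    extension, or the inner extension is unknown.
    """
    suffixes = PurePosixPath(filename).suffixes
    if len(suffixes) < 2 or suffixes[-1].lower() != ".erb":
        return None
    return INNER_EXTENSION_TO_LANGUAGE.get(suffixes[-2].lower())


for path in ("name.html.erb", "config/dev/nginx.conf.erb", "Dockerfile.erb"):
    print(path, "->", infer_erb_target(path))
```

Files such as `Dockerfile.erb` have no inner extension, so a filename-only approach would still leave some templates unclassified.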
Take a look at these Java templates in Shopify's Java graphql generator: https://github.com/Shopify/graphql_java_gen/tree/master/codegen/lib/graphql_java_gen/templates
Here GitHub Classroom uses an ERB to template an nginx config: https://github.com/education/classroom/blob/master/config/dev/nginx.conf.erb
Here's a search of all the times a Dockerfile was templated with ERBs: https://github.com/search?utf8=✓&q=extension%3Aerb+filename%3Adockerfile&type=Code&ref=advsearch&l=&l=
Ah, I see. So it's really more of a generic templating system that (naturally) lends itself well to server-side HTML rendering? If it isn't HTML-centric, it might make sense to rename it to `Embedded Ruby` instead (as well as degrouping it).
However, that'd still be of minimal benefit to syntax highlighting and language classification. Because Linguist is limited to classifying languages that've been registered ahead of time, it'd be impossible to classify files as, say, `Java+ERB`, `INI+ERB`. So, the best we can do is rename it to something more appropriate and/or make it a child-language of Ruby.
I'm really not the right person to be discussing anything Ruby-related, though. Since I've no knowledge of what ERB files are really used for, I can't confidently assert my suggestions are suitable (are these code-generation cases only 10% of ERB-using repositories? A third? ~50%?). @lildude would be the right person to ask about this, but since he's currently @busydude, it's probably safer to leave this matter be for now. =)
The idea of renaming it to Embedded Ruby, a child language of Ruby, seems acceptable to me.
I would speculate that `HTML+ERB` is the most dominant variant due to the popularity of Ruby on Rails. Should the `HTML+ERB` variant stay classified as an HTML-like language, since HTML rendering is still a major use case for ERBs?
Yes, I think so. Exceptions can always use a `.gitattributes` override to flag it as another language (which affects syntax highlighting too). Granted, this means they're limited to either Ruby or whatever language is being templated... but it's better than (mis)classing it as partly `HTML`.
I don’t mind proposing the change in a PR :smile: Are there other PRs which did something similar? Pointing me to another PR would help me get a head start!
Renaming a language is a simple procedure (though that wasn't always the case…). You can use #4171 as an example. Basically, it's just:
1. Rename the language's entry in `./lib/linguist/languages.yml`. Then, remove the `group: HTML` field. If you want the entry to contribute to the usage statistics of Ruby, replace the line with `group: Ruby` instead.
2. Rename the language's `samples/` directory: `./samples/Old Name/` → `./samples/New Name/`
3. Update `./lib/linguist/heuristics.yml` (only the `language:` fields require updating)
4. Update `./test/*.rb` (some tests single out specific languages for assertion)
5. Run `bundle exec rake samples` to update samples database.
6. Run `script/list-grammars` to regenerate the grammars list.
7. Run `bundle exec rake test` to run Linguist's test suite. Anything you've missed will display loud hairy feedback: you'll know when you've covered everything. 👍
I just realised we could always keep `HTML+ERB` and add `Embedded Ruby` as a separate language. The `extensions` of `HTML+ERB` could target `.html.erb` and `.html.erb.deface`, whilst the new `Embedded Ruby` language could simply target `.erb` more broadly. This is much more complicated, and would necessitate the addition of heuristics and regression tests to disambiguate... however, this feels to me like it might be the winning solution.
Again, I'd wait for @lildude's input before rushing off to submit a PR. Should my solution be found preferable, well, your PR will have been in vain. 😉
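One way to picture the split between `HTML+ERB` and the proposed `Embedded Ruby` entry is a longest-suffix match over the extensions named above. This is only a sketch: the `classify` helper is hypothetical, and longest-suffix matching is an assumed disambiguation rule, not a description of how Linguist resolves overlapping extensions.

```python
from typing import Dict, List, Optional

# Extensions as proposed in the comment above; "Embedded Ruby" is the
# suggested name for the new, more general language entry.
EXTENSIONS: Dict[str, List[str]] = {
    "HTML+ERB": [".html.erb", ".html.erb.deface"],
    "Embedded Ruby": [".erb"],
}


def classify(filename: str) -> Optional[str]:
    """Pick the language whose registered extension is the longest
    suffix of the filename (assumed tie-breaking rule)."""
    name = filename.lower()
    best_length = 0
    best_language = None
    for language, extensions in EXTENSIONS.items():
        for extension in extensions:
            if name.endswith(extension) and len(extension) > best_length:
                best_length = len(extension)
                best_language = language
    return best_language


for path in ("index.html.erb", "nav.html.erb.deface", "nginx.conf.erb", "app.rb"):
    print(path, "->", classify(path))
```

Because every `.html.erb` file also ends in `.erb`, some rule like this (or a content-based heuristic) is needed; that overlap is the extra complexity the comment refers to.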
> I just realised we could always keep HTML+ERB and add Embedded Ruby as a separate language. The extensions of HTML+ERB could target .html.erb and .html.erb.deface, whilst the new Embedded Ruby language could simply target .erb more broadly. This is much more complicated, and would necessitate the addition of heuristics and regression tests to disambiguate... however, this feels to me like it might be the winning solution.
I agree with this 👍 I'll wait on @lildude's input before tackling this.
Svelte should be removed from the HTML group. It's similar to Vue which is already on its own.
The language Sage is really built on top of Python and their syntaxes are almost the same, but they are not 100% equal. For example, `y^3` computes the cube of `y` in Sage instead of `y` XOR 3, as in Python.
Moreover, `R.<t> = QQ['x']` is a valid line of code in Sage, while in Python it raises a SyntaxError.
Besides the syntax, the other features are very different. For instance, Python treats mathematical expressions numerically, while Sage treats them symbolically, thus, 1/3 is 0.3333 in Python, but it is a fraction in Sage, and sqrt(2) is 1.4142 in Python, but it is, well, sqrt(2) in Sage.
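The Python half of these comparisons can be verified directly. The snippet below is a small illustration in plain Python (no Sage required); the `is_valid_python` helper is a hypothetical name, and `Fraction` is shown only as the explicit opt-in Python needs for exact rationals.

```python
import math
from fractions import Fraction

y = 5
xor_result = y ^ 3   # bitwise XOR in Python: 0b101 ^ 0b011 == 0b110
cube = y ** 3        # Python spells exponentiation with **

third = 1 / 3                  # a float approximation
exact_third = Fraction(1, 3)   # exact arithmetic must be requested explicitly
root = math.sqrt(2)            # a float approximation, not a symbolic value


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python (it is compiled, not run)."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


print(xor_result, cube)
print(third, exact_third, root)
print(is_valid_python("R.<t> = QQ['x']"))
```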
Even if one has never declared x anywhere, the following is a valid Sage script which prints -1:
sage: f = cos(x)
sage: f(x = pi)
-1
All that said, I would like to invite you to consider degrouping Python and Sage.
@hilder-vitor You should submit a pull-request to degroup them; this thread is chiefly for discussing languages whose "independence" is ambiguous and open to debate. There's clearly no ambiguity or room for subjectivity in what you've described.
@ObserverOfTime I missed your comment when you posted it. I've added Svelte to the list.
This issue is a continuation of what @pchaigno started with github/linguist#3093:
There are several languages on GitHub which presently fall under the usage statistics of another, "parent" language, which certainly deserve reconsideration — or at the very least, some public discussion for highlighting the reasons why they fall under another language's umbrella.
To start, here are the languages which I believe are valid candidates for degrouping. I'll extend this list over time as discussion from other users confirms other candidates:
Languages which should be degrouped
JSON: Fixed in #4345. Moreover, there's little point in retaining a connection with JavaScript. JSON is classed as a data language, so it won't appear in usage statistics anyway.
I've refrained from bringing up any languages I've never worked with or lack familiarity with (such as PostCSS and Stylus), each of which might be candidates as well. Comments are welcome.
Good examples of language groups
Here are some languages which are justifiable in having a parent language:
`HTML+…` languages (HTML+Django, HTML+ECR, HTML+ERB, etc)
/cc @pchaigno, @lildude, @controversial, @nazar-pc, @EmmaRamirez, @plibither8
Footnotes
Pic is interesting because it can be compiled to other languages that aren't Roff (like SVG or TeX), but the language itself is based upon Roff syntax and even permits low-level Roff constructs to be used inline. In other words, it's not so cleanly separated, and demonstrates why transpilation targets are fallacious reasoning w.r.t. whether Sass should be distinguished from CSS or not.