Add list of valid (or relevant) language identifier keywords for GFM syntax highlighting

jmm commented 9 years ago

I created this list of language keywords for enabling syntax highlighting and asked GitHub support to consider linking to it from the GitHub Flavored Markdown page. People who post issues without even marking up their code are not going to be able to make sense of the YAML file that page currently refers them to. For that matter, I don't know how anyone is supposed to understand it, since there's no explanation of which of its values are relevant.

@gjtorikian advised me to submit a request to integrate that content into this repo.

I wrote a Node program (it's pretty quick and dirty) to automatically generate the document from the languages.yml file. I made the document a wiki so that there would be a fallback in case it needed updating and I was unavailable. With the content in this repo, it might make more sense to just check it in and encourage updating it by running a script.

I looked around for a while and then gave up on finding the logic that defines the valid values. My theory is that it works like this (at least in part):

Any of the following (case-insensitive, with or without a . prepended) can be used as the keyword:

- the top-level key, with spaces replaced by hyphens
- aliases values that don't contain spaces
- extensions values
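
To make the theory above concrete, here is a minimal sketch of how the three rules could be applied to one entry. It is not the script mentioned in this thread and not Linguist's own logic. The `languages` object is a hypothetical, hand-written excerpt shaped like languages.yml, and `keywordsFor` is an illustrative name.

```javascript
// Hypothetical excerpt shaped like entries in languages.yml.
// The values are illustrative, not a copy of the real file.
const languages = {
  "Common Lisp": {
    aliases: ["lisp"],
    extensions: [".lisp", ".cl"],
  },
  JavaScript: {
    aliases: ["js", "node"],
    extensions: [".js", ".jsx", ".es6"],
  },
};

// Collect candidate keywords for one language under the theory above.
// Keywords are lowercased because matching is assumed to be case-insensitive.
function keywordsFor(name, entry) {
  const keywords = new Set();
  // Rule 1: the top-level key, with spaces replaced by hyphens.
  keywords.add(name.replace(/ /g, "-").toLowerCase());
  // Rule 2: aliases that don't contain spaces.
  for (const alias of entry.aliases || []) {
    if (!alias.includes(" ")) keywords.add(alias.toLowerCase());
  }
  // Rule 3: extensions. The leading dot is assumed optional, so it is dropped.
  for (const ext of entry.extensions || []) {
    keywords.add(ext.replace(/^\./, "").toLowerCase());
  }
  return [...keywords].sort();
}

console.log(keywordsFor("JavaScript", languages.JavaScript).join(" "));
// es6 javascript js jsx node
```

Using a `Set` removes duplicates, such as an alias that is also an extension (`lisp` and `.lisp` in the hypothetical entry above).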

That's how I believe it works so that's how I wrote my script, but frankly I think the document it creates is massive overkill. Take JavaScript for example. My script generates the following list of keywords to enable JS highlighting:

_js
bones
es6
frag
gs
jake
javascript
js
jsb
jsfl
jsm
jss
jsx
njs
node
pac
sjs
ssjs
sublime-build
sublime-commands
sublime-completions
sublime-keymap
sublime-macro
sublime-menu
sublime-mousemap
sublime-project
sublime-settings
sublime-theme
sublime-workspace
sublime_metrics
sublime_session
xsjs
xsjslib

For the purposes of enabling syntax highlighting in GFM I think there should just be js (maybe also javascript). It's probably too late to not support the others, but I think only 1 or 2 are relevant to include in a general reference list. The only value I can imagine in including the others is to leave the door open to provide more specific highlighting for them in the future.

jmm commented 9 years ago

Another thing worth keeping in mind is that npm renders GFM fenced code blocks with syntax highlighting in readme files. I got sick of trying to trace the logic for where that gets the language keywords from, but I got far enough to form the opinion that it's probably not working from the same list as on GitHub. I used one of my readmes as a guinea pig and I can observe the following:

Enables syntax highlighting:

js
```js
var x = "X";
```
javascript
```javascript
var x = "X";
```

Does not:

es6
```es6
var x = "X";
```
JS
```JS
var x = "X";
```
JAVAscript
```JAVAscript
var x = "X";
```
Js
```Js
var x = "X";
```
jS
```jS
var x = "X";
```

As you can see, they all enable highlighting here on GitHub.

gjtorikian commented 9 years ago

I don't think I'm exactly giving trade secrets away, but the way we do highlighting is simply:

Linguist::Language[lang_name] || Linguist::Language.find_by_extension(lang_name).first
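
For readers who don't know Ruby, the two-step fallback in that expression can be sketched in JavaScript as below. This is only an analogy, not Linguist's implementation. The two lookup tables and the name `resolveLanguage` are hypothetical. Lowercasing the keyword and accepting it with or without a leading dot are assumptions taken from the theory earlier in this thread.

```javascript
// Hypothetical in-memory indexes. In Linguist the data comes from languages.yml.
const byNameOrAlias = new Map([
  ["javascript", "JavaScript"],
  ["js", "JavaScript"],
  ["ruby", "Ruby"],
  ["rb", "Ruby"],
]);
const byExtension = new Map([
  [".js", ["JavaScript"]],
  [".es6", ["JavaScript"]],
  [".rb", ["Ruby"]],
]);

// Try the name/alias index first, then fall back to the first extension match.
function resolveLanguage(keyword) {
  const key = keyword.toLowerCase(); // assumption: matching is case-insensitive
  const named = byNameOrAlias.get(key);
  if (named) return named;
  // Assumption: the keyword may be given with or without a leading dot.
  const ext = key.startsWith(".") ? key : "." + key;
  const matches = byExtension.get(ext) || [];
  return matches[0] || null;
}

console.log(resolveLanguage("es6")); // JavaScript (found via the extension table)
```

The order matters: a keyword that is both a name or alias of one language and an extension of another resolves to the name or alias match.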

I'm assigning this to myself as part of a documentation effort for this project.

arfon commented 8 years ago

@gjtorikian - what do you think is the best way to proceed here? I closed #2353 as it seemed that no one was working on this, but I'm definitely sympathetic to @jmm's goals here.

@jmm - It's not clear to me that this repo/project is the correct place for this documentation.

gjtorikian commented 8 years ago

I think we should probably list somewhere—maybe the README?—the key quote in #2353:

> Generally speaking, the slugified version of a language name, as well as any of its extensions, defined the language code.

If we could automate the generation of the language codes, that'd be cool. We could also just link to @jmm's list: https://github.com/jmm/gfm-lang-ids/wiki/GitHub-Flavored-Markdown-(GFM)-language-IDs

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

pchaigno commented 6 years ago

Although it doesn't include the complete list of keywords, this will be covered by the new FAQ at #4271.

waldyrious commented 5 years ago

> Although it doesn't include the complete list of keywords

@pchaigno IMO having a complete list in a human-readable format would be important to consider this issue resolved.

pchaigno commented 5 years ago

@waldyrious With the FAQ, you'll have the rules to determine the list of keywords for a given language. When is that not sufficient?

waldyrious commented 5 years ago

Even without the FAQ, the data in the YAML file (including the comments at the top), not to mention the code itself, already allowed people to work out the exhaustive list of valid identifiers, with more or less effort. The FAQ improves the situation, but only to an extent.

Especially in the context of the help page, I'd say it would make more sense to provide a plain list that people could use to look up valid keys, search for a specific identifier, etc., than to offer a machine-readable file plus rules to interpret it. The latter is certainly sufficient, but it seems fair to say that most people would not find it very convenient.

pchaigno commented 5 years ago

@lildude If we write a script to generate the list of identifiers for each language, would it be possible to include it (or link to it) in the help page?

lildude commented 5 years ago

> If we write a script to generate the list of identifiers for each language, would it be possible to include it (or link to it) in the help page?

Link? Probably, yes, though I'd need to pass it through our docs team first. Include? Probably not, as the churn rate would be too high for our docs team to keep on top of it, and I certainly don't want to be manually updating docs each time I make a release 😁.

github-linguist / linguist

Add list of valid (or relevant) language identifier keywords for GFM syntax highlighting #2278