github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License

Custom Markdown syntax highlighting within locally specified code fences #2598

Closed fyngyrz closed 6 years ago

fyngyrz commented 9 years ago

The challenge / problem:

I have written a new language, yclept aa_macro, very simple, defined as follows, parens mean optional:

(content)[keyword( content)](content) -- square brackets denote a "built-in"
(content){keyword( content)}(content) -- squiggly brackets denote a user-defined "style"

Where content can contain any number of nested [keyword ( content)] and/or nested {keyword( content)} to any depth. Here is a very simple example demonstrating some of the kinds of nesting that can occur:

[style underline <u>[b]</u>]
This is [i [b a] test] of the
[color F00 [i Emergency {underline Broadcast} System]]

This generates HTML where the word 'a' is bold and italic; 'test' is italic; 'Emergency Broadcast System' is italicized red; 'Broadcast' is also underlined:

This is <i><b>a</b> test</i> of the
<font color="#FF0000"><i>Emergency <u>Broadcast</u> System</i></font>

* (I can't show the actual result here, as Github's Markdown implementation does not allow either a span tag or a font tag that will set text color, nor does it provide a Markdown text coloring syntax... the lack of general text coloring capability is problematic for me, and I suspect for others -- it would be awesome if that could be implemented...)

The aa_macro syntax coloring utilizes six colors:

  1. [ ] -- these delineate built-in capabilities
  2. { } -- these delineate user styles
  3. keywords adjacent to the opening [ of [ ]
  4. keywords adjacent to the opening { of { }
  5. non-keyword content such as "This is"
  6. HTML tags (everything from < to >)
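
As a rough illustration of the six classes listed above, here is a minimal Python sketch of a tokenizer that labels runs of aa_macro text. It is not the TextMate grammar mentioned in this post. The category names (`builtin-bracket`, `style-keyword`, and so on) and the `tokenize` function are hypothetical, and it assumes a keyword is simply the run of non-space characters directly after an opening bracket.

```python
import re

# Assumption: a keyword is the run of non-space, non-bracket characters
# immediately after "[" or "{"; the post does not define it more precisely.
KEYWORD_RE = re.compile(r"[^\s\[\]{}<>]+")
TAG_RE = re.compile(r"<[^<>]*>")        # class 6: everything from < to >
TEXT_RE = re.compile(r"[^\[\]{}<]+")    # class 5: plain content


def tokenize(source):
    """Split aa_macro text into (category, text) pairs, one per colored run."""
    tokens = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch in "[{":
            kind = "builtin" if ch == "[" else "style"
            tokens.append((kind + "-bracket", ch))      # classes 1 and 2
            pos += 1
            keyword = KEYWORD_RE.match(source, pos)
            if keyword:
                tokens.append((kind + "-keyword", keyword.group()))  # 3 and 4
                pos = keyword.end()
        elif ch in "]}":
            kind = "builtin" if ch == "]" else "style"
            tokens.append((kind + "-bracket", ch))
            pos += 1
        else:
            tag = TAG_RE.match(source, pos) if ch == "<" else None
            if tag:
                tokens.append(("html-tag", tag.group()))
                pos = tag.end()
            else:
                # Plain content; a stray "<" that opens no tag is kept as content.
                text = TEXT_RE.match(source, pos)
                end = text.end() if text else pos + 1
                tokens.append(("content", source[pos:end]))
                pos = end
    return tokens


for category, text in tokenize("This is [i [b a] test] of the"):
    print(f"{category:16} {text!r}")
```

Joining the text of every token reproduces the input unchanged, so a highlighter built on this would only need to wrap each run in its color.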

I have this working in Textmate. For reference, here's a Textmate take on it vs. my current aa_macro syntax:

[Image: screen shot 2015-08-27 at 11 40 27 am]

The suggestion:

`[style underline <u>[b]</u>]`
`This is [i [b a] test] of the [color F00 [i Emergency {underline Broadcast} System]]`

If the Markdown processor doesn't find aa_macro.syntax, it then looks at the global language support just as it does now.
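
The fallback order described here (repository-local syntax file first, then the site-wide languages) can be sketched in a few lines of Python. This is only an illustration of the lookup order, not anything GitHub implements. The function name `resolve_grammar`, the idea of passing the repository's file names as a set, and the assumption that the file sits at the top level of the repo are all hypothetical.

```python
def resolve_grammar(fence_label, repo_files, global_grammars):
    """Decide where the grammar for a code-fence label would come from.

    fence_label     -- the word after the opening fence, e.g. "aa_macro"
    repo_files      -- set of file names in the repository (assumed flat)
    global_grammars -- set of language names supported site-wide
    """
    local_name = fence_label + ".syntax"   # e.g. aa_macro.syntax
    if local_name in repo_files:
        return ("local", local_name)       # the repo's own grammar wins
    if fence_label in global_grammars:
        return ("global", fence_label)     # behave exactly as today
    return ("none", None)                  # no highlighting, plain text


print(resolve_grammar("aa_macro", {"README.md", "aa_macro.syntax"}, {"python"}))
print(resolve_grammar("python", {"README.md"}, {"python"}))
```

Because the local file is consulted first, deleting it later would make the same fence fall through to the global grammar with no change to the documents.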

I hope this will be seriously considered. Right now, the whole "We try only to add new extensions once they have some usage on GitHub. In most cases we prefer that extensions be in use in hundreds of repositories before supporting them in Linguist." policy seems to be a showstopper for many Markdown syntax needs.

If none of the above can pass muster, then:

Please implement text coloring within Github's Markdown. Span, font, Markdown syntax, anything. I can make my language docs look exactly like they need to (with a whole lot of extra effort, sigh) with nothing but text coloring.

Closing

Thanks for reading. I know it was wordy, but I was trying to address all the issues on my side of the fence (heh heh) so this wouldn't turn into a long back and forth that was difficult to follow. You know, like documentation without text coloring... :)

jasuperior commented 8 years ago

I'm assuming the lack of responses means that this is an issue that no one cares about? I agree with the above completely and wholeheartedly. I too have a programming language which I would like syntax highlighting for on my repo, but the lack of such a feature is preventing that, making my code samples just look like plain text files.

Your implementation seems plausible and easy. I wonder why no one is considering it.

pchaigno commented 8 years ago

This looks closely related to #2627 ;)

nmondal commented 8 years ago

This is totally needed. My own custom language is suffering the same fate... 👍

fyngyrz commented 8 years ago

pchaigno: To be fair, 2627 looks closely related to this. Inasmuch as this is 2598... ;)

But whatever, right? If they'd just get it done. Sigh.

kuzyn commented 8 years ago

I agree that this is a feature that would be incredibly useful.

While they are different projects, how about offloading some of the burden to atom.io? What I have in mind is that instead of passing a local file to the highlighter, you could simply reference your syntax over on the package manager. That would remove concerns w/r/t syntax integrity and such.

Alhadis commented 8 years ago

Atom uses First Mate for syntax highlighting, so I'd say that's what you're referring to.

I've hit this problem too - several times. I feel GitHub should have a whitelist of CSS classes to apply basic inline formatting. This would permit users to use <pre><code> … </code></pre> blocks to establish the same look as fenced code-blocks, but have proper control over the syntax highlighting.

E.g., this:

# Ignore the fact this is Perl, and pretend it's some unheard-of language
our $site = "needs this";

... might be implemented like this:

<pre><code>
<span class="code-comment"># Ignore the fact this is Perl, and pretend it's some unheard-of language</span>
<span class="code-keyword">our</span> <span class="code-variable">$site</span> = <span class="code-string">"needs this"</span>;
</code></pre>

Yes, this is cumbersome to write. But for authors who're serious about having code presented readably and cleanly, it's better than absolutely nothing.

This would also allow true diff-style highlighting within GitHub-flavoured Markdown (green and red backgrounds, etc).

kuzyn commented 8 years ago

Mmm that's interesting. Totally agree that, while an in-between solution, CSS styling would still be incredibly useful

fyngyrz commented 8 years ago

If you have to rewrite your code with CSS interleaving, you're going to be very busy doing makework. At that point, there are simpler ways: primarily, just put the code up in an editor that supports the syntax highlighting, screen capture it (in pages if need be), and post the image(s). This makes sure you're capturing code that works and the syntax highlighting is done the way you specify.

It's a huge PITA, and one github could eliminate easily which is very frustrating, but it gets the job done.

Alhadis commented 8 years ago

I don't feel it's an in-between solution. It wouldn't be hard for somebody to write a program that used a TextMate-compatible grammar and generated a chunk of marked-up code to insert in a GitHub comment.
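
A minimal Python sketch of such a generator follows, for illustration only. It does not read a TextMate grammar: the hand-written regular expressions in `RULES` stand in for one and only cover the two-line sample earlier in the thread. The `code-*` class names are the ones from that sample; no such whitelist exists on GitHub, and `to_markup` is a hypothetical name.

```python
import html
import re

# Hypothetical rules for the two-line sample; a real tool would derive these
# from a TextMate-compatible grammar instead of hard-coding them.
RULES = [
    ("code-comment", r"#[^\n]*"),
    ("code-string", r'"[^"\n]*"'),
    ("code-variable", r"\$\w+"),
    ("code-keyword", r"\bour\b"),
]
# One named group per rule (g0, g1, ...) so a match can be mapped to its class.
PATTERN = re.compile("|".join(f"(?P<g{i}>{rx})" for i, (_, rx) in enumerate(RULES)))


def to_markup(source):
    """Wrap every matched token in a span and escape everything else."""
    out = []
    pos = 0
    for m in PATTERN.finditer(source):
        out.append(html.escape(source[pos:m.start()], quote=False))
        css_class = RULES[int(m.lastgroup[1:])][0]
        token = html.escape(m.group(), quote=False)
        out.append(f'<span class="{css_class}">{token}</span>')
        pos = m.end()
    out.append(html.escape(source[pos:], quote=False))
    return "<pre><code>\n" + "".join(out) + "\n</code></pre>"


sample = (
    "# Ignore the fact this is Perl, and pretend it's some unheard-of language\n"
    'our $site = "needs this";'
)
print(to_markup(sample))
```

Escaping `<`, `>` and `&` in both the tokens and the text between them matters here, since the output is pasted into a page as raw HTML.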

fyngyrz commented 8 years ago

It's reinventing the wheel. Github could -- and should -- solve this in the obvious way. Why? Because should a language like this take off, it would be nothing less than awesome to have the syntax highlighting grammar 100% ready to go.

kuzyn commented 8 years ago

It would be in-between as in it is not a true support for custom syntax

Alhadis commented 8 years ago

I feel it's going about things the wrong way, and still only solving half the problem of having NO support for inline formatting.

Github could -- and should -- solve this in the obvious way.

How is this proposed solution obvious?

fyngyrz commented 8 years ago

Alhadis, I see it this way:

Which is not to say that github shouldn't provide for usable HTML and CSS beyond the crippled subset it allows now -- there are many valid uses that are not code-centric for such facilities.

Alhadis commented 8 years ago

Syntax highlighting is set up once. After that, just write your code. It's highly efficient.

Efficient if you're writing documentation for a language, yes. Here're some of the caveats:

All in all, it strikes me as a very kludgy solution that, as I said, only solves half the problem: and only does so as far as language authors are concerned.

fyngyrz commented 8 years ago

Before I address your bullets, I should point out that the only reason this idea arises in this form is because Github won't accept a grammar until the usage of that grammar is relatively widespread. Presumably they have some reason for this. I don't know what that might be, but the fact is, they won't. This leads directly to the problem that a new or isolated language encounters significant documentation challenges that otherwise would not exist. So the objective here is to offer a means to get around Github's penetration metric, using the existing infrastructure as much as possible such that any effort expended by both the authors and end-users of such languages is forward-compatible with wider use of the language. That is not to say that it is a zero-effort proposal.

As existing (site-wide) grammars can be changed (and are), a mechanism for updating documents that utilize existing grammars already exists. It seems likely it could be used as-is, or very close to as-is. Doesn't seem like an actual problem, or something likely to be an actual problem.

No reason it has to be there for end-users -- that's an implementation detail up to Github. However, if you're the author writing a language, I doubt this would be seen as a disadvantage. Something that could be tweaked and extended just as everything else can by your co-project fellows on the one hand, and readily accessible to you on the other. That's the whole point of such a project. And syntax highlighting is a subject critically related to language development, so in terms of should it be in a repo for a new language or not... I'm comfortable with "yes, absolutely."

Again, that's an implementation detail, not a requirement. It's up to Github how it's handled. Even if that's the case, though, copying a file as opposed to trying to use CSS and HTML freehand to document every use of a language when you're trying to show code or other use... that's continuous makework. Copying a syntax file... very easy. Very enabling. Further, if you have a copy that works for you, then you don't even have to update it unless you want to. And if the language achieves the penetration Github requires to support a global syntax file... delete the local file, and the fence starts working with the global one.

You realize we're talking about language authors writing the syntax for their own languages, right? Specifically people who are highly motivated to create that specific syntax highlighting? A downstream user of such a language would, at most, find it useful to copy a syntax file to their repo.

I fail to see this as even relevant. It's part of the process of creating the grammar. Just as writing the grammar is. How is this an issue at all?

Alhadis commented 8 years ago

Okay, yeah, you're right. I brought up some silly points, because I sense you're an argumentative type who probably wouldn't respond well to what I wanted to write. I'm just going to rip the band-aid off right now:

You shouldn't be using GitHub to document your language in full. Get a site or use GitHub pages if syntax highlighting is so damn important to you.

I can't for the life of me imagine how many other language authors are as strung-up on this issue as you are.

Alhadis commented 8 years ago

Furthermore, even if it's a convenience for some language authors who'd prefer the syntax highlighting, I seriously doubt they're using GitHub as strictly a means to document their language. Seriously.

kuzyn commented 8 years ago

Most use cases have nothing to do with documentation in full. Here is our situation: we have a gigantic codebase in ChatScript, a widespread NLP scripting language that is not prominent enough to be considered by linguist. We're creating a best practice document, which is in markdown and committed to the repo for easy consultation & editing by our organization. It goes without saying that the style guide should be part of the project's repo. Our example code is not styled as it should be; surely you can understand how inconvenient it is? How about just navigating our codebase on Github? Thousands of lines of non-styled code?

Furthermore, resorting to yet another tool, having to maintain a second project to host and highlight code for something like a style guide is redundant and impractical.

I think that the real issue is to open up support for more languages in linguist, not debate as to whether it makes sense to have projects that are not JS or C++ on Github.

Alhadis commented 8 years ago

The solution I suggested (having a list of permitted CSS classes to use for such things) would cover the Markdown issue. It'd be trivial to write a solution that, as I said, took a given TextMate grammar and used it to generate a properly marked-up code block for use on GitHub.

I'm not saying this isn't an issue. I'm saying there're easier ways to solve it.

fyngyrz commented 8 years ago

Alhadis, you didn't suggest an easier way to solve it. The suggestion here is:

Your counter-suggestion is:

Both ways require Github to extend their systems.

Your way is (a lot) harder for the user of github. That means every time such a thing is required, someone has to work harder.

The proposed solution is harder for Github (probably... maybe not) and makes it much easier on the user of Github.

Also, yes, of course, if you can't doc on Github, you doc elsewhere. Which is exactly what I do. Thereby splitting my project between github and elsewhere, which is not in any way optimum for maintenance or end-user convenience.

As for the personal remarks, please stop that. I'm not going to engage on that level.

kuzyn commented 8 years ago

https://github-lightshow.herokuapp.com/

This is interesting since it can load any remote syntax file and requires minimal effort: just link to your existing JSON or PLIST or CSON file. Such functionality should be implemented for private repos.

Alhadis commented 8 years ago

Your way is (a lot) harder for the user of github. That means every time such a thing is required, someone has to work harder.

  1. Copy+paste code in a webapp like Lightshow
  2. Enter URL of remote syntax definition
  3. Copy HTML markup to enter in Markdown.

That's an extra few steps, at most.

Both ways require Github to extend their systems

Only one resolves a broader issue.

Alhadis commented 8 years ago

Finally, if I WAS to suggest a solution like this, it'd be to have a new repository setting that pointed to the URLs of one or more custom grammars to use for highlighting.

fyngyrz commented 8 years ago

Alhadis, there is more than one issue here. If you'll read my original remarks, as well as others here, you'll see I am in no way against broader HTML and CSS support. And yes, the issue of unduly limited HTML and CSS is one of significant import, quite aside from what I'm talking about here. And yes, it could indeed be used to cripple around the problem of no syntax support by imposing extra demands on the end user. And yes, if they implemented it without implementing the syntax support I'm asking for here, I'd probably use it that way in order to keep the projects all in one maintainable gitified lump.

But for my use case, which requires syntax highlighting of a lot of language elements, examples, references and so on that would be interleaved with markdown in very complex ways, it would be most useful to me, which is to say require the least effort from me, to specifically have syntax support.

Nothing wrong with what you're asking for. I agree completely it's a good idea. But it's not a good replacement for the idea here.

Finally, if I WAS to suggest a solution like this, it'd be to have a new repository setting that pointed to the URLs of one or more custom grammars to use for highlighting.

Yes. Wonderful. Great idea. :+1:

fyngyrz commented 8 years ago

Alhadis:

That's an extra few steps, at most. (copy, paste, use app like Lightshow)

Doesn't really work out that way. A document may have many uses of the custom syntax within its normal context, in fact, mine do. The examples don't do nearly as well as they could because there's no highlighting. Consider how the same language's details look on this page where I am able to properly style the language elements.

Consider documentation for a language, where each language item written about in markdown has one or more isolated examples of use. You'd be going to the app and copying / pasting over and over again.

It's not just about a syntax highlight for a file written completely in some form; it's about documenting that form as well. And in that circumstance, this is pretty much ideal:

```myLanguage
stuff in myLanguage
```

shanebdavis commented 7 years ago

I want to add my vote for how important this feature is for language developers like me. The lack of github syntax highlighting is a serious obstacle for new language adoption.

Syntax highlighting is essential for understanding code.

It is particularly important for new languages:

GitHub should make new language support a priority. The development of new languages is important to the coding community. Languages define what we can express. They are the foundation of absolutely everything we do, and there is so much more out there beyond the current crop of languages. There is so much we can't express because we don't have the languages to do so.

shanebdavis commented 7 years ago

I want to thank all participants of this discussion for covering all the pros/cons of the two solutions in detail. It seems that both solutions could be done. They aren't mutually exclusive. I think it comes down to:

My conclusion:

(*) Particularly if they have literally dozens or hundreds of code-fences, as I already do. (**) Users could trivially use the custom syntax with a one-line URL reference to the syntax-highlighting-definition file. Solves the "update" problem.

TurkeyMan commented 7 years ago

So, can we have comment from a Github dev? This issue is really making my project hard for me.
I was actually quite surprised what is proposed here isn't possible. I totally assumed there would be a solution for a local grammar...

shanebdavis commented 7 years ago

It occurs to me that there are client-side syntax highlighters. There are probably even ones which take TextMate syntax-highlighting definitions and are open-source. All github needs to do is adopt one of those and we'd be good to go. People could just add a custom entry to the spec file in their repo which could configure client-side syntax highlighting for READMEs, wikis and files.

How can I help?

fyngyrz commented 7 years ago

The usual problem is that such applications are platform-specific, whereas the web browser is at least somewhat platform-agnostic, or can be made to be. Besides, everything Github needs to make this work is already in place; it's simply a matter of not disallowing use of the existing syntax highlighter, and providing a means to activate it for low-use roles in repos that want it so activated.

The problem here isn't a technical one. It's a political one, that of getting Github to do what we need.

shanebdavis commented 7 years ago

I mean a 100% javascript, runs in browser solution.

On Tue, Mar 21, 2017 at 10:28 AM Ben notifications@github.com wrote:

The usual problem is that such applications are platform-specific, whereas the web browser is at least somewhat platform-agnostic, or can be made to be. Besides, everything Github needs to make this work is already in place; it's simply a matter of not disallowing use of the existing syntax highlighter, and providing a means to activate it for low-use roles in repos that want it so activated.

The problem here isn't a technical one. It's a political one, that of getting Github to do what we need.


fyngyrz commented 7 years ago

Ah, sorry. I'm still a few parsecs away from calling Javascript web pages "applications." I call them "web page scripts", but that's just me.

Well, again, Github already supports syntax highlighting. If you can get them to support another method, rather than just tweak access to the one they're already using, we're good. But I don't think it's likely, as it seems like it'd be more work for them.

And seeing as this need has been exposed to them for years now, and they have steadfastly ignored it... I think we're beating the severely dinged-up bones of a completely decomposed horse here. Sigh.

Alhadis commented 7 years ago

Ah, sorry. I'm still a few parsecs away from calling Javascript web pages "applications." I call them "web page scripts", but that's just me.

You're living in 1995.

But hey, I consider Python a markup language with bloat attached, so who am I to judge? 😁

fyngyrz commented 7 years ago

You're living in 1995.

Nah, I'm just living in a world where RF signal processing in real time is the primary task for which I build applications. Almost every programming language is too slow; web pages are decades away from getting done what I can get done in C and assembler. I write image processing too; and while that can be done while waiting, it's annoying, so again I end up using C and assembler.

Python is fabulous for low-compute power needs. As is Javascript, for that matter. They're both interesting and useful scripting environments. And of course there are tons of things that only have low-compute power needs. Which is fine. But my mileage really does vary here.

Alhadis commented 7 years ago

Web pages could be insanely faster if people, you know, quit using frameworks and actually took a direct approach.

But hey, that's why I'm unemployed. Can hand-code state-driven tokenisers, but I don't know/use React, so I'm useless. =)

TurkeyMan commented 7 years ago

This thread is more important than this OT tangent...

TurkeyMan commented 7 years ago

Can someone from Github please get in and comment here already? >_<

midn1 commented 7 years ago

A comment'd be bloody useful after 2 years.
Somebody needs to start a petition

kravemir commented 7 years ago

+1 for this feature request!

hristo-vrigazov commented 7 years ago

+1, this would be great!

dunstontc commented 6 years ago

+1

dhowe commented 6 years ago

+1

lildude commented 6 years ago

Can someone from Github please get in and comment here already? >_<

👋 Sorry about the lack of response from GitHub... I'm not entirely sure how this issue has evaded me for so long.

This is an interesting idea, but not something that could or would be implemented within Linguist.

The initial request for custom Markdown syntax highlighting within locally specified code fences would need to be implemented in https://github.com/github/markup and @Alhadis has already started the ball rolling there by opening https://github.com/github/markup/issues/905.

Custom syntax highlighting of the files within a repo on a per-repo basis would need to be implemented within the main GitHub application.

This is where the biggest problem comes and likely to be the main reason why this never sees the light of day: in order to do this, GitHub, either server-side or client-side, would need to pull in the per-repo grammar from a file within that repo and use it to render the syntax highlighting. This is incredibly dangerous and open to abuse. It may also require the main GitHub site implementing the grammar compilation functionality we use when we build a new release - we precompile the grammars into JSON so GitHub doesn't have to. This in turn will likely have an impact on the performance of the main website when it comes to rendering custom repo-specific syntax highlighting.
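
To make the "precompile the grammars into JSON" step a little more concrete, here is a minimal Python sketch that converts a TextMate-style property-list grammar into JSON text. It only shows the general idea of a format conversion done ahead of time. It is not Linguist's actual grammar compiler, and the function name `plist_grammar_to_json`, the scope name `source.aa_macro`, and the sample pattern are hypothetical.

```python
import json
import plistlib

# Hypothetical, tiny grammar in XML property-list form (illustration only).
SAMPLE_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>scopeName</key><string>source.aa_macro</string>
  <key>patterns</key>
  <array>
    <dict>
      <key>match</key><string>&lt;[^&gt;]*&gt;</string>
      <key>name</key><string>meta.tag.aa_macro</string>
    </dict>
  </array>
</dict>
</plist>"""


def plist_grammar_to_json(plist_bytes):
    """Parse a property-list grammar and re-serialize it as JSON text.

    Assumes the grammar holds only strings, lists and dicts; plist dates or
    binary data would not be JSON-serializable.
    """
    grammar = plistlib.loads(plist_bytes)
    return json.dumps(grammar, indent=2, sort_keys=True)


print(plist_grammar_to_json(SAMPLE_PLIST))
```

Doing this once at release time means the site only has to load ready-made JSON, which is the performance point being made here. A per-repo grammar would need the same conversion to happen somewhere at render time.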

And that's before we even get onto the legal ramifications of GitHub pulling in and "running" third party code that may potentially be licensed under a restrictive license.

Linguist currently acts as the gatekeeper to minimise the chances of these things happening right now.

That all said, I have opened an internal feedback issue with this suggestion referring back to this discussion (and https://github.com/github/linguist/issues/2627) for our main site developers to consider.

As this is not something Linguist has any control over, I'm closing this issue. Thank-you for the great OP @fyngyrz.

fyngyrz commented 6 years ago

This is where the biggest problem comes and likely to be the main reason why this never sees the light of day: in order to do this, GitHub, either server-side or client-side, would need to pull in the per-repo grammar from a file within that repo and use it to render the syntax highlighting. This is incredibly dangerous and open to abuse.

Good grief.

If this claim is true (and I honestly doubt it) then you have a github code problem, not a markup grammar problem. It's a markup grammar like HTML for crying out loud. Not a programming language like Python or c. And frankly, if your markup grammar is not safe in the first place, you have no business using it at all.

Any competent programmer could make a safe markup grammar parser or pre-filter if the grammar was so terribly designed as to be dangerous. Your claim requires us to believe that github has no such programmers. I don't believe it.

Furthermore 90% of what is needed could be done with simple HTML. Bold, italic, color, fixed and prop fonts, indentation... to which one should add keyword and number matching, dealing with quoted strings and escaped characters and bracing. We're talking about marking up code for display, not writing obfuscated c with active components, FFS. So there's really no excuse at all outside of lazy and/or incompetent.

If you can't do it, hire me, I'll do it, and it'll bloody well work.

So frustrating trying to get a sane answer from github. And look, the issue was closed. FFSx2.

Minenash commented 6 years ago

I know this is closed because

this is not something Linguist has any control over

But as I also don't know where else to post this, where others can easily find and comment, I'll put it here.

It wouldn't only be for language developers, but also for people / organizations that made a file format similar to json, xml, or yml for configurations, settings, etc., which is probably a lot more common than full-on languages. And GitHub just mislabels it. While having syntax highlighting for things like this would be awesome (and is needed for language devs), at the very least, could GitHub allow us to label certain file extensions as a Language (including custom), with or without syntax highlighting?

While I'd love to see syntax highlighting / grammar right now, maybe at least taking one step in through the door, might motivate the GitHub devs to fully finish it. And if not, at least it's something.

CAD97 commented 6 years ago

@Minenash you can already use .gitattributes to manually set the linguist-language of any file in a repository. If you're just trying to highlight a file as YAML, JSON, XML, or another language that linguist supports, then that's how you do it.
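
For reference, an override is a single line in `.gitattributes` of the form `<pattern> linguist-language=<Language>`. The small Python sketch below just renders such lines from a mapping. The helper name `gitattributes_overrides` and the example patterns are hypothetical, and the language on the right must be one Linguist already knows.

```python
def gitattributes_overrides(mapping):
    """Render .gitattributes lines from {glob pattern: Linguist language name}."""
    return "".join(
        f"{pattern} linguist-language={language}\n"
        for pattern, language in mapping.items()
    )


# Hypothetical patterns, for illustration only.
text = gitattributes_overrides({"*.settings": "YAML", "config/*.rules": "JSON"})
print(text, end="")
# prints:
# *.settings linguist-language=YAML
# config/*.rules linguist-language=JSON
```

The resulting lines go into a `.gitattributes` file committed to the repository.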

This issue is for language support for languages not supported by linguist on a by-repo basis.

Minenash commented 6 years ago

I was talking about custom languages. I said files similar to yml, etc.

For my org's case, we have a custom file with the extension .cmd (Command File). This doesn't exist in linguist, because we made the syntax.

d-sfounis commented 6 years ago

Jesus, this whole thread was an experience.

fcurts commented 4 years ago

Our (thousands of) GH Enterprise users increasingly reject in-house file formats just because they aren't syntax-highlighted on GH Enterprise. This causes incredible pain. Please oh please give us a way to add our own syntax highlighters. (We have IDE plugins and even tree-sitter syntax highlighters, but users expect syntax highlighting to work in GH PR reviews.)

lbfalvy commented 2 years ago

How is it that Github's syntax highlighting was completely replaced with an open-source solution since this issue was opened and we're still not even close to a solution?