Closed DanTup closed 2 months ago
I'm pretty sure this is an issue with the grammar attempting to interpret these blocks and not ignoring them as comments. Other grammars don't attempt to intepret content in comments.
Here's Ruby which is using the treesitter grammar directly in the syntax highlighting engine:
# this is comment
# ```
# this is code
# ```
# this is comment
def foo
# this is comment
# ```
# this is code
#
# this is comment
def foo
And Shell which uses an old Textmate grammar shipped with Linguist:
# this is comment
# ```
# this is code
# ```
# this is comment
def foo
# this is comment
# ```
# this is code
#
# this is comment
def foo
I believe the Dart grammar should be doing the same but @Alhadis is more knowledgable with dealing with the intricacies of GitHub's old syntax highlighter than me.
I'm pretty sure this is an issue with the grammar attempting to interpret these blocks and not ignoring them as comments. Other grammars don't attempt to intepret content in comments.
I'm not sure I understand - although I used comments as an example, I don't think that semantics of what a comment means is relevant here. I could just as easily have used some other marker that could be some concept made up by my language, and not a "comment".
I think the question here is just whether begin/end
captures should be allowed to break out of while
captures. If so, then I think there's a bug in VS Code's implementation, and if they are not, I think there's a bug here. I can't find anything concrete to satisfy me one way or the other, but I'd really like GitHub and VS Code to agree before I try to change the Dart grammar because I might end up going in the wrong direction.
Ah. Sorry, I have no idea. Maybe @Alhadis knows.
For what it's worth, the old syntax highlighter that GitHub uses for Textmate-based grammars (called prettylights - it's not public) is no longer being maintained so if GitHub is wrong, it's been this way for a very very very long time and won't be changing.
For what it's worth, the old syntax highlighter that GitHub uses for Textmate-based grammars (called prettylights - it's not public) is no longer being maintained so if GitHub is wrong, it's been this way for a very very very long time and won't be changing.
Ah, good to know - I wasn't familiar with exactly what's used where. If it has to remain broken on GH because of a bug in prettylights, so be it - but I don't want to try to fix it here and break VS Code in the process without being sure it's VS Code at fault (and therefore it might be fixed there). Thanks!
from what I last heard The TextMate engine has not been archived and is still taking bug reports https://github.com/github-linguist/linguist/issues/6668#issuecomment-1967530241
I'm not sure if this is what GitHub uses https://www.npmjs.com/package/pretty-lights but it seems pretty-lights is not the TextMate engine, merely the CSS styler which would make it unrelated to this bug report
from what I last heard The TextMate engine has not been archived and is still taking bug reports #6668 (comment)
Correct, it hasn't been archived but AFAIK it is not actively being worked on in favour of keeping the lights on whilst switching to tree-sitter grammars.
That said, @patrickt might also be a good one to answer the original question about the correct behaviour.
I'm not sure if this is what GitHub uses https://www.npmjs.com/package/pretty-lights but it seems pretty-lights is not the TextMate engine, merely the CSS styler which would make it unrelated to this bug report
That's not it; the repo is private. We don't use that npm pkg either but if you're curious about where the stylesheets come from, they come from https://github.com/primer/github-syntax-light
👋 We’re not currently using tree-sitter-dart in our syntax highlighting pipeline—it looks fairly unmaintained, so I’d be against relying on it. However, fixes on the TextMate grammar side should percolate into our highlighting pipeline as a whole. I forget exactly how that works; let me dig in.
Ah. Sorry, I have no idea. Maybe @Alhadis knows.
@lildude This matter concerns a rather oblique feature of TextMate grammars (begin
/while
pairs) that I admit I have little familiarity with. However, PrettyLights's behaviour appears to be correct here: the comment by @RedCMD seems to indicate an implementation difference in VSCode's handling of begin
/while
pairs, a discrepancy that—given the sketchy and under-documented nature of this particular feature—comes as no surprise.
For this reason, I strongly advise against using while
when possible. Instead, I'd use an end
pattern with a negative lookahead (end: "^(?!\\s*///)"
), and I'd rewrite the embedded rules to short-circuit on the same pattern:
seems to indicate an implementation difference in VSCode's handling of
begin
/while
pairs, a discrepancy that—given the sketchy and under-documented nature of this particular feature—comes as no surprise.For this reason, I strongly advise against using
while
when possible
While this may work, I feel like it would be better for everyone if we could try to come to some concensus about what the behaviour should be. There are many grammers and many clients, and if the "spec" is vague, it would be better to at least have some agreement between the most popular implementations than to have to avoid the feature.
If VS Code + GitHub disagree, maybe there are other common implementations we could review to see if there's a most common interpretation that the others would agree to?
While this may work, I feel like it would be better for everyone if we could try to come to some concensus about what the behaviour should be. There are many grammers and many clients, and if the "spec" is vague, it would be better to at least have some agreement between the most popular implementations than to have to avoid the feature.
I plan on penning an authoritative spec for the TextMate grammar format, based on past and present implementations of its highlighting system. However, I'm still waiting to hear back from @lildude regarding access to the PrettyLights repository; I'll shoot him an e-mail with a follow-up as he's probably forgotten. 😁
If VS Code + GitHub disagree
I believe it's just VSCode that deviates from other implementations regarding begin
/while
blocks. I'll need to check, as the truncation of unclosed begin
/end
blocks within begin
/while
rules strikes me as unusual.
The 1.X TextMate manual does not mention while
https://macromates.com/manual/en/
The 2.0 TextMate manual mentions while
once. But only states that it is related to begin
language-grammars
TextMate's public repo https://github.com/textmate/textmate TextMate.md states
\G
matches end of last stackedwhile
Which VSCode does do, and I assume PrettyLights does as well
VSCode does however clear the \\G
anchor upon the next rule match. idk what PrettyLights and TextMate does
Looking at TextMates example test grammar
It seems while
also places a \\G
anchor at the beginning of the next line.
Which VSCode definitely does not currently do, and I don't think PrettyLights does either.
The example also seems to describe that the begin
/end
rule places its scopes overtop the while
capture.
Which PrettyLights does do, but VSCode does not.
private.h header file grammar.cc loads grammar file etc parse.cc parses document
https://github.com/textmate/textmate/blob/master/Frameworks/parse/src/parse.cc#L408-L411 seems to indicate that a failed while
rule will return the stack
back to its own parent
, terminating any inside begin
/end
rules. tho I'm not sure
https://github.com/arquivolta/desktop/blob/main/lib/platform/win32/arch_to_rootfs.dart#L23-L157 https://github.com/dart-lang/markdown/blob/cc57a08156356d7fc8c06c73de972c58ddb39a87/lib/src/block_parser.dart#L422-L1106
@Alhadis
I plan on penning an authoritative spec for the TextMate grammar format, based on past and present implementations of its highlighting system.
This sounds like a great idea - having something to use as a reference across implementations may avoid these kinds of issues.
I believe it's just VSCode that deviates from other implementations regarding
begin
/while
blocks. I'll need to check, as the truncation of unclosedbegin
/end
blocks withinbegin
/while
rules strikes me as unusual.
If this is the case (and you know of other implementations behaving like PrettyLights), it would be worth posting in https://github.com/microsoft/vscode/issues/189940 (I feel you're more qualified than I) as their response was that this is not a VS Code issue, and I think it's better for all if all implementers can agree on something.
That said, I don't know if @RedCMD's post changes anything here - I don't understand all the details, but on the surface it seems to make some good points (thank you @RedCMD for the input!).
I see this has been confirmed as a bug in VSCode as PrettyLights is matching TextMate 2's behaviour.
Closing as there's nothing more to do here.
@Alhadis I tried implementing your example from above (https://github.com/github-linguist/linguist/issues/7015#issuecomment-2321028904) but I see a problem and I'm not sure whether it's because the parser I'm using is bad (it's a custom parser to run some automated tests), or whether the sample was just incomplete.
Given code like:
/// ```
/// foo
/// ```
The variable.other.source.dart
scope begins with /// ```
and ends with /// ```
which means all of the line prefixes ///
inside the code block are marked as variables variable.other.source.dart
so the comment markers will be highlighted incorrectly.
I think I could add a nested pattern to set them back as comments, but then the scope will be something like "comment variable comment". This feels clunky so I'm wondering if it's really the best way or if I've overlooked something simpler?
Thanks!
nm, I think I found a way, but removing contentName
from that match and using a nested pattern to only apply it to the code after the comment markers :)
{
"begin": "^\\s*(///)\\s*(```)",
"end": "^\\s*(///)\\s*(```)|^(?!\\s*///)",
"patterns": [
{
"include": "#dartdoc-codeblock"
}
]
},
// ...
"dartdoc-codeblock": {
"begin": "^\\s*(///)\\s*",
"end": "\n",
"contentName": "variable.other.source.dart"
},
I found a difference in behaviour between VS Code / GitHub syntax highlighting that I originally thought was VS Code's issue, but after some comments on https://github.com/microsoft/vscode/issues/189940 I'm now not so sure.
The issue is when a rule uses
begin
/end
(for example triple-backtick code blocks) that is nested inside a rule usingwhile1 (for example comments). It appears that in VS Code, if
endnever matches, it still never expands outside of the
while` rule, however on GitHub it does.Example rules:
This colours comments green, and code blocks light blue. When the triple-backticks are closed, it looks like this in VS Code:
And if they are unclosed, they close at the end of the comment (using
while
):However on GitHub, the unclosed triple backticks run to the end of the document (starting on line 10):
I previously thought the issue was VS Code's, but the comment at https://github.com/microsoft/vscode/issues/189940#issuecomment-2312138491 suggests that
while
doesn't allow abegin
/end
to escape outside of it. On the surface this sounds sensible, so I thought it was worth filing here.(If this isn't the right repo for issues like this affecting the GitHub UI, I apologise, please point me in the right direction! Thanks!)