Closed adam-azarchs closed 2 months ago
Thanks for this Pull Request! It looks already really good!
I quickly checked this on a file with a couple of edge cases. It seems the following cases are not handled, yet:
--no@...
custom flagsThis does not attempt to handle full bourne-shell quoting semantics. It will only handle quotes around the entire flag, e.g. "--foo=some value" or around the flag value, e.g. --foo="some value". In practice, I doubt there are many cases where anyone would want to do something other than that.
I agree, that seems to be a sensible assumption
Thanks for this Pull Request! It looks already really good!
Thanks!
I quickly checked this on a file with a couple of edge cases. It seems the following cases are not handled, yet:
- multiline commands, i.e. a backslash at the end of the line
- escaped whitespaces, i.e. a backslash in front of a whitespace
Yes, I should fix these. Textmate grammars make this a bit tricky to do, however, as they don't support multiline regexes. I won't promise to fix all of these edge cases. Handling of these semantics are difficult in general because there isn't support for a "multi-pass" approach to tokenization. Also, a shell will first generate a sequence of tokens, turning e.g.
blah "blah"'"'"blah""blah\"" blah
into three tokens blah
, blah"blahblah"
, and blah
, and only then try to figure out what to do with each token. TextMate grammars are too limited to do that kind of multi-stage processing, so we basically face a choice between giving up on deeper understanding of what each token means or giving up on trying to handle every quoting/escaping situation. I'd rather do the latter.
--no@...
custom flags
That's easy enough to fix.
This does not attempt to handle full bourne-shell quoting semantics. It will only handle quotes around the entire flag, e.g. "--foo=some value" or around the flag value, e.g. --foo="some value". In practice, I doubt there are many cases where anyone would want to do something other than that.
I agree, that seems to be a sensible assumption
Basically, since a .bazelrc
is not a shell script, there are a lot of things it doesn't support (subshells, functions, redirections, parameter expansion, command substitution...), so most of the reasons why someone would want to use most of those "unusual" quoting patterns go away. While technically valid, they're unnecessarily obfuscating and if someone is doing that, having the syntax highlighting flag it up as questionable is probably for the best.
With the above in mind, going through your listed edge cases:
# A \
multiline \
comment
Fixed. I'll note this one isn't actually handled by the shell syntax highlighting that's used by default currently, either. Because I don't think it's actually valid in a shell. But, it seems it is valid in a .bazelrc
, so I can make the syntax highlighting handle it.
build -k
fixed
build: --keep_going
fixed
buil"d:my"config --keep_going
Falls under "unusual" quoting patterns that I said I wasn't going to support.
build:my\ config --keep_going
fixed
build --no@//my/package:bool_flag
fixed
build --flag \
"very long value that you want to put on a separate line to stay within some length limit"
Also, a minor refinement but I've made startup:some_config
show as invalid, because according to documentation
Setting
startup:config-name --some_startup_option
in the .bazelrc will be ignored.
Technically "ignored" isn't the same thing as invalid, but users would probably prefer to have the visual hint that what they're doing won't work the way they probably expect it to.
--objc_debug_with_GLIBCXX
exists as the lone documented non-user-defined flag that isn't only lower-case letters, numbers, and underscores.I guess we don't have a unit testing framework set up yet, but if we did, would it be possible to add tests for this via the method you mention in the PR description?
I guess we don't even need a unit testing framework to use vscode-tmgrammar-test
. It'd be useful to have this running on CI and some snapshots checked into source control.
TextMate grammars are too limited to do that kind of multi-stage processing, so we basically face a choice between giving up on deeper understanding of what each token means or giving up on trying to handle every quoting/escaping situation. I'd rather do the latter.
Note I have a not-yet-opensourced language server for bazelrc files laying around. In the long run, if we integrate that language server and let it take care of adding the deeper semantic understanding, I guess the grammar should go back to focusing on the tokenization. But that's something for a future discussion and for the current PR I agree with your tradeoff
One last thing I noticed: We now also accept line continuations with whitespaces before the line break. Bazel only continues the line on backslashes immediately before the newline.
Except for that, the functionality looks good to me now, and I would be happy to merge this. Agree with @cameron-martin, though, that adding test cases would be a nice addition! (Please let us know if we are asking too much. I am already very happy with the contribution as is 🙂)
I guess we don't even need a unit testing framework to use
vscode-tmgrammar-test
. It'd be useful to have this running on CI and some snapshots checked into source control.
I'd be happy to add that. My main hesitation was that the project was just the first google hit for "vscode syntax grammar unit testing" and doesn't appear to be super-popular. As an xoogler I'm very well aware of how involved the process of adding any kind of third-party dependencies can be. But it's probably still safe enough in a github action with appropriately limited permissions if that would be acceptable; I have basically zero experience with buildkite.
Note I have a not-yet-opensourced language server for bazelrc files laying around. In the long run, if we integrate that language server and let it take care of adding the deeper semantic understanding, I guess the grammar should go back to focusing on the tokenization. But that's something for a future discussion and for the current PR I agree with your tradeoff
A language server seemed a bit bit overkill to me, but then again it could add e.g. links/hovertext for documentation of known flags and whatnot, which would actually be really helpful because the web docs are a bit difficult to scan through.
One last thing I noticed: We now also accept line continuations with whitespaces before the line break. Bazel only continues the line on backslashes immediately before the newline.
Ah, whopse, will fix that, maybe this evening. I'll also take another stab at trying to get mult-line continuation working between a flag and its value; having slept on it a bit, I think it's maybe not so bad.
Except for that, the functionality looks good to me now, and I would be happy to merge this. Agree with @cameron-martin, though, that adding test cases would be a nice addition! (Please let us know if we are asking too much. I am already very happy with the contribution as is 🙂)
Thanks! Let me know if a github action is acceptable for such a CI check, or if you'd prefer I could just contribute the test case file and let someone else figure out how to wire it up.
One other question, I've been developing this grammar in yaml form, because it's a lot more convenient to work with in my opinion. Would you like that in the PR as well, or just the converted-to-json form?
Let me know if a github action is acceptable for such a CI check, or if you'd prefer I could just contribute the test case file and let someone else figure out how to wire it up.
For the time being, I would prefer if you could integrate those tests such that they can be run as npm run test
and then add that npm run test
invocation also to the Bazel CI config. (Long-term, based on the discussion in https://github.com/orgs/bazelbuild/discussions/2, it's a bit unclear to me where this project will move regarding CI in the mid-term)
I've been developing this grammar in yaml form, because it's a lot more convenient to work with in my opinion. Would you like that in the PR as well, or just the converted-to-json form?
What is the "convert-to-json" step? Is it reversible? If it's a reversible step, I would prefer if we only keep the JSON checked in (since that's the established standard), but also add a subsection to https://github.com/bazelbuild/vscode-bazel/blob/master/CONTRIBUTING.md#setting-up-your-development-environment to document how to switch forth and back from/to YAML
I got a kind of partial fix for line continuation followed by flag value. I think I've had enough of trying to reverse-engineer the undocumented bits of textmate grammars to figure out how to do better than that, so I think this is as good as it's going to get.
For the time being, I would prefer if you could integrate those tests such that they can be run as
npm run test
and then add thatnpm run test
invocation also to the Bazel CI config. (Long-term, based on the discussion in https://github.com/orgs/bazelbuild/discussions/2, it's a bit unclear to me where this project will move regarding CI in the mid-term)
That requires putting the tool in the dev dependencies in the package.json
, but if you're fine with that then I'm fine with it.
I've been developing this grammar in yaml form, because it's a lot more convenient to work with in my opinion. Would you like that in the PR as well, or just the converted-to-json form?
What is the "convert-to-json" step? Is it reversible? If it's a reversible step, I would prefer if we only keep the JSON checked in (since that's the established standard), but also add a subsection to https://github.com/bazelbuild/vscode-bazel/blob/master/CONTRIBUTING.md#setting-up-your-development-environment to document how to switch forth and back from/to YAML
See https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide#developing-a-new-grammar-extension, "Using YAML to write a grammar":
Yaml grammars have the exact same structure as a json based grammar but allow you to use yaml's more concise syntax, along with features such as multi-line strings and comments. VS Code can only load json grammars, so yaml based grammars must be converted to json. The js-yaml package and command-line tool makes this easy.
Converting yaml to json is a well-defined operation. However, because yaml has e.g. several ways to write a string, as well as supporting comments (which json does not), you lose something in the conversion. The main advantage IMO is not needing to escape \
or "
in strings (some ways to write strings, anyway), which keeps regexes easier to read. It's probably not worth the trouble of setting up the necessary CI to enforce that they stay in sync, though.
My 2cts below. Would also be interested in @cameron-martin and / or @jfirebaugh's opinion, though 🙂
That requires putting the tool in the dev dependencies in the package.json, but if you're fine with that then I'm fine with it.
I think it's a reasonable 3pp dependency which we don't want to rewrite. As such, I am fine with adding this as a dev dependency.
See https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide#developing-a-new-grammar-extension, "Using YAML to write a grammar":
Yes, I think it would make sense adding the YAML version, too. Midterm, I think we should then add js-yaml
as part of our build system, and delete the JSON file again. But that can be future work, not necessarily part of this review...
I'm fine with adding this extra dependency, and also agree that if YAML is preferred for authoring then that should be in source control, with JSON created during the build (but also fine with not doing that in this PR).
Ok, I've checked in the yaml instead of the json, and updated scripts/build.sh
to do the conversion. I think you'll probably agree that the yaml form is more legible by humans.
I've also added an npm run test
command for checking the grammar against the snapshot which I added, and update-snapshot
to update that snapshot.
Thank again for your contribution 🚀
A language server seemed a bit bit overkill to me, but then again it could add e.g. links/hovertext for documentation of known flags and whatnot, which would actually be really helpful because the web docs are a bit difficult to scan through.
I finally found the time to open-source https://github.com/salesforce-misc/bazelrc-lsp
The main purpose here is really to prevent vscode from autodetecting .bazelrc as a shell file, resulting in complaints about syntax that isn't valid in a bash script, or mistakenly highlighting e.g. variable expansions as if they were actually honored in a .bazelrc.
We can also do a bit better, by actually understanding the semantics of e.g. config specifiers and so on.
This does not attempt to handle full bourne-shell quoting semantics. It will only handle quotes around the entire flag, e.g.
"--foo=some value"
or around the flag value, e.g.--foo="some value"
. In practice, I doubt there are many cases where anyone would want to do something other than that.This does not try to exhaustively match the list of all possible flags, and so it can make mistakes like treating --copt -O3 as if it were two separate flags rather than a flag + value, however that's mostly harmless. Maintaining a list of known flags would be prohibitively difficult to maintain without some kind of automation.
There is special treatment for a couple of flags, including
--config
and a few flags that expect regex arguments, like--instrumentation_filter
.Closes #259
Choices of scope names were somewhat arbitrary, since the standard scope names don't really match up with the semantics of a .bazelrc file. I mostly tweaked them to work well on this test example.bazelrc and the vscode default dark theme:
Unit testing for syntax grammars is not particularly easy; I was using https://github.com/PanAeon/vscode-tmgrammar-test during development, as a snapshot test with the same
example.bazelrc
.