[nogo] How to specify and include specialized linter config files?

bazelbuild / rules_go

Go rules for Bazel

Apache License 2.0

1.36k stars 645 forks source link

[nogo] How to specify and include specialized linter config files? #3234

Open rmohr opened 2 years ago

rmohr commented 2 years ago

What version of rules_go are you using?

What version of gazelle are you using?

What version of Bazel are you using?

Does this issue reproduce with the latest releases of all the above?

Yes

What operating system and processor architecture are you using?

Any other potentially useful information about your toolchain?

What did you do?

For linters it is common to have configuration files for detailed configurations. For instance errcheck has a flag to pass in an exclude file: https://github.com/kisielk/errcheck/blob/master/errcheck/analyzer.go#L22.

I have seen that some projects are modifying analyzers so that they can embed config files (e.g. here: https://github.com/cockroachdb/cockroach/blob/e1aeffe98004d203134b6c69e8f9afc29f03b999/pkg/testutils/lint/passes/errcheck/errcheck.go).

There seems to be no easy way to include such config files with nogo.

What did you expect to see?

I was expecting something like a label map on the nogo target which would allow me to pass in files for each linter, so that they can be referenced in the nogo.json config file via the flag support.

What did you see instead?

There seems to be no easy way to include custom configuration files.

adonovan commented 2 years ago

As documented here, the nogo BUILD rule allows you to declare a config=... configuration in a JSON file. Is this not sufficient? If not, why not? Is the problem that you want separate JSON files for each analyzer? If so, perhaps the config could be produced by composition from several files by another build rule.

rmohr commented 2 years ago

@adonovan thanks for checking.

As documented here, the nogo BUILD rule allows you to declare a config=... configuration in a JSON file.

Yes, that is currently the only config which can be supplied and it is solely for the nogo binary itself.

Is this not sufficient? If not, why not? Is the problem that you want separate JSON files for each analyzer?

Yes that is the basic problem, other analyzers have their own config files. For instance errcheck: https://github.com/kisielk/errcheck/blob/master/errcheck/analyzer.go#L29, which is for instance not even a json file.

Another example would be banncheck: https://github.com/google/go-safeweb/blob/master/cmd/bancheck/bannedapi/banned_api.go#L35.

If so, perhaps the config could be produced by composition from several files by another build rule.

I am not sure how I could then reference these from within analyzers without changing them to somehow access some runfiles.

If there is nothing like this right now, maybe it would make sense to introduce something like the following:

Considering the errcheck example, which allows passing in a config via --exclude we could introduce a linter_configs list of labels:

nogo(
    name = "nogo_vet",
    config = "nogo_config.json",
    visibility = ["//visibility:public"],
    deps = [
        "//vendor/.../errcheck:go_default_library",
        "@org_golang_x_tools//go/analysis/passes/asmdecl:go_default_library",
    ],
    linter_configs = [
        "//linters:errcheck.conf",
        "//other:target",
    ],
)

And they could then bereferenced in the nogo config json with $(location):

{
  "errcheck": {
    "exclude_files": {
      "vendor/": "vendor doesn't pass vet",
      "external/": "externaldoesn't pass vet"
    },
    "analyzer_flags": {
        "exclude": "$(location //linters:errcheck.conf)",
    },
  }
}

similar to genrules. @sluongng you seem to be pretty active around nogo, is that something which sounds interesting for you too?

sluongng commented 2 years ago

I think the link Alan's shared was the correct way of what we would support today: each Analyzer should expose a set of flags that could be configured at runtime(package initialization), then we config those flag via nogo JSON config.

Over in https://github.com/sluongng/nogo-analyzer/blob/d6a3958307f2b485cda307f453e06db025415177/def.bzl#L3 I introduce a way to programatically generate this JSON config from starlark with a sensible default. I think this should serve MOST use cases today.

For runfiles dependency, you should think about the dependency graph like this:

runfiles (config files) --> Analyzer (go_library)
Analyzer --> Nogo (go_binary)
nogo_config.json --> Nogo (go_binary)

So trying to declare this in Bazel as

runfiles --> nogo_config.json --> Nogo

would not make much sense to me.

The correct way to go about this is to change how you are building/compiling the Analyzer. For example: wrap the Analyzer to use the config file as an embedded file would be the easiest way to do this.

Worth to note that nogo strictly follow how analysis.Analyzer work, this means that it's natural for some Analyzer you found in the wild to be not compatible with nogo by default.

For this reason, I have created https://github.com/sluongng/nogo-analyzer to store all the wrappers of analyzers in the wild so that they could be consumed by nogo with minimal setup. I would love to get a contribution from you to enable the Analyzers you mentioned above for nogo use case 🤗

rmohr commented 2 years ago

I have to confess I can't fully follow. Happy if you could help me out a bit :)

I think the link Alan's shared was the correct way of what we would support today: each Analyzer should expose a set of flags that could be configured at runtime(package initialization), then we config those flag via nogo JSON config.

Yes, I don't think that would change if extra config files could be extra referenced. It would just allow adding extra configs as dependency and find out where to read them.

Over in https://github.com/sluongng/nogo-analyzer/blob/d6a3958307f2b485cda307f453e06db025415177/def.bzl#L3 I introduce a way to programatically generate this JSON config from starlark with a sensible default. I think this should serve MOST use cases today.

For runfiles dependency, you should think about the dependency graph like this:
runfiles (config files) --> Analyzer (go_library)
Analyzer --> Nogo (go_binary)
nogo_config.json --> Nogo (go_binary)

So is the argument, that the nogo config will not be evaluated before the analyzers are created? I am not sure how this could be the case when I am able to specify flags already for analyzers.

So trying to declare this in Bazel as
runfiles --> nogo_config.json --> Nogo
would not make much sense to me.

I would see it more like this:

input: config files via labels, nogo_config.json, analyzers to run then: figure out the final file location and replace refrences in nogo_config.json then: run the analyzers with all the additional dependencies and the resolved flags available in the sandbox

I am sure I don't understand the context properly.

The correct way to go about this is to change how you are building/compiling the Analyzer. For example: wrap the Analyzer to use the config file as an embedded file would be the easiest way to do this.

Yes, that is what I see on some projects, but it means that I have to fork and modify analyzers just to pass in additional configs. I know that there are more complex analyzers which don't fit into nogo for various reasons, but there is already a set where just being able to pass in a config is the main blocker.

rmohr commented 2 years ago

For this reason, I have created https://github.com/sluongng/nogo-analyzer to store all the wrappers of analyzers in the wild so that they could be consumed by nogo with minimal setup. I would love to get a contribution from you to enable the Analyzers you mentioned above for nogo use case

Yes I have seein your staticcheck nogo work. It is really great! I definitely think we will use it in kubevirt to run staticcheck. I however can for instace right now pull the latest errcheck analyzer with a simple make deps-update in our project, and don't have to maintain extra packages to get latest updates. Or, I could if I could pass in configs :D

rmohr commented 2 years ago

So is the argument, that the nogo config will not be evaluated before the analyzers are created? I am not sure how this could be the case when I am able to specify flags already for analyzers.

Oh I guess the point is, that runfiles are not part of the build process and that we don't know the location yet then? Is nogo hard-coding during build time the exact flags? If so, just moving parts of it to the nogo binary flags may make it possible to make them late-binding?

sluongng commented 2 years ago

So trying to declare this in Bazel as
runfiles --> nogo_config.json --> Nogo
would not make much sense to me.
I would see it more like this:

input: config files via labels, nogo_config.json, analyzers to run then: figure out the final file location and replace refrences in nogo_config.json then: run the analyzers with all the additional dependencies and the resolved flags available in the sandbox

There are 2 pieces to the puzzle here:

To include Analyzer's config file(s) as part of Nogo's runfiles. This happen at Build time of nogo
To make Analyzer support Bazel's runfile convention so they know where to read the file from at runtime

For (1), I think it's doable like you suggested. https://github.com/bazelbuild/rules_go/blob/8fe3a08f99b3e544024a146f792572606a04138b/go/private/rules/nogo.bzl#L35 is where the magic at. Action GoGenNogo which invoke builder gennogomain -config <nogo_config.json> is incharge of generating nogo_main.go source code. Then the rule compile + link it into an executable binary. So I guess you could modify GoGenNogo to include this

linter_configs = [
    "//linters:errcheck.conf",
    "//other:target",
],

as part of the nogo's runfiles.

For (2), in gennogomain you would need to translate the Label of these configs into actual runfile path, preferably using https://github.com/bazelbuild/rules_go/blob/master/go/tools/bazel/runfiles.go. So that the Analyzers are passed the exact runfile path when nogo is run during GoCompilePkg.

So yes, I think the request is doable(?).

However I have concerns:

I would think that writing my own Analyzer wrapper is a cheaper/faster approach 😆. So realistically, I would rather write a wrapper.
Currently nogo is run as part of GoCompilePkg action. This action is quite bloated and do too many things at once while given a limited CPU core as Bazel runs many actions in parallel. I don't think the current setup is suitable to scale up to O(n) number of Analyzer, certainly not if we started to add 1 new config file for each configured Analyzer. This point is orthogonal to the current issue and could/should be addressed separately, but do keep in mind that "adding more files into every GoCompilePkg" comes with a significant cost for bigger repos (i.e. RBE worker would have to download these extra blobs over the network).

rmohr commented 2 years ago

I would think that writing my own Analyzer wrapper is a cheaper/faster approach laughing. So realistically, I would rather write a wrapper.

Hm, I think it is mostly forking and not just wrapping. Like first I embed it, and then I have to change the part which does os.Open, right?

Currently nogo is run as part of GoCompilePkg action. This action is quite bloated and do too many things at once while given a limited CPU core as Bazel runs many actions in parallel. I don't think the current setup is suitable to scale up to O(n) number of Analyzer, certainly not if we started to add 1 new config file for each configured Analyzer. This point is orthogonal to the current issue and could/should be addressed separately, but do keep in mind that "adding more files into every GoCompilePkg" comes with a significant cost for bigger repos (i.e. RBE worker would have to download these extra blobs over the network).

Would it be an option to literally embed such files into the nogo binay which contains the analzyers and write them to well known locations and reference from there? So handling it all inside the nogo binary?

adonovan commented 2 years ago

I don't think the current setup is suitable to scale up to O(n) number of Analyzer, certainly not if we started to add 1 new config file for each configured Analyzer.

If the nogo rule instance depends on all of the config file targets, it could (internally) combine them into a single JSON value and pass just this one file to each nogo action.

Would it be an option to literally embed such files into the nogo binary which contains the analyzers and write them to well known locations and reference from there? So handling it all inside the nogo binary?

It does seem that there is perhaps a feature missing from the Analyzer API: a richer way to express configuration than flags. Flags are limited to scalar values and often need to refer to files containing denylists, etc, the details of which vary by build system. Perhaps we should think about a way to extend the Analyzer interface to support structured configuration supplied by the driver, which would (abstractly) hold a map from Analyzer name to a valid JSON-encoded string.

For example, adding a ConfigJSON reflect.Type field in the Analyzer struct would allow (i) each analyzer to express the type of its configuration, (ii) the driver to load the configuration from an unspecified source (e.g. separate files in runfiles, entries in an embed.FS, sections from a single large JSON file, etc), and (iii) the driver to parse and validate the configuration JSON once before calling any analyzer Pass. An analyzer could define custom logic in MyConfig.UnmarshalJSON methods to perform analyzer-specific parsing and validation at a finer grain. The resulting MyConfig value would be supplied to each pass in the Pass.ConfigJSON field. (The Pass would not be allowed to modify it, and this may need to be enforced somehow to avoid side channels between passes.)

Obviously this is just a sketch, and we'd need a real proposal, but this is something I've had in the back of my mind even as we designed the current API.

rmohr commented 2 years ago

Obviously this is just a sketch, and we'd need a real proposal, but this is something I've had in the back of my mind even as we designed the current API.

Yes that sounds very reasonable. Thanks for the details. I will try to come up with a proposal draft with a few use-cases.

mpatou commented 3 months ago

I would like to spin this question one more time, I'm starting to have a closer look at nogo-analyzer and it would be good if errcheck could be configured to ignore missing check on some functions as the original linter allows to do.

It seems that we have now most of the things in place (was it already the case in 2022?):

Analyzer has a Flagsattribute which should contains all the flags that a given linter accept
We can set those flags in the JSON config it seems with "analyzer_flags": {"flag": "value"}
nogo-analyzer is already exposing an Analyzer object

The missing pieces would be to have the linter to be exposed through nogo-analyzer it seems that both errcheck and prealloc are good examples on how to do it depending on how the underlying linter is working.

aaomidi commented 8 hours ago

This is still I think a problem, are there any workarounds to be able to bring errcheck into bazel without basically forking/patching errcheck.