bazelbuild / rules_go

Go rules for Bazel
Apache License 2.0

nogo: configuration should be applied before executing an analyzer #2472

Open

bartle-stripe commented 4 years ago

What version of rules_go are you using?

0.2.5

What version of gazelle are you using?

c08cffd46756c6dec217fecd5e65a529012dd0f8 (~master)

What version of Bazel are you using?

3.0

Does this issue reproduce with the latest releases of all the above?

Yes.

What operating system and processor architecture are you using?

Mac OS X.

Any other potentially useful information about your toolchain?

What did you do?

I have a custom analyzer that raises an error if a file isn't formatted per gofmt. This requires reading the original Go file from pass.Fset.Position(file.Pos()).Filename. It sometimes fails because the input file doesn't seem to be available; my guess is that this is similar to #2396. Unfortunately, I can't simply add an entry to exclude_files, because that's evaluated after the analyzer runs, so the analyzer will already have failed. I could change my analyzer to not return an error and just emit a Diagnostic, but that seems a bit odd.

Further, it seems like it'd be more efficient to simply skip analyzers whose results we plan to ignore.

jayconrod commented 4 years ago

I could change my analyzer to not return an error, and just emit a Diagnostic, but that seems a bit odd.

Why not emit diagnostics though? Diagnostics are the main product of analyzers. Analyzers generally shouldn't report errors unless something goes seriously wrong internally. If the purpose of the analyzer is to report deviations from gofmt, diagnostics seem like the best way to do that.

I don't think there's a practical way to pre-filter files before running analyzers. Even if you want to exclude diagnostics reported in some files, it's probably still necessary to parse and type-check those files so the rest of the package can be analyzed.

bartle-stripe commented 4 years ago

Inability to read an input file seems like something seriously wrong, but I agree that it's easy enough to address by switching to reporting a diagnostic.

Why couldn't you just check exclude_files immediately before calling analyzer.Run? You'd need to deal with analyzer dependencies, but I think you could address that by returning an error if some dependency was excluded (which would require the user to also exclude the relevant files from that analyzer). My understanding is that there's an initial step, before any analyzers run, that parses the files into the ASTs passed to the analyzers. I'd expect that step to always run, regardless of what you exclude.
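A rough stdlib-only sketch of what that pre-filter could look like (shouldSkip, excludeFiles, and the patterns below are hypothetical illustrations, not nogo's actual implementation):

```go
package main

import (
	"fmt"
	"regexp"
)

// excludeFiles mirrors the shape of nogo's exclude_files config:
// regex pattern -> human-readable reason.
var excludeFiles = map[string]string{
	`vendor/`:   "third-party code",
	`\.pb\.go$`: "generated protobuf code",
}

// shouldSkip reports whether every file in the package matches an
// exclude pattern, i.e. the analyzer's diagnostics would all be
// discarded anyway, so analyzer.Run need not be called at all.
func shouldSkip(files []string) bool {
	for _, f := range files {
		excluded := false
		for pat := range excludeFiles {
			if regexp.MustCompile(pat).MatchString(f) {
				excluded = true
				break
			}
		}
		if !excluded {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(shouldSkip([]string{"vendor/foo/bar.go"}))      // true
	fmt.Println(shouldSkip([]string{"main.go", "vendor/a.go"})) // false
}
```

The parse/type-check step would still run for the whole package either way; this only short-circuits the per-analyzer Run call.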

For context, we basically don't want to run ~any analyzers against vendored third-party code, and it's just a waste of resources to run them only to discard the results. We've also found analyzers take a non-trivial amount of RAM on some generated files, requiring us to increase resources on the sandbox they're run in.

jayconrod commented 4 years ago

Inability to read an input file seems like something seriously wrong

Ah, yes, that's a case where it makes sense to return an error. That's why cgocall is disabled now. I meant that if a file can be read but isn't formatted, that's a better case for a diagnostic.

#2396 is open because nogo only processes the files sent to the compiler, but cgocall needs the original cgo files. Please file another issue if you think there might be a different root cause.

My understanding is that there's an initial step, before analyzers run, that parses the files into the ASTs passed to the analyzers. I'd expect that step to always run, regardless of what you exclude.

This is correct. The framework parses and loads type information from files passed to the compiler before giving that to the analyzer. All files need to be there in order for that to work.

Why couldn't you just check exclude_files immediately before calling analyzer.Run? You'd need to deal with analyzer dependencies, but I think you could address that by returning an error if some dependency was excluded (which would require the user to also exclude the relevant files from the analyzer).

This is the part I'm not really clear on. What would we do with the exclude_files list? In theory, we could clear the ASTs, but that wouldn't help with analyzers that operate on types or results from other analyzers like SSA.

In any case, this doesn't seem feasible. nogo needs to be pretty closely compatible with go vet so that we can use the same analyzers, unmodified. Filtering the inputs could break that.

For context, we basically don't want to run ~any analyzers against vendored 3rd-party code, and it's just a waste of resources to run them, only to discard them.

That seems more like an argument for a more powerful configuration language and the ability to change which analyzers run on sets of packages. That would be very useful to have.

We've also found analyzers take a non-trivial amount of RAM on some generated files, requiring us to increase resources on the sandbox they're run in.

I wonder if there's a bug here causing nogo to take more RAM than it should? How much RAM are we talking about, and how big are the files (if you don't mind sharing)?

nogo is meant to be scalable. There's an architecturally similar framework within Google (also called nogo) that runs on a very large code base. I don't know if the Bazel version of nogo has been tested on such large code bases, but I'd be interested in hearing about scaling and performance issues.

bartle-stripe commented 4 years ago

I recently saw an OOM on our sandbox container (I believe we schedule with at least 1 GiB RAM) from trying to run the gofmt vetter against some generated code in the aws-sdk. This vetter is pretty basic:

package gofmtcheck

import (
    "bytes"
    "go/format"
    "io/ioutil"

    "golang.org/x/tools/go/analysis"
)

// ignoreFile and printDiff are helpers defined elsewhere in the package.
func run(pass *analysis.Pass) (interface{}, error) {
    for _, file := range pass.Files {
        filename := pass.Fset.Position(file.Pos()).Filename
        if ignoreFile(filename) {
            continue
        }

        in, err := ioutil.ReadFile(filename)
        if err != nil {
            return nil, err
        }

        out, err := format.Source(in)
        if err != nil {
            return nil, err
        }

        if bytes.Equal(in, out) {
            continue
        }

        pass.Reportf(file.Pos(), "File is incorrectly formatted; please run `gofmt`")
        err = printDiff(filename, out)
        if err != nil {
            return nil, err
        }
    }
    return nil, nil
}

I don't recall exactly which files I was hitting the "not found" issue with, but I saw a couple of non-Go files (parser.y, iirc), which made it seem like these were being used in conjunction with go:generate or similar.

I haven't had a chance yet, but I've been meaning to figure out how to write some benchmarks against the vetter stack. It's possible there are some memory leaks or inefficiencies such that merely adding my new vetter to the analyzer list was enough to trip something. It's also possible our sandbox is doing something wrong, but OOMs are definitely correlated with the size of the input source file.

bartle-stripe commented 4 years ago

In terms of disabling nogo selectively, does it seem feasible to add an attribute to go_library and friends that either disables nogo entirely or allows you to select a different nogo rule?

jayconrod commented 4 years ago

In the past, we've talked about top-down / bottom-up configuration, in addition to the global configuration we have now.

Top-down would be something specified on a go_binary or go_library and would apply to that target and its dependencies. Unclear how it would work when a package is imported transitively on different paths.

Bottom-up would be something specified on a go_library and would apply to targets that import that library.

These are pretty vague ideas at the moment. Lots of design and implementation TBD.

aaomidi commented 2 days ago

I think this breaks using a configuration file for errcheck.

Passing:

{
  "_base": {
    "description": "Base config that all subsequent analyzers, even unspecified will inherit."
  },
  "errcheck": {
    "analyzer_flags": {
      "blank": "true",
      "assert": "true",
      "excludeonly": "false"
    },
    "exclude_files": {
      ".*.pb.go": "No generated pb.go files.",
      "_test.go": "No test files."
    }
  }
}

I have my own wrapper around errcheck:

package errcheck

import (
    "fmt"

    "github.com/kisielk/errcheck/errcheck"
    "golang.org/x/tools/go/analysis"
)

var Analyzer *analysis.Analyzer

func init() {
    Analyzer = errcheck.Analyzer
}

func init() {
    fmt.Printf("%+v\n", Analyzer.Flags.Lookup("assert"))
}

&{Name:assert Usage:if true, check for ignored type assertion results Value:false DefValue:false}