Build tags making our life difficult

dominikh commented 3 years ago

Build tags cause us great pain. Some checks only operate on literals, not constants, in fear of introducing false positives but thus introducing false negatives. Other checks fail to take all effects of build tags into consideration, causing false positives.

This issue acts as a list of checks with known shortcomings caused by build tags. The goal is to address all of these issues, by teaching Staticcheck which values, types and behaviors depend on build tags, and which can be assumed to be static. Our current plan is to make use of the fact that if the build tags governing the use of a value imply the build tags of its definition, then only the one value can ever be observed. For example, the value of runtime.GOOS depends on the active build tags. If used in a file tagged with windows, the value will always be "windows". However, in a file tagged with linux, the value may be either "linux" or "android".

This approach doesn't let us know of all possible values, only of the fact that values may differ under different build configurations. As such, it itself will cause false negatives, especially in type-based analyses. Consider S1039, which flags calls of the form fmt.Sprint(x) where x's type is string. If x is defined under multiple build configurations, we can't be sure that its type is always string, even when it might be. It is not clear to me whether we should handle this. The vast majority of code isn't this stupid and we cause barely any false positives (no known ones). However, handling this correctly would cause a lot of false negatives.

Checks that only support basic literals, not consts

Note that a lot of these checks shouldn't flag code that uses constants, even if the constant's value is certain. x * 0 is useless, but x * N where N == 0 might be code that intentionally got disabled. The same applies to time.Sleep(N). However, we could flag these exclusively in gopls, so that editors render the code as dead code.

Checks that assume a constant's value is static

Checks that assume a function's behavior is static

SA4006 (#787)
SA4017

Checks that assume a type's definition is static

Checks that assume that an expression's type is static, even though it might differ under different build tags, in particular due to `:=` assignment

Issues that are complicated by build tags

936
902
48
902
787

dominikh commented 2 years ago

The other approach to build tags is to run Staticcheck multiple times, once per interesting build configuration. We can then merge the results. Each diagnostic would specify if it should be reported if 1) all build configurations of which a file is part of produced the diagnostic 2) any build configurations produced the diagnostic.

For example, calling regexp.Compile with an invalid regexp should always be reported, even if it's only invalid under one build configuration. However, dead code should only be reported if it is dead under all build configurations.

This approach leads to more accurate results, but also increases the total runtime by a factor of N, where N is the number of build configurations.

dominikh commented 2 years ago

If go/analysis allowed attaching metadata to diagnostics, then we could combine the two approaches, by selecting the merge strategy dynamically, requiring "reported under all configurations" only when a type or value is known to differ under build tags. This would increase precision and reduce the number of false negatives.

dominikh commented 2 years ago

We can also improve precision by tracking reachable code on a per-line basis instead of a per-file basis, although the definition of "reachable" differs for AST-based and IR-based analyses.

dominikh commented 2 years ago

Based on feedback from mvdan, we'll provide a shortcut for running and merging multiple runs on the same host:

$ staticcheck -matrix <<EOF
linux: GOOS=linux
windows: GOOS=windows
appengine: -tags=appengine
EOF
sand_appengine.go:8:28: error parsing regexp: missing argument to repetition operator: `?` [appengine] (SA1000)
sand_linux.go:5:28: error parsing regexp: missing argument to repetition operator: `*` [appengine,linux] (SA1000)
sand_windows.go:5:28: error parsing regexp: missing argument to repetition operator: `+` [windows] (SA1000)

This can be used by itself, or in combination with the -merge mode introduced in 5a97b5d57a1d97794946d82cde58b349439e4605.

cespare commented 2 years ago

You can add S1008 to that first list of checks. It suggested a simplification based on a constant that's set to true/false using different build tags.

dominikh commented 2 years ago

@cespare Do you mean that the constant is used as the value in the return statement?

Edit: fixed that one in 246e50be7597f3b3a5412d1634f8879fc338887c

cespare commented 2 years ago

I do! Thanks for the fix :)

dominikh commented 2 years ago

Now that we have the -merge and -matrix flags, the next question is whether we should still aim to detect build-tag-dependent behavior, to reduce the number of false positives when only checking a single configuration. However, if we do that, we'd also need a way to disable it. Otherwise, if we use -merge but the individual runs each suppress their potential false positives, we'll not have any results to merge.

dominikh commented 2 years ago

we'd also need a way to disable it.

Maybe we could have an -intent flag that spans both this, and the planned code review/aggressive checks, and is used for selecting different intentions of running Staticcheck: code review, single CI run, merged CI run.

dominikh / go-tools