Open podtserkovskiy opened 1 month ago
Related Issues and Documentation
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
CC @griesemer @findleyr
I've created a test project podtserkovskiy/issue-69689
with go and cgo testcases for popular linters to make sure linters work for the same before and after my change.
I've reverted CL 127658 on Go release-branch.go1.23
and ran popular linters golangci_lint
(with all lints enabled including unparam
, unconvert
and unused
) and staticcheck
on podtserkovskiy/issue-69689
.
I compared results against the linters compiled with non-modified Go. I downloaded the linters from Homebrew repository.
golangci_lint
before my changesissue-69689 git:(main) golangci-lint run --enable-all
WARN The linter 'execinquery' is deprecated (since v1.58.0) due to: The repository of the linter has been archived by the owner.
WARN The linter 'exportloopref' is deprecated (since v1.60.2) due to: Since Go1.22 (loopvar) this linter is no longer relevant. Replaced by copyloopvar.
WARN The linter 'gomnd' is deprecated (since v1.58.0) due to: The linter has been renamed. Replaced by mnd.
rand/rand.go:11:18: unused-parameter: parameter 'j' seems to be unused, consider removing or renaming it as _ (revive)
func Seed(i int, j int) { // unparam
^
rand/rand.go:12:9: unnecessary conversion (unconvert)
i = int(i) // unconvert
^
rand/rand.go:9:6: func `unusedFunc` is unused (unused)
func unusedFunc() {} // unused func
^
rand/rand.go:14:13: SA1004: sleeping for 1 nanoseconds is probably a bug; be explicit if it isn't (staticcheck)
time.Sleep(1)
^
rand_cgo/rand_cgo.go:15:18: unused-parameter: parameter 'j' seems to be unused, consider removing or renaming it as _ (revive)
func Seed(i int, j int) { // unparam
^
rand_cgo/rand_cgo.go:16:9: unnecessary conversion (unconvert)
i = int(i) // unconvert
^
rand_cgo/rand_cgo.go:18:13: SA1004: sleeping for 1 nanoseconds is probably a bug; be explicit if it isn't (staticcheck)
time.Sleep(1) // SA1004
^
golangci_lint
after my changes➜ issue-69689 git:(main) GOROOT=$PWD/../go PATH=$PWD/../go/bin:$PATH ./../golangci-lint/golangci-lint run --enable-all
WARN The linter 'exportloopref' is deprecated (since v1.60.2) due to: Since Go1.22 (loopvar) this linter is no longer relevant. Replaced by copyloopvar.
WARN The linter 'gomnd' is deprecated (since v1.58.0) due to: The linter has been renamed. Replaced by mnd.
WARN The linter 'execinquery' is deprecated (since v1.58.0) due to: The repository of the linter has been archived by the owner.
rand/rand.go:11:18: unused-parameter: parameter 'j' seems to be unused, consider removing or renaming it as _ (revive)
func Seed(i int, j int) { // unparam
^
rand/rand.go:12:9: unnecessary conversion (unconvert)
i = int(i) // unconvert
^
rand/rand.go:9:6: func `unusedFunc` is unused (unused)
func unusedFunc() {} // unused func
^
rand/rand.go:14:13: SA1004: sleeping for 1 nanoseconds is probably a bug; be explicit if it isn't (staticcheck)
time.Sleep(1)
^
rand_cgo/rand_cgo.go:15:18: unused-parameter: parameter 'j' seems to be unused, consider removing or renaming it as _ (revive)
func Seed(i int, j int) { // unparam
^
rand_cgo/rand_cgo.go:16:9: unnecessary conversion (unconvert)
i = int(i) // unconvert
^
rand_cgo/rand_cgo.go:18:13: SA1004: sleeping for 1 nanoseconds is probably a bug; be explicit if it isn't (staticcheck)
time.Sleep(1) // SA1004
^
staticcheck
before my changes➜ issue-69689 git:(main) staticcheck ./...
rand/rand.go:9:6: func unusedFunc is unused (U1000)
rand/rand.go:14:13: sleeping for 1 nanoseconds is probably a bug; be explicit if it isn't (SA1004)
rand_cgo/rand_cgo.go:18:13: sleeping for 1 nanoseconds is probably a bug; be explicit if it isn't (SA1004)
staticcheck
after my changes➜ issue-69689 git:(main) GOROOT=$PWD/../go PATH=$PWD/../go/bin:$PATH /tmp/staticcheck ./...
rand/rand.go:9:6: func unusedFunc is unused (U1000)
rand/rand.go:14:13: sleeping for 1 nanoseconds is probably a bug; be explicit if it isn't (SA1004)
rand_cgo/rand_cgo.go:18:13: sleeping for 1 nanoseconds is probably a bug; be explicit if it isn't (SA1004)
As we can see from these examples #26671 doesn't reproduce on the linters anymore if we rollback CL 127658.
Therefore, it should be reasonable to change go/scanner.Scanner.updateLineInfo
behaviour by removing CL 127658
However, there's a chance we may break some unknown to me existing programmes. I think it's reasonable to add a GODEBUG option to enable existing programs temporary opt-out from the go/scanner
behaviour change.
Since the blast-radius shouldn't be large I think we can remove this GODEBUG
option in go1.26 as likely nobody would need it by this time.
I'll make and publish a CL shortly.
Change https://go.dev/cl/618276 mentions this issue: go/scanner: Preserve relative filenames from //line comments.
@griesemer @findleyr @ianlancetaylor Could you please take a look on the CL 618276?
Could you please take a look on this proposal, if would be very grateful if we can include this fix in go1.24 release. I'm sorry for pinging you, but there's no much time left before the freeze. @griesemer @findleyr @ianlancetaylor @prattmic @mknyszek @timothy-king @adonovan @mdempsky
Sorry you haven't gotten much traction on this issue. I suspect it's because there's no easy solution, and this has been a problem for a long time. It's likely that we won't reach a satisfactory resolution for 1.24.
I wasn't familiar with the context of this issue, but it strikes me that go/scanner should really not be in the business of transforming relative paths to absolute paths. I agree it would be more correct to change that behavior, though we may not be able to do so.
Can you clarify exactly how you are encountering the absolute paths? In which artifacts are they written? You say "we can't bake absolute paths inside build artifacts", but as far as I can tell from the issue, the paths are only modified inside the token.Files in memory of a process using go/scanner. I suspect that you're talking about compiler output, but it's worth being precise here. It may be that we can alter the behavior of go/scanner for the compiler, and leave it unmodified inside go/parser.ParseFile, which is how most tooling would consume its output.
I agree that go/scanner should not be transforming paths, and that a godebug seems like a reasonable way to make this sound but potentially risky change (though each godebug parameter at least doubles the configuration space for testing).
I suspect that you're talking about compiler output, but it's worth being precise here. It may be that we can alter the behavior of go/scanner for the compiler, and leave it unmodified inside go/parser.ParseFile, which is how most tooling would consume its output.
I'm confused; the compiler doesn't use go/scanner.
I'm confused; the compiler doesn't use go/scanner.
Indeed you're right. I thought that there was a use case somewhere.
The question remains: which artifacts hold the absolute paths?
which artifacts hold the absolute paths?
The source file contains //line relative/file.go:1
and the existing hack causes the token.File
to contain "/cwd/relative/file/go"
, implicitly incorporating the process's working directory. Of course you only see the modified file names if you use FileSet.Position(pos), whereas gopls tries its hardest to use only FileSet.PositionFor(pos, false)
, bypassing the //line
corrections.
@adonovan I understand that this is a problem for token.File (above, I said "as far as I can tell from the issue, the paths are only modified inside the token.Files in memory of a process using go/scanner"). What I don't understand is the exact mechanism that causes a problem in Bazel. The original report says In "build systems like Buck2 or Bazel the build actions may be executed on different hosts, so we can't bake absolute paths inside build artifacts". What are the artifacts? The cgo files themselves contain relative paths.
The original report says In "build systems like Buck2 or Bazel the build actions may be executed on different hosts, so we can't bake absolute paths inside build artifacts". What are the artifacts? The cgo files themselves contain relative paths.
Oh, I took that to mean that there are some go/token-based tools that are used during a Bazel build that read source files (including those generated by cgo) and include the file/line/column information in their output, causing them to inadvertently depend on the process's working directory, breaking determinism. (Technically, hermetism: such programs are deterministic on any given machine, but incorporate information about the build worker machine in their output.) This was a pretty common problem with Blaze and Forge until we tracked down and fixed every nondeterministic and non-hermetic program (by running them twice and failing if the results were not consistent).
Oh, I took that to mean that there are some go/token-based tools that are used during a Bazel build that read source files
Sure, but I wonder what those tools are, and whether they can be updated to write relative paths. Perhaps they can't, if they have some deep integration with go/parser (and go/parser doesn't expose the way it calls go/scanner), but it's worth understanding.
It is indeed inconvenient and confusing that go/scanner translates paths in this way, but the bug is arguably in the tool that generates the artifacts with absolute paths, or the system that consumes those absolute paths in an inconsistent context. From go/scanner's perspective, inside a running process, the transformation is valid.
Before advocating for a solution, I want to better understand the problem.
Sure, but I wonder what those tools are, and whether they can be updated to write relative paths. Perhaps they can't...
I'm sure they can, and I think this proposal is suggesting that they should so that we can remove this workaround from the scanner. The GODEBUG is an acknowledgement that there is risk and cost.
It is indeed inconvenient and confusing that go/scanner translates paths in this way, but the bug is arguably in the tool that generates the artifacts with absolute paths, or the system that consumes those absolute paths in an inconsistent context.
I think we are all in agreement on this.
From go/scanner's perspective, inside a running process, the transformation is valid.
I don't think that's true: given that the parser API takes as parameters both the content of the file and its apparent file name, the scanner has no justification for interacting with the file system.
I'm sure they can, and I think this proposal is suggesting that they should so that we can remove this workaround from the scanner.
Sorry, I don't understand this comment. I believe the proposal is suggesting that tools relying on absolute paths should update so that the path translation in go/scanner can be removed. Specifically, consider this quote: "The cgocall analyser in golang.org/x/tools is trying to access the original source file by fset.Position(raw.Pos()).Filename, but fails when relative filepaths used because the filename is invalid." That is saying the cgocall analyzer fails when relative paths are used in the token.File. It is not saying that the cgocall analyser is failing in the distributed bazel build. I'm asking what is failing in the distributed bazel build.
I don't think that's true: given that the parser API takes as parameters both the content of the file and its apparent file name, the scanner has no justification for interacting with the file system.
How is the scanner interacting with the file system? IIUC it is simply translating paths from relative to absolute, based on the directory of the filepath being parsed. There is no I/O involved.
Sorry, I now realize that all this time, thanks to this comment:
// Put a relative filename in the current directory.
I was under the misapprehension that the scanner was using the process's current directory to absolutize relative file names in line directives, when in fact it is lexically joining the parent directory of the file containing the line directive (or the purported filename provided in the call to ParseFile).
The compiler documentation doesn't give any guidance on the interpretation of relative file names in line directives, and one can imagine a number of possibilities. Given that the scanner behavior should not depend on os.Getwd, and that the working directory of the tool that wrote the //line directive is unknowable, the only interpretation that makes sense to me is the one that is implemented (though possibly without the filepath.Clean, which can give the wrong answer in the presence of symbolic links). Unless the line directives are lexically relativized to the source file's nominal directory, the apparent file name reported by token.File.Name would be unrelativized, making it implicitly relative to the process's working directory.
(I had been nodding my head in agreement with all your comments all this time not realizing I was saying something opposite to what I meant.)
(Apologies if this is obvious to others. I'm catching up.)
@findleyr
Can you clarify exactly how you are encountering the absolute paths? In which artifacts are they written? You say "we can't bake absolute paths inside build artifacts", but as far as I can tell from the issue, the paths are only modified inside the token.Files in memory of a process using go/scanner. I suspect that you're talking about compiler output, but it's worth being precise here. It may be that we can alter the behavior of go/scanner for the compiler, and leave it unmodified inside go/parser.ParseFile, which is how most tooling would consume its output.
The cmd/cgo
makes paths absolute, but this behaviour can be altered by-timpath=$(pwd)
and Bazel does that.
The compiler works fine with both relative and absolute paths, that's not a problem at all. The compiler doesn't use go/scanner
The problem is tools like cgocall
from golang.org/x/tools as they depend on go/scanner
and the following code fset.Position(raw.Pos()).Filename doesn't work correct due to the behaviour of go/scanner
I'd like to fix.
@adonovan
The compiler documentation doesn't give any guidance on the interpretation of relative file names in line directives, and one can imagine a number of possibilities. Given that the scanner behavior should not depend on os.Getwd, and that the working directory of the tool that wrote the //line directive is unknowable, the only interpretation that makes sense to me is the one that is implemented (though possibly without the filepath.Clean, which can give the wrong answer in the presence of symbolic links). Unless the line directives are lexically relativized to the source file's nominal directory, the apparent file name reported by token.File.Name would be unrelativized, making it implicitly relative to the process's working directory.
Yes go/scanner
doesn't use os.Getwd
, it just assumes that when we parse file abc/src1.go
the relative path provided by //line
comment is relative to the directory abc
. As I understand from this code the compiler doesn't do this assumption.
@timothy-king
If a GODEBUG is introduced 1.24, the earliest it can be removed is 1.28. The minimum number of versions before deprecation is 4
Thanks, I didn't know about that.
I suspect we want to introduce the GODEBUG as defaulting to its old behavior for one release.
Yes, this makes sense to me
I understand this is quite a tricky issue, than you all for digging into it
Aha, this is interesting, @podtserkovskiy.
The problem is tools like cgocall from golang.org/x/tools as they depend on go/scanner and the following code fset.Position(raw.Pos()).Filename doesn't work correct due to the behaviour of go/scanner I'd like to fix.
Can you say more about this? If the line directives of the cgo files are relative, why is there an error when they are translated to absolute paths? IIUC, this translation happens inside of a call to go/parser.ParseFile
, running on the local machine. Unless I'm missing something, the only way that the resulting absolute paths would be inaccurate is if the relative path to the referenced file has changed, or if the path passed to ParseFile
is inaccurate (such as could be possible if the raw source were provided to ParseFile, bypassing an actual filesystem read).
Is it possible that the issue here is that Bazel uses a build sandbox, so when analyzing a file containing a //line directive, the directive refers to a file that is no longer accessible because it is not a direct input to the current build step? If so, then I think the real problem is cgocall's questionable assumption that it can open the file referred to by //line.
I would suggest using FileSet.PositionFor(pos, false) to ignore the //line directive, but the cgocall logic added in https://go.dev/cl/147317 is quite explicit about the need to parse the "raw" cgo file:
This change causes cgocall to parse, modify, type-check and analyze "raw" cgo files. The approach (based on dot-importing the "cooked" package) is rather too clever but should be more robust than the one it replaces.
Rather too clever indeed. Perhaps the solution is for cgocall to fail gracefully if it can't open the raw cgo file.
@findleyr
Can you say more about this? If the line directives of the cgo files are relative, why is there an error when they are translated to absolute paths?
The issue is that the relative path form //line
comment is being modified here and the result of this modification is another relative (incorrect) filename that doesn't exists on the disk.
I've just published a repo with a repro on https://github.com/podtserkovskiy/issue-69689
The Readme.md
contains both: the bug repro on go1.23 and the demonstration that https://go.dev/cl/618276 fixes the issue
Perhaps the solution is for cgocall to fail gracefully if it can't open the raw cgo file.
@podtserkovskiy, could you try applying this patch to see if it is a satisfactory solution?
diff --git a/go/analysis/passes/cgocall/cgocall.go b/go/analysis/passes/cgocall/cgocall.go
index 613583a1a..3b01cb84f 100644
--- a/go/analysis/passes/cgocall/cgocall.go
+++ b/go/analysis/passes/cgocall/cgocall.go
@@ -7,12 +7,14 @@
package cgocall
import (
+ "errors"
"fmt"
"go/ast"
"go/format"
"go/parser"
"go/token"
"go/types"
+ "io/fs"
"log"
"os"
"strconv"
@@ -47,6 +49,9 @@ func run(pass *analysis.Pass) (interface{}, error) {
cgofiles, info, err := typeCheckCgoSourceFiles(pass.Fset, pass.Pkg, pass.Files, pass.TypesInfo, pass.TypesSizes)
if err != nil {
+ if err == errNoSource {
+ return nil, nil
+ }
return nil, err
}
for _, f := range cgofiles {
@@ -55,6 +60,8 @@ func run(pass *analysis.Pass) (interface{}, error) {
return nil, nil
}
+var errNoSource = errors.New("cgo source unavailable")
+
func checkCgo(fset *token.FileSet, f *ast.File, info *types.Info, reportf func(token.Pos, string, ...interface{})) {
ast.Inspect(f, func(n ast.Node) bool {
call, ok := n.(*ast.CallExpr)
@@ -179,9 +186,18 @@ func typeCheckCgoSourceFiles(fset *token.FileSet, pkg *types.Package, files []*a
for _, raw := range files {
// If f is a cgo-generated file, Position reports
// the original file, honoring //line directives.
+ //
+ // This operation is of dubious validity because it accesses
+ // files that may not be accessible when the analysis driver
+ // runs in a build sandbox as used by Bazel.
+ // So, if we can't read the source file, silently give up.
filename := fset.Position(raw.Pos()).Filename // sic: Pos, not FileStart
- f, err := parser.ParseFile(fset, filename, nil, parser.SkipObjectResolution)
+ f, err := parser.ParseFile(fset, filename+"bonkers", nil, parser.SkipObjectResolution)
if err != nil {
+ if errors.Is(err, new(fs.PathError)) {
+ log.Printf("cgocall analyzer on package %s cannot read raw cgo source file %s (possible build sandbox?); skipping", pkg, filename)
+ return nil, nil, errNoSource
+ }
return nil, nil, fmt.Errorf("can't parse raw cgo file: %v", err)
}
found := false
could you try applying this patch to see if it is a satisfactory solution?
@adonovan This helps to silence the errors on this specific analyser, but this is not what I'd like to achieve. I want to enable linters to work on CGo code on Go projects using Buck, specifically on Facebook codebase.
The problem is a bit more broad then cgocall
analyser. Currently code issues like fmt.Sprintf("%d", "awd")
, redundant returns or unhandled errors are ignored by popular linters on our codebase.
The root cause is caused by the assumption of go/scanner
that prefixes any data from //line
comment with the directory name passed into parser.ParseFile
.
For example:
cgo_objdir/test1.cgo1.go
with the comment //line src/test1.go:1:1
parser.ParseFile(fset, "cgo_objdir/test1.cgo1.go", nil, parser.ParseComments)
infos[0].lineInfo.Filename = "cgo_objdir/src/test1.go"
The cgo_objdir/src/test1.go
isn't valid, we should have gotten just src/test1.go
instead.
Then parser.ParseFile
returns parser.ParseFile
the error open /workdir/cgo_objdir/src/test1.go: no such file or directory
The full example is here https://github.com/podtserkovskiy/issue-69689
The CL 618276 is definitely fixes those problems for us, so linters on CGo files work the same way as they work on normal Go files.
During my experiments I didn't find any particular example when CL 618276 break something. So it seems like nothing depends on these lines.
Do you have any specific examples when this CL 618276 can break existing code?
I understand, the decision how to resolve relative paths is of course debatable, but the natural resolving relative to CWD seems to be consistent AFAICT across other go standard tools and less magical then the current one.
What do you think about that?
During https://github.com/golang/go/issues/69689#issuecomment-2394812954 I didn't find any particular example when CL 618276 break something. So it seems like nothing depends on these lines.
Do folks have thoughts on how to scale up this experiment to avoid #26671 repeating?
@timothy-king it's still not clear to me that this is the correct fix.
I understand, the decision how to resolve relative paths is of course debatable, but the natural resolving relative to CWD seems to be consistent AFAICT across other go standard tools and less magical then the current one.
I don't think this is clear. It seems to me that if the line directive says "dir/foo.go", I would expect that to be relative to the current file, because the file itself is not associated with any CWD.
I think the line directives are just wrong. Why not just run cgo
with -trimpath
set to something like "$(pwd)=>.."
, to get the correct path adjustments?
Proposal Details
The current behaviour
The
go/scanner.Scanner.updateLineInfo
prepends name of*.cgo1.go
file directory to the relative path of the original CGo file taken from//line
comment.For example:
Let's parse it then with
parser.ParseFile(fset, "cgo_objdir/test1.cgo1.go", nil, parser.ParseComments)
and checkfset
value.The issue is that
fset
will containgo/token.File
withinfos[0].lineInfo.Filename = "cgo_objdir/src/test1.go"
.The path
"cgo_objdir/src/test1.go"
inlineInfo.Filename
is invalid.There was an attempt to fix this behaviour, but it was rollbacked in CL 127658 due to Issue 26671
When this becomes a problem?
In build systems like Buck2 or Bazel the build actions may be executed on different hosts, so we can't bake absolute paths inside build artifacts.
We have a way to prevent this by adding
-trimpath $(pwd)
to build commands. However thego/scanner
will compute invalid filepath in this case as shown in the above.A specific example
The
cgocall
analyser ingolang.org/x/tools
is trying to access the original source file byfset.Position(raw.Pos()).Filename
, but fails when relative filepaths used because the filename is invalid.This doesn't allow seamless integration of Buck2/Bazel with code analysis tools.
Possible solutions
Option 1: Bold
Rollback CL 127658, but this probably can break some existing linters or other programs like it was in Issue 26671.
Option 2: Safe
Rollback CL 127658 and add a
GODEBUG
constant to minimise potential damage to existing programs.goscannerprefixlinedir = "1"
- old behaviourgoscannerprefixlinedir != "1"
- new behaviourI'm not sure what the default behaviour of should be for go1.24. I'm happy with any option as we can set
GODEBUG
env-var before running code analysers.EDIT: updated
GODEBUG
variable name