go/types, x/tools/go/types, x/tools/analysis: encoding used by objectpath is inconsistent for use by compiler & tools/analysis

amscanne commented 3 years ago

This issue is derived from dominikh/go-tools#924.

What version of Go are you using (`go version`)?

1.16rc1

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

Linux, ARM64.

What did you do?

When using analyzers implemented using the analysis package, I hit upon a case where facts being exported and imported were being matched up with the wrong methods. (This issue was a method-specific one, and the proposed fix is also method specific.)

For example, instrumenting the facts package in Encode and Decode for serialization and deserialization of facts:

2021/02/09 12:37:21 export: exported func (*pkg/sentry/vfs/vfs.DynamicBytesFileDescriptionImpl).StateFields() []string as DynamicBytesFileDescriptionImpl.M11

2021/02/09 12:18:54 import: imported DynamicBytesFileDescriptionImpl.M11 as func (*pkg/sentry/vfs/vfs.DynamicBytesFileDescriptionImpl).Read(ctx pkg/context/context.Context, dst pkg/usermem/usermem.IOSequence, opts google3/third_party/gvir/pkg/sentry/vfs/vfs.ReadOptions) (int64, error)

It turns out that the objectpath package is producing the name above, and this is what is used for keying the type in the exported fact data. During analysis, the type information for imported packages is sourced from the compiled artifact via gcexportdata, but the current package types are synthesized directly from the source files, and facts are derived from those types. However, this means that there is the possibility of facts being constructed using a different method ordering that what might appear in the compiler artifact, and therefore fact serialization may not key types correctly (and whoever is importing this fact is in for a bad time).

Since this logic is effectively built in to the compiler (which is often a binary package), this has the potential to cause issues for any analysis packages that may link against a different go/types package or perturb the ordering for NamedType.methods.

Possible Fix

While I realize that this API is not stable, the issue is effectively mitigated by making the Method ordering stable for NamedTypes by performing a sort on the first call to Method (and ensuring that no calls to AddMethod happen after that point). This should eliminate the sensitivity issue (but won't fix breakages in the case of genuine binary incompatibility, which is fine).

A draft of this fix is posted here: https://go-review.googlesource.com/c/go/+/290750

(But of course, I could be way off in my diagnose, or there could be a better solution.)

prattmic commented 3 years ago

cc @griesemer @matloob

prattmic commented 3 years ago

@amscanne do the types constructed from gcexportdata always contain all of the methods that are present when analyzing from source (such as unexported methods)? If not, it seems that simply sorting the methods wouldn't be sufficient. Instead it seems like objectpath might need the actual method name rather than just an index?

amscanne commented 3 years ago

Great question. I have no idea. I'll do a bit more digging. However, sorting does solve the issue in this case, so I suspect the answer might yes here (although that does not mean it is yes everywhere).

prattmic commented 3 years ago

cc @timothy-king @guodongli-google as well, since y'all have been looking at analysis recently and may have thoughts here.

amscanne commented 3 years ago

I think I have a better understanding of this issue.

The object file itself seems to contain rich type information (along with names). Based on my reading of the gcexportdata code, it seems like it is sorted in lexicographical order (presumably by the toolchain when emitting the object). Because of this, gcexportdata will always decode in the same lexicographical order. This is good, because it establishes a consistent ordering.

The problem is fundamentally with objectpath, which relies on this ordering. The objectpath package encodes methods in a way that relies on method ordering, as opposed to using names (which almost everything else does). Therefore, the objectpath encoding is susceptible to the method ordering on types.Func objects. Note that types.Interface objects are explicitly ordered during construction, so this indexing scheme works fine for interfaces -- just not methods.

The facts framework relies on objectpath to generate consistent keys for types. If two types.Package objects are the same with the exception of method ordering on relevant types.Func, then objectpath will generate two different paths for many objects (functions themselves, parameters, etc.).

For analysis, there are two different sources for the types.Package: the current package comes from the source code (parsing and type checking), while the types.Package for dependencies comes from gcexportdata. So the facts are saved using the source method ordering, while they are loaded using the gcexportdata method ordering (lexicographical).

It was suggested in [1] that type checking will sort methods, but I don't believe that is true. I think that ast parsing and type checking is sensitive to the ordering of files being parsed. This must be the case because the gcexportdata import does have a consistent lexicographical ordering imposed, as noted at the top.

As some evidence of this, my current workaround is to sort all input files [2] which seems to permute the types generated sufficiently to match in my case (though this is extremely fragile and works only for now). I've added sanity checking for the binary vs AST-derived types, and this files simply by removing the sort in this case.

So there are two reasonable solutions, 1) Objectpath will always look at all methods and index into the sorted list. 2) Type checking does sort these methods before completing a package.

Since (1) is unlikely to break anything (both uses of objectpath will get the exact same encoding, since gcexportdata will already be sorted and objectpath will index into that list), I think this makes more sense.

[1] https://go-review.googlesource.com/c/go/+/290750 [2] https://github.com/google/gvisor/blob/dc1b3884f30d96053dae550d3c40d035c8893d4b/tools/nogo/nogo.go#L465

amscanne commented 3 years ago

I've sent a change to use a "canonical" ordering for objectpath: https://go-review.googlesource.com/c/tools/+/331789

amscanne commented 3 years ago

As expected, the workaround is too brittle to work correctly. While everything works in one configuration, another configuration (tsan/race) yields type conflicts. I think the proper objectpath fix is the way forward.

mdempsky commented 3 years ago

Based on my reading of the gcexportdata code, it seems like it is sorted in lexicographical order (presumably by the toolchain when emitting the object).

Package-scope declarations are ordered lexicographically, but methods aren't declared at package scope. They're attached to their receiver type.

As expected, the workaround is too brittle to work correctly. While everything works in one configuration, another configuration (tsan/race) yields type conflicts.

How robust is objectpath expected to be about different build configurations? In general, changing build tags can arbitrarily change a type's definition, which I would think would necessarily invalidate objectpath strings.

timothy-king commented 3 years ago

Here is a description of the reproducers for this issue that I am aware of:

A Go project is built using blaze (Google's internal version of bazel).
The target contains a file foo/foo.go and a bazel rule to generate a file based on foo.go
The generated file is appended to the GoFile list for the target, bazel-generated/foo/foo_generated.go.
The target is a sent to a native blaze rule to compile it.
GoFile list is sorted by that rule before sending it to the compiler. "bazel-generated/foo/foo_generated.go" < "foo/foo.go" so these now swap positions.
The compiler writes out the gcexportdata in this order.
A separate source analyzer based on (*types.Config).Check loads the files in the package using the original GoFile order.
Both foo/foo.go and bazel-generated/foo/foo.go contain methods on an exported type T.
The objectpath for the methods on T now disagree on what the i'th method of T is between gcexportdata and the types coming from (*types.Config).Check.

We can make method ids in objectpath agnostic to the GoFile order of these two paths by sorting them.

If "foo_generated.go" and "foo.go" contained init() functions, we would be changing the expected order these are executed in. This can potentially be resolved separately.

findleyr commented 3 years ago

Hi, just catching up. As I commented on https://golang.org/cl/331789, I'm a little concerned about changing objectpath serialization. I think we should do it, but want to go over the compatibility implications.

Here's my analysis:

The objectpath package itself leaves it ambiguous whether its encoding can change.
go/analysis is explicit that its encoding is unspecified. That takes care of most uses.
pkgsite points to a few uses of objectpath outside of go/analysis. Obviously there could be more that are vendored or not open source. Notably, it looks like golangci-lint and honnef.co/go/tools use it, but both cases look OK (they're either operating on a transient cache, or are also relying on fact encoding).

Does anyone else have observations or concerns with respect to backwards compatibility of objectpath encoding? I have very little context on this package, but based on my analysis I think it's a gray area. Since this is fixing a bug, I think we should proceed. We should probably also make it explicit that objectpath encoding may change.

mdempsky commented 3 years ago

The objectpath package itself leaves it ambiguous whether its encoding can change.

FWIW, the current encoding depends on implementation details of the export data format that I at least consider to be unstable. E.g., I've already landed CLs on the dev.typeparams branch that sort methods before exporting them, so that's going to change objectpath's method numbering anyway. (And the x/tools exporter already sorts them too, but using a different sort order.)

So to the extent that objectpath needs to be stable, it needs to be decoupled from the export data ordering of method's anyway.

If it's important to maintain historical sort order, it might be able to sort methods based on Pos. But I think sorting on Id is simpler / more robust.

timothy-king commented 3 years ago

Does anyone else have observations or concerns with respect to backwards compatibility of objectpath encoding?

I had fairly similar concerns and went through roughly the same checklist. Most of my observations were left as comments on https://golang.org/cl/331789. I think we are okay.

The objectpath package itself leaves it ambiguous whether its encoding can change.

At the moment objectpath is determined by the order files are parsed. This means if two tools disagree about the file order the method ids are unstable anyways. We some evidence that this is happening. (See my previous comment for details.) My understanding of the objectpath documentation is that this was not the intention. So plausibly this is a bug in the implementation.

We should probably also make it explicit that objectpath encoding may change.

This would probably need to be similar to other interfaces like gcexportdata. This needs to be consistent while stored in the cache by the same tool while analyzing a different project.

Personally I think we may just want to stop using numbers for identifying the methods and switch to method names in the encoding. Quicker (asymptotically at least) writing and lookup times (after https://golang.org/cl/331789 goes in), removes the file ordering concerns and seems conceptually stable. It may also just be robust enough that clearly documenting the conditions that objectpath is stable w.r.t. is not worth it? The main cost that I can think of is additional memory/storage space.

it might be able to sort methods based on Pos.

I have not looked into the details of how token.FileSet are created, but Pos order might have the same set of problems: a different order of files passed to the tool creates a different order of Files, which creates different Pos orders. Positions might be doable but I am not sure if we can always rely on these.

gopherbot commented 3 years ago

Change https://golang.org/cl/331789 mentions this issue: go/types/objectpath: canonical order for methods

gopherbot commented 3 years ago

Change https://golang.org/cl/339689 mentions this issue: go/types/objectpath: use method names in objectpath.

dominikh commented 3 years ago

What's the status of this? Have we decided on either of the two proposed fixes?

With Go tip it appears almost guaranteed to run into this bug, see the various issues referring to this issue.

timothy-king commented 3 years ago

What's the status of this? Have we decided on either of the two proposed fixes?

The status is that we delayed https://go-review.googlesource.com/c/tools/+/331789/ until post release. Now that the tree is open for Go 1.18 we are no longer blocked. We now need to make a decision between this and the other option https://golang.org/cl/339689.

If folks have feedback on the candidate solutions, it would help to get this soon.

(And since I cannot leave well enough alone https://github.com/golang/go/issues/47725 is related, but not necessary to make a decision on this.)

gopherbot commented 3 years ago

Change https://golang.org/cl/343390 mentions this issue: cmd/compile: only sort methods/interfaces during export for -d=unifiedquirks

gopherbot commented 3 years ago

Change https://golang.org/cl/354433 mentions this issue: go/types/objectpath: canonical order for methods

golang / go