golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.39k stars 17.39k forks source link

cmd/go: checksum mismatch for module containing Git LFS files #41708

Open costela opened 3 years ago

costela commented 3 years ago

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/costela/.cache/go-build"
GOENV="/home/costela/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/costela/go/pkg/mod"
GOOS="linux"
GOPATH="/home/costela/go"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/costela/sdk/go1.15.2"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/costela/sdk/go1.15.2/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build342748982=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I think I might have found data corruption on proxy.golang.org, and it's unfortunately unclear how it relates to the module's content.

First, the "correct" case: the zip file stored in proxy.golang.org matches the checksum on sum.golang.org, as expected.

$ go1.15.2 clean -modcache && go1.15.2 mod download -json github.com/costela/bio-rd@v0.0.99-1598984019
{
    "Path": "github.com/costela/bio-rd",
    "Version": "v0.0.99-1598984019",
    "Info": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.info",
    "GoMod": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.mod",
    "Zip": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.zip",
    "Dir": "/home/costela/go/pkg/mod/github.com/costela/bio-rd@v0.0.99-1598984019",
    "Sum": "h1:/wCyhMn+DmkHkW3mGjGSso+QfpGsZMojhthWjFfptvw=",
    "GoModSum": "h1:Dm2pV+USySIWrQ13pjU0+KxXwiKPGdiigDv2fM+RcDs="
}

NOTE: this repo was forked and tagged explicitly to debug this issue. The tag is new and hasn't been "moved".

Now we hit the problem with GOPROXY=direct:

$ go1.15.2 clean -modcache && GOPROXY=direct go1.15.2 mod download -json github.com/costela/bio-rd@v0.0.99-1598984019
{
    "Path": "github.com/costela/bio-rd",
    "Version": "v0.0.99-1598984019",
    "Error": "github.com/costela/bio-rd@v0.0.99-1598984019: verifying module: checksum mismatch\n\tdownloaded: h1:kHzRJEumdz5H8oG53A0iBfTKELjaNSYvv+r9KIJk7p4=\n\tsum.golang.org: h1:/wCyhMn+DmkHkW3mGjGSso+QfpGsZMojhthWjFfptvw=\n\nSECURITY ERROR\nThis download does NOT match the one reported by the checksum server.\nThe bits may have been replaced on the origin server, or an attacker may\nhave intercepted the download attempt.\n\nFor more information, see 'go help module-auth'.\n",
    "Info": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.info",
    "GoMod": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.mod",
    "GoModSum": "h1:Dm2pV+USySIWrQ13pjU0+KxXwiKPGdiigDv2fM+RcDs="
}

If we compare the zip file received from proxy.golang.org to the one generated locally as a tempfile when using GOPROXY=direct, we notice a discrepancy: $ diff -urN <(zipinfo goproxy.zip) <(zipinfo direct.zip)

--- /proc/self/fd/11    2020-09-30 10:44:29.400724671 +0200
+++ /proc/self/fd/13    2020-09-30 10:44:29.396724658 +0200
@@ -1,5 +1,5 @@
-Archive:  goproxy.zip
-Zip file size: 3020773 bytes, number of entries: 403
+Archive:  direct.zip
+Zip file size: 7157287 bytes, number of entries: 403
 -rw----     2.0 fat      107 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/.circleci/build-examples
 -rw----     2.0 fat      226 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/.circleci/check-gofmt
 -rw----     2.0 fat      405 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/.circleci/config.yml
@@ -12,7 +12,7 @@
 -rw----     2.0 fat    11357 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/LICENSE
 -rw----     2.0 fat      860 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/README.md
 -rw----     2.0 fat      473 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/RFCs.md
--rw----     2.0 fat      133 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/bgp/decode_real_full_feed/AS8881.raw
+-rw----     2.0 fat 12282584 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/bgp/decode_real_full_feed/AS8881.raw
 -rw----     2.0 fat     2444 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/bgp/decode_real_full_feed/main.go
 -rw----     2.0 fat     4421 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/bgp/learning/main.go
 -rw----     2.0 fat      967 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/ipcache/main.go
@@ -403,4 +403,4 @@
 -rw----     2.0 fat      836 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/util/net/net_addr_test.go
 -rw----     2.0 fat     4095 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/util/servicewrapper/grpc.go
 -rw----     2.0 fat     1091 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/util/time/ticker.go
-403 files, 6622187 bytes uncompressed, 2916799 bytes compressed:  56.0%
+403 files, 18904638 bytes uncompressed, 7053313 bytes compressed:  62.7%

It looks like proxy.golang.org silently truncated one binary file in the repository, so the checksum in sum.golang.org is for this "truncated" version, which doesn't correspond to the actual code in the original repository.

What did you expect to see?

The checksum seen when downloading with or without GOPROXY=direct should be the same.

What did you see instead?

Checksum mismatch.

seankhliao commented 3 years ago

I see the file is stored with Git LFS? I don't think the proxy has that setup

bcmills commented 3 years ago

For Git LFS, see also #38941 and #39720.

costela commented 3 years ago

@seankhliao oh man, that's embarassing. Totally overlooked LFS. Then this is just a duplicate of #38941.

Then the question becomes: shouldn't the locally generated zip also ignore all filters, to ensure the content seen by the proxy is the same as seen by go when not using the proxy?

The git archive call already takes some steps to normalize the output, but maybe the existence of git filters was overlooked? It feels a bit weird to have the sort of "environment changes" (whether git-lfs is installed locally or not) have such a disruptive (and opaque) effect.

bcmills commented 3 years ago

(Since #38941 timed out, let's continue the discussion here.)

bcmills commented 3 years ago

Picking up from https://github.com/golang/go/issues/38941#issuecomment-626754044:

The go command uses git archive under the hood.

According to https://github.com/git-lfs/git-lfs/issues/1322#issuecomment-520559436,

If you have Git LFS enabled (i.e., the filter rules are properly set up via git lfs install), a recent version of git archive will include the LFS files in it, even in a bare repository.

So it's not obvious to me why GIT_LFS_SKIP_SMUDGE=1 would be the right resolution here: why would git-lfs users expect Go modules to include (or omit) LFS files, and under what conditions? Does this behavior vary with git and git-lfs versions, and is upgrading to a more recent git and/or git-lfs binary a viable workaround?

(I haven't used Git LFS, so the answers to those questions are not obvious to me.)

costela commented 3 years ago

I'm unfortunately not familiar with git internals, but I suspect the issue is not exclusive to LFS. Any comparable "git extension" requiring external binaries and opportunistic integration via .gitattributes ("opportunistic" in the sense that git will still work without them) will presumably fail in a similarly opaque way when running GOPROXY=direct go get in a system that happens to have them installed. (git-crypt is another example; though probably less relevant).

It seems unrealistic to expect proxy.golang.org to support all of these extensions. OTOH, there doesn't seem to be a clean way to disable all clean/smudge filters during local checksum generation (i.e.: git archive). The following patch works for me™, but feels like a huge hack:

diff --git a/src/cmd/go/internal/modfetch/codehost/git.go b/src/cmd/go/internal/modfetch/codehost/git.go
index 31921324a7..099837554a 100644
--- a/src/cmd/go/internal/modfetch/codehost/git.go
+++ b/src/cmd/go/internal/modfetch/codehost/git.go
@@ -820,7 +820,7 @@ func (r *gitRepo) ReadZip(rev, subdir string, maxSize int64) (zip io.ReadCloser,
        // text file line endings. Setting -c core.autocrlf=input means only
        // translate files on the way into the repo, not on the way out (archive).
        // The -c core.eol=lf should be unnecessary but set it anyway.
-       archive, err := Run(r.dir, "git", "-c", "core.autocrlf=input", "-c", "core.eol=lf", "archive", "--format=zip", "--prefix=prefix/", info.Name, args)
+       archive, err := Run(r.dir, "git", "-c", "core.autocrlf=input", "-c", "core.eol=lf", "archive", "--worktree-attributes", "--format=zip", "--prefix=prefix/", info.Name, args)
        if err != nil {
                if bytes.Contains(err.(*RunError).Stderr, []byte("did not match any files")) {
                        return nil, os.ErrNotExist

The documentation for the --worktree-attributes flag mentions only usage in a working-tree and we're running archive in a bare repository, so there isn't any guarantee its behavior won't change.

OneOfOne commented 3 years ago

Got hit by this as well, the patch from @costela worked since it was kinda urgent.