golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.89k stars 17.52k forks source link

cmd/go: mod verify fails if directory entries are in zip file #53448

Open darkfeline opened 2 years ago

darkfeline commented 2 years ago

What version of Go are you using (go version)?

$ go version
go version go1.18.3 linux/amd64

Does this issue reproduce with the latest release?

I think so

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build456399546=/tmp/go-build -gno-record-gcc-switches"

What did you do?

go mod verify fails immediately on freshly downloaded modules. This can be reproduced with a hermetic Docker setup:

Dockerfile:

FROM docker.io/library/golang:alpine as builder
WORKDIR /usr/src/app
COPY go.mod go.sum ./
RUN go mod download && go mod verify
COPY . .
RUN go build -v .

go.mod:

module example
go 1.18
require go.felesatra.moe/cloudflare v0.3.0

main.go:

package main

import (
    "go.felesatra.moe/cloudflare"
)

func main() {
    _ := cloudflare.Client{}
}

I suspect there's something weird about the module zip files I build for go.felesatra.moe/cloudflare, but I'm not sure where to start looking.

Also, it seems strange how go mod verify could fail. If I understand correctly, it's checking that the downloaded zip and the unpacked dir haven't been modified, and given the above hermetic Docker reproduction I don't see how that could be the case.

What did you expect to see?

No error

What did you see instead?

go mod verify says the downloaded dir has been modified

seankhliao commented 2 years ago

how were the zip files made? you can try comparing the hash results using https://pkg.go.dev/golang.org/x/mod/sumdb/dirhash HashZip and HashDir

ZekeLu commented 2 years ago

@seankhliao Thanks for pointing the direction. The zip file (https://proxy.golang.org/go.felesatra.moe/cloudflare/@v/v0.3.0.zip) contains an unexpected folder entry (go.felesatra.moe/cloudflare@v0.3.0). If I remove this entry, the hashes will match.

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2022-06-19 12:55:24 D....            0            0  go.felesatra.moe/cloudflare@v0.3.0
2022-06-19 12:55:24 .....        11358         3949  go.felesatra.moe/cloudflare@v0.3.0/LICENSE
2022-06-19 12:55:24 .....          567          308  go.felesatra.moe/cloudflare@v0.3.0/README.md
2022-06-19 12:55:24 .....         3541         1411  go.felesatra.moe/cloudflare@v0.3.0/client.go
2022-06-19 12:55:24 .....         1309          679  go.felesatra.moe/cloudflare@v0.3.0/client_test.go
2022-06-19 12:55:24 .....           44           44  go.felesatra.moe/cloudflare@v0.3.0/go.mod
2022-06-19 12:55:24 .....         3185         1016  go.felesatra.moe/cloudflare@v0.3.0/tools.go
2022-06-19 12:55:24 .....          843          515  go.felesatra.moe/cloudflare@v0.3.0/tools_test.go
------------------- ----- ------------ ------------  ------------------------
2022-06-19 12:55:24              20847         7922  7 files, 1 folders
seankhliao commented 2 years ago

not sure if HashZip should skip over directory entries or if they should continue to be invalid (to be strict on input)

cc @bcmills @matloob

bcmills commented 2 years ago

not sure if HashZip should skip over directory entries or if they should continue to be invalid (to be strict on input)

It seems to me that the bug here is that HashZip presents directories to the hash function whereas HashDir explicitly ignores them.

Unfortunately, there is no way in general to make HashDir match the original checksum computed by HashZip for these zipfiles: in order to figure out the correct hash, we would need to know whether the zipfile contained the spurious directory entry or not, and by then it's too late.

Ideally I think we should change golang.org/x/mod/zip.Unzip and CheckZip to explicitly reject zip files that contain directories — that failure mode seems much clearer than a checksum mismatch, and now that we know that even non-empty directories lead to checksum mismatches I'm more comfortable making that an error. (We discussed this behavior a bit in a review thread, but didn't realize at the time that it would lead to checksum mismatches.)

bcmills commented 2 years ago

(We could, independently, also skip directories explicitly in dirhash.HashZip, but I'm not sure that that would really do much good — if the zip file has already been downloaded, it would make the file seem like it had been corrupted when it really hasn't been.)

darkfeline commented 2 years ago

I don't know if there's a formal reference for zip file format, but my experience is that the dir entry is commonly present. The zip file in question I constructed using git archive --format=zip --prefix=blah@v0.1.0/, so I imagine other folks will run into this once they try building module zip files. For reference, doing 7z a module@v0.1.0.zip module@v0.1.0 also includes the dir entry.

bcmills commented 2 years ago

@darkfeline, the zip format is documented in https://go.dev/ref/mod#zip-files, and is not the same as what is produced by git archive.

That reference does currently state that “Empty directories (entries with paths ending with a slash) may be included in module zip files but are not extracted. The go command does not include empty directories in zip files it creates.” However, in light of the checksum mismatches caused by those entries, it seems clear to me that it needs to be revised.

darkfeline commented 2 years ago

I'm talking about the zip format generally, e.g., whether it is considered standard practice or correct to include dir entries. Because that is what both git archive and 7z do (and what I generally see), I assume that is de facto how proper zip archives should be.

Hypothetically then, Go recommending zip archives not containing such dir entries would mean Go is recommending improper/incorrect zip archives to be used. Hence my open question on a formal reference for zip file format.

I might be preaching to the choir, but my opinion is that Go should probably explicitly tolerate zip archives with such dir entries assuming that de facto everyone expects zip archives to include them, even if Go recommends not including them. This is in response to your above comment:

Ideally I think we should change golang.org/x/mod/zip.Unzip and CheckZip to explicitly reject zip files that contain directories

bcmills commented 2 years ago

That's fair. I think we'd still need to adjust the checksumming, though, and that will still invalidate all existing modules with zipfiles containing directory entries. 🤷‍♂️

gopherbot commented 2 years ago

This issue is currently labeled as early-in-cycle for Go 1.20. That time is now, so a friendly reminder to look at it again.

rsc commented 1 year ago

I looked into this a bit with @bcmills. A few notes.

  1. As far as scope of the problem, I am scanning the cached modules in the proxy and up to Jan 1 2021 the only affected modules are in the go.felesatra.moe domain. Of course, we don't want to break any modules at all, but it's reassuring that the problem is not more widespread.
  2. The checksum database lists multiple go.sum lines, and all the access code treats the result as a list of lines. The go command has a checksum it is looking for, and if it's listed on one of the lines, all is well. The code was written this way specifically to accommodate hash algorithm updates (from h1: to h2:) but it will work for updating the canonical h1: hash as well.
  3. The overall plan would be:
    • Update the struct listed in 'go help mod download' to add two new fields OldSums []string and OldGoSums []string. The go command would then compute the "fixed" h1 hash as Sum and GoModSum and the original, if it differs, into OldSums and OldGoSums. (The go.mod sums are unaffected due to this bug, so OldGoSums will always be empty for now. OldSums will only be non-empty for affected modules, which right now looks like only go.felesatra.moe modules.)
    • When checking against a checksum database result, the go command changes to accept either the canonical sum or any of the "old" sums.
    • When checking against an existing go.sum line, the go command changes to accept either the canonical sum or any of the "old" sums.
    • When writing a new line to go.sum (because there is no existing line for a given module), the go command writes only the canonical sum.
    • Update the checksum database server to include OldSums and OldGoSums as well as Sum and GoModSum in its checksum entries. This must happen before or at the same time that the checksum database starts using the new go command. Otherwise older go commands will be served entries with only the new checksums (for newly indexed modules) and will fail to validate them.
    • Any external verifiers would need to be updated to check that the database lists a subset of the sums listed in 'go mod download -json'.

This problem seems like a fairly low priority, but that would be the plan.

rsc commented 1 year ago

@darkfeline are you using your own code to generate the modules, or are you using some broadly available tool? If the latter we'd want to notify the author. Thanks.

darkfeline commented 1 year ago

I am using git -C code-dir archive --format=zip --prefix=package@v0.1.0/ v0.1.0 (code)

gopherbot commented 1 year ago

This issue is currently labeled as early-in-cycle for Go 1.21. That time is now, so a friendly reminder to look at it again.

gopherbot commented 1 year ago

This issue is currently labeled as early-in-cycle for Go 1.22. That time is now, so a friendly reminder to look at it again.

bcmills commented 11 months ago

I think this needs to wait for at least Go 1.23, so that Go toolchains that don't enforce strict toolchain bounds are no longer supported when the hashes change.