Open darkfeline opened 2 years ago
how were the zip files made? you can try comparing the hash results using https://pkg.go.dev/golang.org/x/mod/sumdb/dirhash HashZip and HashDir
@seankhliao Thanks for pointing the direction. The zip file (https://proxy.golang.org/go.felesatra.moe/cloudflare/@v/v0.3.0.zip) contains an unexpected folder entry (go.felesatra.moe/cloudflare@v0.3.0
). If I remove this entry, the hashes will match.
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2022-06-19 12:55:24 D.... 0 0 go.felesatra.moe/cloudflare@v0.3.0
2022-06-19 12:55:24 ..... 11358 3949 go.felesatra.moe/cloudflare@v0.3.0/LICENSE
2022-06-19 12:55:24 ..... 567 308 go.felesatra.moe/cloudflare@v0.3.0/README.md
2022-06-19 12:55:24 ..... 3541 1411 go.felesatra.moe/cloudflare@v0.3.0/client.go
2022-06-19 12:55:24 ..... 1309 679 go.felesatra.moe/cloudflare@v0.3.0/client_test.go
2022-06-19 12:55:24 ..... 44 44 go.felesatra.moe/cloudflare@v0.3.0/go.mod
2022-06-19 12:55:24 ..... 3185 1016 go.felesatra.moe/cloudflare@v0.3.0/tools.go
2022-06-19 12:55:24 ..... 843 515 go.felesatra.moe/cloudflare@v0.3.0/tools_test.go
------------------- ----- ------------ ------------ ------------------------
2022-06-19 12:55:24 20847 7922 7 files, 1 folders
not sure if HashZip should skip over directory entries or if they should continue to be invalid (to be strict on input)
cc @bcmills @matloob
not sure if HashZip should skip over directory entries or if they should continue to be invalid (to be strict on input)
It seems to me that the bug here is that HashZip
presents directories to the hash function whereas HashDir
explicitly ignores them.
Unfortunately, there is no way in general to make HashDir
match the original checksum computed by HashZip
for these zipfiles: in order to figure out the correct hash, we would need to know whether the zipfile contained the spurious directory entry or not, and by then it's too late.
Ideally I think we should change golang.org/x/mod/zip.Unzip
and CheckZip
to explicitly reject zip files that contain directories — that failure mode seems much clearer than a checksum mismatch, and now that we know that even non-empty directories lead to checksum mismatches I'm more comfortable making that an error. (We discussed this behavior a bit in a review thread, but didn't realize at the time that it would lead to checksum mismatches.)
(We could, independently, also skip directories explicitly in dirhash.HashZip
, but I'm not sure that that would really do much good — if the zip file has already been downloaded, it would make the file seem like it had been corrupted when it really hasn't been.)
I don't know if there's a formal reference for zip file format, but my experience is that the dir entry is commonly present. The zip file in question I constructed using git archive --format=zip --prefix=blah@v0.1.0/
, so I imagine other folks will run into this once they try building module zip files. For reference, doing 7z a module@v0.1.0.zip module@v0.1.0
also includes the dir entry.
@darkfeline, the zip format is documented in https://go.dev/ref/mod#zip-files, and is not the same as what is produced by git archive
.
That reference does currently state that “Empty directories (entries with paths ending with a slash) may be included in module zip files but are not extracted. The go command does not include empty directories in zip files it creates.” However, in light of the checksum mismatches caused by those entries, it seems clear to me that it needs to be revised.
I'm talking about the zip format generally, e.g., whether it is considered standard practice or correct to include dir entries. Because that is what both git archive
and 7z
do (and what I generally see), I assume that is de facto how proper zip archives should be.
Hypothetically then, Go recommending zip archives not containing such dir entries would mean Go is recommending improper/incorrect zip archives to be used. Hence my open question on a formal reference for zip file format.
I might be preaching to the choir, but my opinion is that Go should probably explicitly tolerate zip archives with such dir entries assuming that de facto everyone expects zip archives to include them, even if Go recommends not including them. This is in response to your above comment:
Ideally I think we should change golang.org/x/mod/zip.Unzip and CheckZip to explicitly reject zip files that contain directories
That's fair. I think we'd still need to adjust the checksumming, though, and that will still invalidate all existing modules with zipfiles containing directory entries. 🤷♂️
This issue is currently labeled as early-in-cycle for Go 1.20. That time is now, so a friendly reminder to look at it again.
I looked into this a bit with @bcmills. A few notes.
This problem seems like a fairly low priority, but that would be the plan.
@darkfeline are you using your own code to generate the modules, or are you using some broadly available tool? If the latter we'd want to notify the author. Thanks.
I am using git -C code-dir archive --format=zip --prefix=package@v0.1.0/ v0.1.0
(code)
This issue is currently labeled as early-in-cycle for Go 1.21. That time is now, so a friendly reminder to look at it again.
This issue is currently labeled as early-in-cycle for Go 1.22. That time is now, so a friendly reminder to look at it again.
I think this needs to wait for at least Go 1.23, so that Go toolchains that don't enforce strict toolchain bounds are no longer supported when the hashes change.
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
I think so
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
go mod verify
fails immediately on freshly downloaded modules. This can be reproduced with a hermetic Docker setup:Dockerfile:
go.mod:
main.go:
I suspect there's something weird about the module zip files I build for
go.felesatra.moe/cloudflare
, but I'm not sure where to start looking.Also, it seems strange how
go mod verify
could fail. If I understand correctly, it's checking that the downloaded zip and the unpacked dir haven't been modified, and given the above hermetic Docker reproduction I don't see how that could be the case.What did you expect to see?
No error
What did you see instead?
go mod verify
says the downloaded dir has been modified