golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.5k stars 17.6k forks source link

archive/tar: write NUL in USTAR header in overflow situations #24599

Closed StefanScherer closed 6 years ago

StefanScherer commented 6 years ago

What version of Go are you using (go version)?

1.10

Does this issue reproduce with the latest release?

yes, also happens with 1.10.1

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build238205497=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Users start to complaint that Packer 1.2.1 (https://github.com/hashicorp/packer/issues/5990) creates Vagrant box files that cannot be extracted on macOS. These Vagrant boxes are technically tar files with large VM files in it. Packer 1.2.1 is compiled with Golang 1.10, earlier versions were compiled with Golang 1.9. The code to create the tar file hasn't changed in Packer.

I've extracted the problem to a smaller example code and compiled it with different Golang versions. I have created a gist https://gist.github.com/StefanScherer/9dfeb9fbf6bc5dc35090d698c2703bbc with following files:

What did you expect to see?

I expect that the tar files produced can be extracted with the built-in tar (bsdtar 2.8.3 - libarchive 2.8.3) in macOS 10.13.3.

What did you see instead?

The binary compiled with Golang 1.9 creates a tarfile with a 10 GByte file in it that can be listed and extracted correctly on macOS.

Create tar with darwin binary compiled with Golang 1.9
2018/03/29 19:14:19 Create tar from dir: in => out.tar
2018/03/29 19:14:19 Tar add: 'in/10Gigfile' to 'out.tar'
-rw-------  0 501    20 10737418240 Mar 29 18:55 10Gigfile

The binary compiled with Golang 1.10 creates a tarfile with a 10 GByte file in it that can be listed as an empty file. The tarfile itself is 10 GByte. Trying to extract the tar file leads to an error.

Create tar with darwin binary compiled with Golang 1.10
2018/03/29 19:14:46 Create tar from dir: in => out.tar
2018/03/29 19:14:46 Tar add: 'in/10Gigfile' to 'out.tar'
-rw-------  0 stefan staff       0 Mar 29 18:55 10Gigfile
ianlancetaylor commented 6 years ago

CC @dsnet

dsnet commented 6 years ago

This is difficult for me to reproduce as I don't have the docker environment setup. Can you upload the compressed problematic tar file? If that's still too large, the first 1024 bytes will do.

dsnet commented 6 years ago

I reproduced the problem by just creating a 10GiB file, didn't need to do all the docker stuff. Investigating.

StefanScherer commented 6 years ago

Added some xxd dumps of the first 4096 byte and also a diff to the gist.

dsnet commented 6 years ago

This is a bug in libarchive.

What changed from Go1.9 to Go1.10?

In Go1.9, the archive/tar package was a mess where the exact format used (USTAR, PAX, GNU) was poorly determined, and did not always generate a valid tar file. That is, in certain situations, it would generate an invalid hybrid file using both features from PAX and GNU (even though most tar tools would still happily parse the archive). In Go1.10, this was fixed such that the Writer would always consistently use USTAR when possible, then PAX, and then GNU.

Thus, for the 10GiB file you are writing, Go1.9 happened to choose the GNU format, which libarchive understood. However, Go1.10 chose to use the PAX format (since USTAR can't represent files larger than 8GiB).

What's wrong with libarchive?

The PAX specification says:

size: The size of the file in octets, expressed as a decimal number using digits from the ISO/IEC 646:1991 standard. This record shall override the size field in the following header block(s).

Thus, the presence of a size PAX record should override whatever value is in the subsequent USTAR header. However, libarchive v2.8.3 has a bug where it uses the stale USTAR header and does not properly use the PAX record.

This problem was filed against libarchive in libarchive/libarchive#880 and subsequently fixed in v3.3.2 and on.

What can you do?

Go1.10 gives the user finer control over the output format. You can see tar.Header.Format to tar.FormatGNU to work around the libarchive bug.

What can Go do?

For the cases where a PAX record overwrites a USTAR field, we could encode the NUL byte in the field instead of the '0' character. The NUL byte is a better indication that USTAR field cannot represent the value. This shouldn't be necessary, but could help users with old versions of bsdtar.

StefanScherer commented 6 years ago

Thanks @dsnet for the fast review and excellent tip for a work around.

Indeed, adding the following line

        header.Format = tar.FormatGNU

helps.

dsnet commented 6 years ago

@StefanScherer, great to hear.

It turns out that writing NULs in the size field doesn't help bsdtar, since the condition they have never evaluates to true. As there's nothing we can do here, I'm closing this.

mwhooker commented 6 years ago

Thanks for the investigating, @dsnet. Is it correct to say that future versions of go will use the patched version of libarchive, and create valid 8gb PAX archives without manually setting the header format?

dsnet commented 6 years ago

Is it correct to say that future versions of go will use the patched version of libarchive, and create valid 8gb PAX archives without manually setting the header?

I'm not sure what you mean. libarchive is distributed entirely separately from the Go toolchain. Did you mean archive/tar? If so, then yes, archive/tar will continue to use the PAX format as the default over GNU.

mwhooker commented 6 years ago

ah, I misunderstood, for some reason I was thinking golang used the libarchive implementation. Thanks!