golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.95k stars 17.66k forks source link

archive/zip: reject certain invalid archives #33026

Open MichaelTJones opened 5 years ago

MichaelTJones commented 5 years ago

This is not a defect report, but a suggestion to review the Zip archive reader with this new and authoritative zipbomb treatise in mind:

https://www.bamsoftware.com/hacks/zipbomb/

ianlancetaylor commented 5 years ago

CC @dsnet

dsnet commented 5 years ago

Hi @MichaelTJones, thanks for the report. This attack depends on two properties (called "insights" in the paper).

I believe Go's zip.Writer refuses to encode zip archives like this. However, Go's zip.Reader is probably susceptible to both properties mentioned by insight 1 and 2 (I did not verify). Insight 2 seems to be obviously an invalid ZIP archive, and our reader should error on such files. However, it's not clear to me that the directly overlapping files of insight 1 is invalid. I personally believe this should be an error as well, but I'm concerned that there may be people actually depending on this property.

MichaelTJones commented 5 years ago

I read the whole paper. The other point was that in addition to specific defenses, a careful implementation needs limits on time and space. That seemed good to me--if the expansion passes 1 GiB and the user's not set a more liberal upper limit, then bail.

I do not believe (but do not know for sure) that there are ever valid cases for overlapping file bodies. I'm thinking that this should either be prohibited or like file size limit, be an option that defaults to disabled.

On Wed, Jul 10, 2019 at 10:49 AM Joe Tsai notifications@github.com wrote:

Hi @MichaelTJones https://github.com/MichaelTJones, thanks for the report. This attack depends on two properties (called "insights" in the paper).

I believe Go's zip.Writer refuses to encode zip archives like this. However, Go's zip.Reader is probably susceptible to both properties mentioned by insight 1 and 2 (I did not verify). Insight 2 seems to be obviously an invalid ZIP archive, and our reader should error on such files. However, it's not clear to me that the directly overlapping files of insight 1 is invalid. I personally believe this should be an error as well, but I'm concerned that there may be people actually depending on this property.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/33026?email_source=notifications&email_token=AB4DFJKRBT2ATW2QVG3JJ6DP6YOMJA5CNFSM4H7QSYG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZUHECI#issuecomment-510161417, or mute the thread https://github.com/notifications/unsubscribe-auth/AB4DFJLFRXAUN7X52OB22H3P6YOMJANCNFSM4H7QSYGQ .

--

Michael T. Jonesmichael.jones@gmail.com michael.jones@gmail.com

FiloSottile commented 5 years ago

I read that article when it came out, and while it's extremely interesting in its techniques, I agree with the point it makes that unpacking archives with untrusted sources requires CPU, time and memory limits.

We should fix archive/zip if it accepts invalid archives, but I don't think we can make it safe to run on arbitrary archives without resource limits. Offering an API to easily apply limits on the unpacking process would be interesting, but a separate issue and definitely not for Go 1.13.

Aside from the insights, the table also points out other invalid zips that we decode, so we should probably also fix those.

/cc @rsc since he's credited on that article, and might have relevant insight.

MichaelTJones commented 5 years ago

I'll only add that lately I've been processing an archive (RSC's Go Corpus, split, bundled, compressed, an archived that is 752 MB when decompressed, so the defaults might be pretty big -- a few seconds, a few GB, etc. -- rather than cater to MIME-type decompression sizes.

On Wed, Jul 10, 2019 at 12:07 PM Filippo Valsorda notifications@github.com wrote:

I read that article when it came out, and while it's extremely interesting in its techniques, I agree with the point it makes that unpacking archives with untrusted sources requires CPU, time and memory limits.

We should fix archive/zip if it accepts invalid archives, but I don't think we can make it safe to run on arbitrary archives without resource limits. Offering an API to easily apply limits on the unpacking process would be interesting, but a separate issue and definitely not for Go 1.13.

Aside from the insights, the table also points out other invalid zips that we decode, so we should probably also fix those.

/cc @rsc https://github.com/rsc since he's credited on that article, and might have relevant insight.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/33026?email_source=notifications&email_token=AB4DFJMWNDMTS4TYA3KESPDP6YXOXA5CNFSM4H7QSYG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZUNYWA#issuecomment-510188632, or mute the thread https://github.com/notifications/unsubscribe-auth/AB4DFJND6DLJKKPH74LGREDP6YXOXANCNFSM4H7QSYGQ .

--

Michael T. Jonesmichael.jones@gmail.com michael.jones@gmail.com

qbradq commented 5 years ago

Is anyone working on this? If not I'd like to take it.

MichaelTJones commented 5 years ago

There is more here: https://github.com/golang/go/issues/33036

It is not necessary to address it everywhere at once by the same mechanism, but it does seem important to approach zip decoding as an instance of the general issue. (As in maybe there is a "limit" type passed as an optional argument, limit having setters & setters but opaque contents, and thus the ability to change implementation details later as more kinds of encoding are protected this way.) That's just an idea for consideration.

Nobody "has" this at present.

On Mon, Jul 15, 2019 at 5:29 PM Norman B. Lancaster < notifications@github.com> wrote:

Is anyone working on this? If not I'd like to take it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/33026?email_source=notifications&email_token=AB4DFJMEA3JMRLPJZV2FE4LP7UI6FA5CNFSM4H7QSYG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZ7KTCY#issuecomment-511617419, or mute the thread https://github.com/notifications/unsubscribe-auth/AB4DFJOPPNRGHSRJNSA73LDP7UI6FANCNFSM4H7QSYGQ .

--

Michael T. Jonesmichael.jones@gmail.com michael.jones@gmail.com

AxbB36 commented 5 years ago

A couple of things to watch out for here -- reports of in-the-wild zip files that could no longer be processed after Debian merged a patch to reject overlapping files (#931433: unzip: CVE-2019-13232).

qbradq commented 5 years ago

Do we want to support FireFox's non-standard zip layout? The current implementation of zip.Reader searches for a CDE in the last 64KB of the file. This means that although small test files encoded as described in the linked blog post might work, once the blob of data following the CDE grows to 64KB-(size of CDE) it will stop working.

Regarding duplicate CDE's, they can be scrubbed prior to scanning for overlapping files.

I have started toying with this. I note that when I add duplicate header detection we have two test cases that choke on it (due to their inputs having duplicates).

Regarding adding a test case for those janky JAR files, the only example I have been able to track down and get a copy of is 1MB. That seems too large.

MichaelTJones commented 5 years ago

I don't know of a prevailing philosophy beyond robust, general, and least surprise.

It seems that "avoiding disaster" and "spec compliance policing" are different goals. GCC/C++ has the notion of "strict" for such cases. Ian and Rob are likely the best to opine here.

On Mon, Jul 22, 2019 at 6:56 AM Norman B. Lancaster < notifications@github.com> wrote:

Do we want to support FireFox's non-standard zip layout? The current implementation of zip.Reader searches for a CDE in the last 64KB of the file.

Regarding duplicate CDE's, they can be scrubbed prior to scanning for overlapping files.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/33026?email_source=notifications&email_token=AB4DFJIAG7YP42C4VKUSUBLQAW4AFA5CNFSM4H7QSYG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2P7TGA#issuecomment-513800600, or mute the thread https://github.com/notifications/unsubscribe-auth/AB4DFJPT2YHJBHRVSJX62G3QAW4AFANCNFSM4H7QSYGQ .

--

Michael T. Jonesmichael.jones@gmail.com michael.jones@gmail.com

qbradq commented 5 years ago

I am just about ready to submit the CL for the zip bomb detection. I am going to go ahead and push the CL with full tests including the large binary files for review. If they need to be removed I can take care of it on the CL.

AxbB36 commented 5 years ago

Do we want to support FireFox's non-standard zip layout? The current implementation of zip.Reader searches for a CDE in the last 64KB of the file. This means that although small test files encoded as described in the linked blog post might work, once the blob of data following the CDE grows to 64KB-(size of CDE) it will stop working.

No, zip.Reader searches for the EOCD in the last 64 KB. The EOCD contains a pointer to the central directory, which may be anywhere in the file. zip.Reader does not currently have any problem with the Firefox layout, even if the zip file is larger than 64 KB.

qbradq commented 5 years ago

Do we want to support FireFox's non-standard zip layout? The current implementation of zip.Reader searches for a CDE in the last 64KB of the file. This means that although small test files encoded as described in the linked blog post might work, once the blob of data following the CDE grows to 64KB-(size of CDE) it will stop working.

No, zip.Reader searches for the EOCD in the last 64 KB. The EOCD contains a pointer to the central directory, which may be anywhere in the file. zip.Reader does not currently have any problem with the Firefox layout, even if the zip file is larger than 64 KB.

Thank you for the clarification!

gopherbot commented 5 years ago

Change https://golang.org/cl/187357 mentions this issue: archive/zip: detect and reject Better Zip Bomb

AxbB36 commented 5 years ago

Change https://golang.org/cl/187357

// Check for overlapping file bodies (Better Zip Bomb)
for il := range z.File {
    l := z.File[il]
    lstart := uint64(l.headerOffset)
    lend := uint64(l.headerOffset) + l.CompressedSize64
    for ir := il + 1; ir < len(z.File); ir++ {
        r := z.File[ir]
        rstart := uint64(r.headerOffset)
        rend := uint64(r.headerOffset) + r.CompressedSize64
        if lend <= rstart || lstart >= rend {
            continue
        }
        return ErrInvalid
    }
}

This looks like an O(n2) algorithm. You should test whether its performance is adequate on a zip file containing many files, such as the ffff.zip from https://github.com/thejoshwolfe/yauzl/issues/108. You may be better off first sorting the array of Files (or indices into the array) by their headerOffset. After that, you only need a linear scan to check whether File n overlaps File n+1.

lend := uint64(l.headerOffset) + l.CompressedSize64 is the wrong calculation because it ignores the size of the local file header itself, which is variable because of the filename and extra field. I think what you want is lend := l.DataOffset() + l.CompressedSize64 (except that you need to check for error from DataOffset).

The additions probably have integer overflow issues. You should test with some Zip64 files that are hacked to have sizes close to 264.

There's no need to include the entire zbsm.zip–and doing so is probably a bad idea, as its hash has no doubt been added to malware detection engines by now. You can generate your own smaller test case using the source code.

$ ./zipbomb --mode=quoted_overlap --num-files=3 --compressed-size=234 > test.zip
$ wc -c test.zip
500 test.zip
$ ./ratio test.zip
test.zip    678120 / 500    1356.24 +31.323 dB

I will say, though, that in my opinion the motivation for this patch is misguided. Its most likely outcome is breaking compatibility with some genuine zip files, without doing much to protect callers against zip bombs. Even a naive non-overlapped DEFLATE bomb can have a compression ratio over 1000, which will be too much for some applications.

jorangreef commented 5 years ago

Just in case it will help with crafting this fix, https://github.com/ronomon/zip has mitigations for overlapping local file headers, as well as more "obviously bad data".