crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.37k stars 1.62k forks source link

Zip::Writer Arithmetic overflow when adding 1 million files #14534

Open sqonk opened 5 months ago

sqonk commented 5 months ago

Bug Report

While I was able to find existing issues for "Arithmetic overflow", I could not find any specific to the Zip Writer.

This problem was encountered while deploying a solution for compressing offsite backups for client data on a server.

Sample code to illustrate crash:

require "compress/zip"

dest = "out.zip"
ii = 0
begin
  File.open(dest, "w") do |zipFile|
    Compress::Zip::Writer.open(zipFile) do |zip|
      1000000.times do |i|
        ii = i

        max = 5 * 1024
        totalBytes = Random.rand(100..max)
        zip.add "file#{i}", Random.new.random_bytes(totalBytes)
      end
    end
  end
rescue e
  puts "## Failed adding file to zip on loop #{ii}: #{e.message}"
  puts e.inspect_with_backtrace
end
puts "done"

Result:

Failed adding file to zip on loop 999999: Arithmetic overflow

Arithmetic overflow (OverflowError) from /opt/crystal/src/compress/zip/writer.cr:226:25 in 'write_end_of_central_directory' from /opt/crystal/src/compress/zip/writer.cr:191:5 in 'close' from /opt/crystal/src/compress/zip/writer.cr:56:25 in '__crystal_main' from /opt/crystal/src/crystal/main.cr:129:5 in 'main_user_code' from /opt/crystal/src/crystal/main.cr:115:7 in 'main' from /opt/crystal/src/crystal/main.cr:141:3 in 'main'

Environment:

Dev / Crystal installation: Crystal 1.12.1 [4cea10199] (2024-04-11)

LLVM: 15.0.7 Default target: aarch64-apple-macosx11.0

Blacksmoke16 commented 5 months ago

Pretty sure this is a limit with ZIP archives themselves, in that they can only have 65,535 entries, which is the max value of a UInt16. Once you add one more than this, you get the overflow.

Ideally solution to this would be #7103. But we should at least throw a more informative error message I'd say.

sqonk commented 5 months ago

ok so if I understand correctly what you are saying is the ZIP writer within the Stdlib supports the traditional zip archive format which has a 65k limit on entries and so to go above this you need something like the 3rd party lib you mentioned?

It does solve my immediately problem - so thank you for that.

Interestingly, if I generate 1,000,000 4 byte files on disk and then run the zip command line tool over it, it will sit and think about it for a bit but it will compress the whole thing in the end. Am I confusing the functionality of CLI tool with what the underpinnings of the Zip format or zlib offer?

In any case thank for the feedback.

Blacksmoke16 commented 5 months ago

Yea the API docs for Compress::Zip calls out:

only compression methods 0 (STORED) and 8 (DEFLATED) are supported. Additionally, ZIP64 is not yet supported.

So it's using the original non-ZIP64 version which has that limitation. There is #11396, but not sure of the status of that PR at the moment.

sqonk commented 5 months ago

Thanks for that. I tried out the WeTransfer/cr_zip_tricks and it did work on the isolated test project I made for exploring this issue, however it sadly gave inexplicable compile-time errors (some kind of inability to see the built-in Digest library) on the main project I wanted to use it with.

Lacking the time to find out what the root cause was I found another library: https://shardbox.org/shards/zip64/releases/0.2.1/ ... which worked first time.