antelle / node-stream-zip

node.js library for fast reading of large ZIPs

Extraction fails on 6.5gb PKWARE (code 9) compressed file #34

Closed: the-dixie-flatline closed this issue 6 years ago

the-dixie-flatline commented 6 years ago

OS: Windows 10 64bit
Node: v9.11.1 64bit
node-stream-zip: 1.6.0

The US Centers for Medicare & Medicaid Services publishes a zipped CSV flat file of current and past providers. Every month they publish the entire database, which is a 6.5 GB CSV when uncompressed. We've used node-stream-zip to unpack it for several months, but this month's release failed with the following error:

Error: too many length or distance symbols
    at Zlib.zlibOnError [as onerror] (zlib.js:142:17)
Emitted 'error' event at:
    at EntryVerifyStream.onerror (_stream_readable.js:675:12)
    at EntryVerifyStream.emit (events.js:180:13)
    at InflateRaw.<anonymous> (S:\Development\projects\arbola\ccb\npi\node_modules\node-stream-zip\node_stream_zip.js:933:14)
    at InflateRaw.emit (events.js:185:15)
    at Zlib.zlibOnError [as onerror] (zlib.js:145:8)

The zip file is available here (it is public data). The file it is failing on is npidata_pfile_20050523-20180408.csv.

7-zip 18.01 and Unzip 6.0 both unpack it without issue.

I hope you have time to look at this. Thanks for a very useful library regardless.

the-dixie-flatline commented 6 years ago

Just realized this is probably an issue with zlib itself. Closing.

the-dixie-flatline commented 6 years ago

Submitted Issue #20098 against node.

antelle commented 6 years ago

Hi! I don't think it's a node.js issue. Probably the archive is corrupted or it's our bug.
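
One way to narrow this down is to look at the compression method stored in the entry's header. The issue title mentions PKWARE method code 9, which the ZIP format assigns to Deflate64; as far as I know, Node's bundled zlib only handles standard Deflate (method 8), so a method 9 entry would be expected to fail in InflateRaw even if the archive is intact. The following is a minimal sketch that uses only Node core modules. The names `compressionMethod`, `firstEntryMethod`, and `describeMethod` are hypothetical helpers, not part of node-stream-zip, and `firstEntryMethod` assumes the first local file header sits at byte offset 0, which is typical but not guaranteed.

```javascript
// Sketch: read the compression method from a ZIP local file header.
const fs = require('fs');

const LOCAL_HEADER_SIGNATURE = 0x04034b50; // bytes "PK\x03\x04" read as little-endian
const METHOD_NAMES = { 0: 'stored', 8: 'deflate', 9: 'deflate64' };

// `header` is a Buffer holding at least the first 10 bytes of a local file header.
function compressionMethod(header) {
  if (header.length < 10 || header.readUInt32LE(0) !== LOCAL_HEADER_SIGNATURE) {
    throw new Error('not a ZIP local file header');
  }
  // Layout: signature (4), version needed (2), flags (2), method (2).
  return header.readUInt16LE(8);
}

// Hypothetical helper: inspect the first entry of an archive on disk.
// Assumption: the archive starts with a local file header at offset 0.
function firstEntryMethod(zipPath) {
  const fd = fs.openSync(zipPath, 'r');
  try {
    const header = Buffer.alloc(10);
    fs.readSync(fd, header, 0, 10, 0);
    return compressionMethod(header);
  } finally {
    fs.closeSync(fd);
  }
}

// Map a method code to a readable name for logging.
function describeMethod(method) {
  return METHOD_NAMES[method] || `unknown (${method})`;
}
```

Calling `describeMethod(firstEntryMethod(path))` on the downloaded archive should show whether its first entry uses method 9. If the failing CSV is not the first entry, the method would need to be read from that entry's own header instead.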