Closed: guillaumerxl closed this issue 7 years ago
Thanks for the report - this seems to be some sort of race condition (definitely a bug that needs fixing). Will investigate.
I have addressed the race condition here: https://github.com/ZJONSSON/node-unzipper/pull/19 - published as unzipper@0.7.4. Can you please verify that your code works now?
Here are some other comments: entry.path is available, if that is helpful for the output filename.
Here is an example of how the resolve is triggered only when the file write has finished:
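const fs = require('fs');
const request = require('request');
const unzipper = require('unzipper');

module.exports = function (url, filename) {
  return new Promise((resolve, reject) => {
    request.get(url)
      .pipe(unzipper.ParseOne())
      .pipe(fs.createWriteStream(`data/${filename}.csv`))
      .on('error', reject)
      // Pass a callback so that resolve runs on 'finish', not immediately.
      .on('finish', () => resolve({
        url: url,
        path: `data/${filename}.csv`
      }));
  });
};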
Or, using the etl library:
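const etl = require('etl');
const request = require('request');
const unzipper = require('unzipper');

module.exports = function (url, filename) {
  return request.get(url)
    .pipe(unzipper.ParseOne())
    .pipe(etl.toFile(`data/${filename}.csv`))
    .promise()
    // Runs only after the etl stream has completed.
    .then(() => ({
      url: url,
      path: `data/${filename}.csv`
    }));
};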
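For comparison, the same "resolve only after the write has finished" pattern can be sketched with nothing but Node's built-in modules. This is a minimal sketch rather than anything unzipper-specific: saveStream is a hypothetical helper name, the temporary file path is made up for the demo, and it assumes Node 12 or later (for stream.pipeline and Readable.from).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pipeline, Readable } = require('stream');

// Hypothetical helper: pipe `source` into a file and resolve only after the
// whole pipeline has completed (write stream finished), or reject on error.
function saveStream(source, destination) {
  return new Promise((resolve, reject) => {
    pipeline(source, fs.createWriteStream(destination), (err) => {
      if (err) reject(err);
      else resolve({ path: destination });
    });
  });
}

// Usage: write a small in-memory stream to a temporary file.
const target = path.join(os.tmpdir(), `save-stream-demo-${process.pid}.csv`);
const done = saveStream(Readable.from(['a,b\n', '1,2\n']), target);
```

Using pipeline instead of chained .pipe() calls also forwards errors from every stage to the single callback, so a failure in the source stream rejects the promise as well.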
I pulled the repo and it works with my first code. However, your examples don't work: they never finish (no errors). I will investigate further.
Thanks for your help and your work!
It seems that the original fix was wrong. The content length according to the entry header is greater than the actual length of the data -> we read past the end of the file and never finish (although we still manage to pipe all content for this single file downstream). Needs further debugging.
The issue here is that the size information was stored in the 64-bit (ZIP64) extra field rather than in the regular header size fields. Here is a fix - further debugging might be required: https://github.com/ZJONSSON/node-unzipper/commit/22ad1092aa6e1b1200e7de766e17c1dbdd2ce53b
See also: https://github.com/cthackers/adm-zip/commit/5e4e4f8fcac5e9d84edd09f1ce5c76d74016b8bd
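To illustrate what "stored in the 64-bit extra information" means, here is a rough sketch of reading sizes from a ZIP64 extended information extra field. It is not the code from the linked commits: resolveSizes and the shape of its header argument are hypothetical, the sizes in the usage example are made up, and it assumes Node 12 or later for the BigInt buffer methods. The header ID 0x0001 and the 0xFFFFFFFF placeholder come from the ZIP format specification.

```javascript
// Header ID of the ZIP64 extended information extra field (per the ZIP spec).
const ZIP64_HEADER_ID = 0x0001;
const UINT32_MAX = 0xffffffff;

// Hypothetical helper: given the 32-bit sizes from an entry header and its raw
// extra-field buffer, return the effective sizes as BigInt values.
function resolveSizes(header, extra) {
  let uncompressedSize = BigInt(header.uncompressedSize);
  let compressedSize = BigInt(header.compressedSize);
  let offset = 0;
  // The extra field is a sequence of (id: uint16, size: uint16, data) records.
  while (offset + 4 <= extra.length) {
    const id = extra.readUInt16LE(offset);
    const size = extra.readUInt16LE(offset + 2);
    const data = extra.slice(offset + 4, offset + 4 + size);
    if (id === ZIP64_HEADER_ID) {
      let pos = 0;
      // A 64-bit value replaces a field only when its 32-bit slot is 0xFFFFFFFF.
      if (header.uncompressedSize === UINT32_MAX && pos + 8 <= data.length) {
        uncompressedSize = data.readBigUInt64LE(pos);
        pos += 8;
      }
      if (header.compressedSize === UINT32_MAX && pos + 8 <= data.length) {
        compressedSize = data.readBigUInt64LE(pos);
      }
    }
    offset += 4 + size;
  }
  return { uncompressedSize, compressedSize };
}

// Usage with made-up sizes: an extra field describing an entry larger than 4 GiB.
const extra = Buffer.alloc(20);
extra.writeUInt16LE(ZIP64_HEADER_ID, 0);
extra.writeUInt16LE(16, 2); // two 8-byte values follow
extra.writeBigUInt64LE(8500000000n, 4); // uncompressed size
extra.writeBigUInt64LE(1500000000n, 12); // compressed size
const sizes = resolveSizes(
  { uncompressedSize: UINT32_MAX, compressedSize: UINT32_MAX },
  extra
);
```

A parser that ignores this record and trusts the 32-bit fields gets the wrong length for large entries and can lose track of where the data ends.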
I have resolved the 64-bit case but still ended up with a bad signature after the first EndOfDirectory record for your example file. I added a small hack to ignore the EndOfDirectory, so the examples above should work with unzipper.ParseOne({bypassDirectory:true}) or unzipper.Parse({bypassDirectory:true}). The finish events should emit properly now (published as unzipper@0.7.5).
Please let me know if this works and reopen if the issues persist.
This should be a more general solution: https://github.com/ZJONSSON/node-unzipper/pull/22/commits/d5db375339d5bac6f854a2f2b73f18c87e5371b8
No need for bypassDirectory - we will simply bypass any signature issues if and only if we are in the central directory and there is really no more data to extract anyway. Included in unzipper@0.7.6.
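The rule described above could be sketched roughly as follows. This only illustrates the idea and is not the code from the linked commit: nextAction, the state object and the action names are all hypothetical. The three constants are the standard ZIP record signatures.

```javascript
// Standard ZIP record signatures (read as little-endian uint32 values).
const LOCAL_FILE_HEADER = 0x04034b50;
const CENTRAL_DIRECTORY_HEADER = 0x02014b50;
const END_OF_CENTRAL_DIRECTORY = 0x06054b50;

// Hypothetical decision step for a streaming parser: given the next record
// signature, decide what to do. `state` remembers whether the central
// directory has been reached, i.e. whether all file data is behind us.
function nextAction(signature, state) {
  switch (signature) {
    case LOCAL_FILE_HEADER:
      return 'read-entry';
    case CENTRAL_DIRECTORY_HEADER:
    case END_OF_CENTRAL_DIRECTORY:
      state.inCentralDirectory = true;
      return 'skip-record';
    default:
      // Unknown signature: tolerate it only once we are in the central
      // directory, because no more file data can follow at that point.
      if (state.inCentralDirectory) return 'finish';
      throw new Error(`invalid signature: 0x${signature.toString(16)}`);
  }
}

// Usage: a bad signature after the directory ends the parse cleanly.
const state = { inCentralDirectory: false };
const actions = [
  LOCAL_FILE_HEADER,
  CENTRAL_DIRECTORY_HEADER,
  END_OF_CENTRAL_DIRECTORY,
  0xdeadbeef
].map((signature) => nextAction(signature, state));
```

The same bad signature seen before the central directory still throws, so corruption in the file data itself is not silently ignored.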
Okay, I'll try that tomorrow and give you feedback. I haven't found the time to check your updates yet.
Thanks!
Hello, I tried with the two solutions you gave me and it works well!
Thanks again!
Hello,
I made a script that downloads sirene_201612_L_M.zip from http://files.data.gouv.fr/sirene/, which contains a .csv file (file size: 8.5G). The script throws the following error:
If I refer to the code, I see that the part throwing the error is here: https://github.com/ZJONSSON/node-unzipper/blob/master/lib/PullStream.js#L78.
But the file seems to be entirely downloaded so I don't understand what could be the problem.
Here is the part of my code:
Can you help me to understand if it's my code, a bug, a real problem with the file or another reason please? Thanks!