National-COVID-Cohort-Collaborative / Phenotype_Data_Acquisition

The repository for code and documentation produced by the N3C Phenotype and Data Acquisition workstream

zipr may produce corrupt zip file #237

Open · MPagel opened this issue 4 months ago

MPagel commented 4 months ago

https://github.com/National-COVID-Cohort-Collaborative/Phenotype_Data_Acquisition/blob/6d26382a68b93cd0c6d9e3a03408b5f4e54d154b/Exporters/RExporter/example_execution.R#L120

The `zipr` call at the line linked above may produce a corrupted archive.

I produced an archive from which I could extract all files locally, but the 4.5 GB OBSERVATION.csv entry was at an invalid offset within the archive. OBSERVATION.csv was the second-largest file in the archive, after MEASUREMENT.csv at 24.0 GB. I tried reinstalling zipr, and possibly also tested compression using Windows' built-in zip folders.
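The report does not say in which order the entries were stored. If OBSERVATION.csv came after the 24.0 GB MEASUREMENT.csv, its local header offset would be far beyond 4 GiB, which the classic 32-bit zip fields cannot record without ZIP64 extensions. That is one plausible factor, not a confirmed cause. The minimal Python sketch below could help diagnose an archive like this. It lists each entry's offset as recorded in the central directory, flags values that would need ZIP64, and checks that the bytes at that offset begin with the local file header signature `PK\x03\x04`. The function name `check_entry_offsets` is hypothetical.

```python
import zipfile

# Values at or above this cannot be stored in the classic 32-bit zip fields;
# such entries need ZIP64 extra fields.
ZIP32_MAX = 0xFFFFFFFF
# Every local file header begins with this 4-byte signature.
LOCAL_HEADER_SIG = b"PK\x03\x04"


def check_entry_offsets(path):
    """Return (name, header_offset, needs_zip64, header_ok) for each entry.

    header_offset comes from the central directory; header_ok is True when
    the bytes at that offset start with the local file header signature.
    """
    report = []
    with zipfile.ZipFile(path) as zf, open(path, "rb") as raw:
        for info in zf.infolist():
            needs_zip64 = max(info.header_offset, info.file_size,
                              info.compress_size) >= ZIP32_MAX
            # Jump to where the central directory says this entry starts.
            raw.seek(info.header_offset)
            header_ok = raw.read(4) == LOCAL_HEADER_SIG
            report.append((info.filename, info.header_offset,
                           needs_zip64, header_ok))
    return report
```

For example, `check_entry_offsets("MySiteArchive.zip")` returns one tuple per member. An entry with `header_ok` set to `False` has a central-directory offset that does not point at a valid local header. Extraction tools that tolerate this may still recover the data, while stricter ones reject the archive.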

The solution that eventually worked for me was to use 7-Zip from the command line:

7z a MySiteArchive.zip -tzip -mmt -r0 *.csv DATAFILES/*.csv
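For sites without 7-Zip, an archive with the same layout could probably be built with Python's standard `zipfile` module, which writes ZIP64 records when a member or offset exceeds the classic 4 GiB limit. This is a swapped-in technique, not what was tested in this issue. The function name `build_archive` and its `base_dir` parameter are hypothetical. The glob patterns mirror the `*.csv DATAFILES/*.csv` arguments of the 7z command.

```python
import glob
import os
import zipfile


def build_archive(archive_path, base_dir=".",
                  patterns=("*.csv", "DATAFILES/*.csv")):
    """Zip files matching `patterns` (relative to base_dir); return member names."""
    names = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(base_dir, pattern))):
            # Store paths relative to base_dir with forward slashes,
            # e.g. "DATAFILES/OBSERVATION.csv".
            names.append(os.path.relpath(path, base_dir).replace(os.sep, "/"))
    # allowZip64=True (already the default) lets zipfile emit ZIP64 records
    # for members or offsets beyond the 4 GiB limit of classic zip fields.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for name in names:
            zf.write(os.path.join(base_dir, name), arcname=name)
    return names
```

Unlike `7z ... -mmt`, `zipfile` compresses on a single thread, so it will be noticeably slower on a 24 GB file. The resulting archive should still be checked with the test commands before upload.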

I tested the archives with:

python -m zipfile -t MySiteArchive.zip
7z t MySiteArchive.zip
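The `python -m zipfile -t` check calls `ZipFile.testzip()`, which reads every member and verifies its CRC-32. The same check could be scripted into an export pipeline so that a bad archive is caught before upload. Below is a minimal sketch. The name `verify_archive` and its `(ok, message)` return convention are hypothetical.

```python
import zipfile


def verify_archive(path):
    """Return (ok, message) after reading every member and checking its CRC-32."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first unreadable member, or None.
            bad = zf.testzip()
    except zipfile.BadZipFile as exc:
        # The end-of-central-directory record or central directory is unreadable.
        return False, f"cannot open archive: {exc}"
    if bad is not None:
        return False, f"first bad member: {bad}"
    return True, "all members passed"
```

A member whose local header cannot be found at its recorded offset fails this check, because `testzip()` has to open it to read the data.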

Unfortunately, I did not record which R version or zipr package version I was using at the time.