genepi / imputationserver

Michigan Imputation Server: A new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity
https://imputationserver.sph.umich.edu/
GNU Affero General Public License v3.0

Higher zip file corruption rates with aes256 option enabled #135

Open abought opened 6 months ago

abought commented 6 months ago

Summary

TIS has received a small but steady stream of bug reports involving zip file corruption, most commonly when using the AES256 encryption option.

EDIT: There are some reports of corruption with regular zip files as well (including small files). It is not clear whether zip compression is the direct cause or whether other factors are in play (for example, filesystem issues during the merge step).

What we know

Because this happens with user-provided sensitive data, we don't have a reproducible test case that can be shared. This is a longstanding heisenbug, with occurrences going back at least 1-2 years (the limit of my time on the project). The lack of a test case also makes it difficult to assess the actual rate, though we received multiple issue reports quite recently.
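Since no shareable test case exists, one way to start measuring the rate might be to run an integrity check on each archive right after it is written and log any failure. Below is a minimal sketch using Python's standard `zipfile` module, not the server's actual code. It only covers regular (unencrypted) zips, because `zipfile` cannot decrypt AES-encrypted archives. The helper names `check_zip` and `make_zip` and the member name `chr20.dose.vcf.gz` are hypothetical.

```python
import io
import zipfile
import zlib


def check_zip(path_or_file):
    """Return None if every member passes its CRC check, else a short error string."""
    try:
        with zipfile.ZipFile(path_or_file) as zf:
            bad = zf.testzip()  # reads every member and verifies its CRC-32
    except (zipfile.BadZipFile, zlib.error) as exc:
        # Archive structure or compressed stream is damaged beyond a CRC mismatch.
        return f"unreadable archive: {exc}"
    return None if bad is None else f"CRC mismatch in member: {bad}"


def make_zip(payload: bytes) -> bytes:
    """Build a small in-memory zip (stored, not compressed) for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("chr20.dose.vcf.gz", payload)  # hypothetical member name
    return buf.getvalue()


if __name__ == "__main__":
    payload = b"A" * 1000
    good = make_zip(payload)
    print(check_zip(io.BytesIO(good)))

    # Simulate corruption by flipping one byte inside the stored payload.
    damaged = bytearray(good)
    damaged[good.find(payload) + 10] ^= 0xFF
    print(check_zip(io.BytesIO(bytes(damaged))))
```

A check like this would not explain the corruption, but logging its result per job could show whether archives are already damaged on the server or only after download.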

Initial workup

Open questions

seppinho commented 5 months ago

Thanks for the bug report. Unfortunately, we don't have a fix for this right now (it is probably related to the zip4j package). In the upcoming pipeline version we have already replaced zip4j with the command-line version of 7z (using the -mem=AES256 option).
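For anyone who wants to try the 7z route outside the pipeline, a rough sketch of the invocation is below. This is not the pipeline's actual code: the wrapper functions `seven_zip_commands` and `create_and_verify` are hypothetical, and it assumes a `7z` executable on the PATH. It creates a zip with `-mem=AES256` and then runs `7z t` on the result, so a damaged archive is caught before it reaches the user.

```python
import shutil
import subprocess


def seven_zip_commands(archive, password, inputs):
    """Build the (create, test) argument lists for the 7z command-line tool."""
    # a = add to archive, -tzip = zip container, -mem=AES256 = AES-256 encryption
    create = ["7z", "a", "-tzip", "-mem=AES256", f"-p{password}", archive, *inputs]
    # t = test the integrity of the archive
    test = ["7z", "t", f"-p{password}", archive]
    return create, test


def create_and_verify(archive, password, inputs):
    """Create the archive, then test it; return True only if both steps exit with 0."""
    if shutil.which("7z") is None:
        raise RuntimeError("7z executable not found on PATH")
    for cmd in seven_zip_commands(archive, password, inputs):
        if subprocess.run(cmd, capture_output=True).returncode != 0:
            return False
    return True
```

Note that passing the password via `-p` on the command line exposes it to other users who can list processes on the same machine, so this form is only suitable for a quick test.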

FYI: @cfuchsberger mentioned in the past that changing the header information often helped to bypass this problem.