divonlan / genozip

A modern compressor for genomic files (FASTQ, SAM/BAM/CRAM, VCF, FASTA, GFF/GTF/GVF, 23andMe...), up to 5x better than gzip and faster too

Failed to decompress the description line of a FASTA-format file #15

Closed: lileiting closed this issue 3 years ago

lileiting commented 3 years ago

Hi,

Here is an example FASTA sequence file to reproduce the error.

>g1 1|-6|0|5|0|204
A
>g2 0.66|0|0|6|0|202
A

After compressing the above sequence file with genozip and decompressing it with genounzip, the resulting sequence file was:

>g1 1|-6|0|5|0|204
A
>g2 0.60|0|0|6|0|202
A

The second description line was reconstructed incorrectly (0.66 became 0.60), and genounzip threw an error:

genounzip seq.fasta.genozip : 
genounzip: Adler32 of reconstructed vblock=1,component=1 (122686285 ) differs from original file (128977747 ).
Note: genounzip is unable to check the Adler32 subsequent vblocks once a vblock is bad
Bad reconstructed vblock has been dumped to: seq.fasta.genozip.vblock-1.start-0.len-44.bad
To see the same data in the original file:
   cat seq.fasta | head -c 44 | tail -c 44 > seq.fasta.genozip.vblock-1.start-0.len-44.good
genounzip: File integrity error: Adler32 of decompressed file seq.fasta is 122686285 , but Adler32 of the original FASTA file was 128977747 
Done (0 seconds)
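The log reports the corruption as two mismatching Adler-32 checksums over the 44-byte vblock. As a sketch, assuming genozip computes the standard Adler-32 over the raw vblock bytes, the small helper below (`adler32` is our own illustrative function, not genozip code) reproduces both values from the log. This shows that the single changed digit (0.66 vs 0.60) is enough to trigger the mismatch genounzip detected.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16, as in the Adler-32 definition

def adler32(data: bytes) -> int:
    """Plain Adler-32: A = 1 + sum of bytes, B = sum of the running A values, both mod 65521."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a

# The 44-byte example file from the report, and the file genounzip reconstructed.
original = b">g1 1|-6|0|5|0|204\nA\n>g2 0.66|0|0|6|0|202\nA\n"
reconstructed = b">g1 1|-6|0|5|0|204\nA\n>g2 0.60|0|0|6|0|202\nA\n"

for name, data in (("original", original), ("reconstructed", reconstructed)):
    checksum = adler32(data)
    assert checksum == zlib.adler32(data)  # cross-check against zlib's implementation
    print(f"{name}: {len(data)} bytes, Adler-32 = {checksum}")
```

Running it prints 128977747 for the original and 122686285 for the reconstructed file, matching the two values in the error message. Because a change of one byte shifts both the A and B sums, Adler-32 reliably catches this kind of single-character reconstruction error, even though it is a weaker checksum than CRC-32.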

Leiting

divonlan commented 3 years ago

Hi Leiting, thank you so much for reporting this issue, very much appreciated.

The bug is now fixed, and the fix has been pushed to GitHub.