Hi,
Really nice tool! The speed and compression improvements over, e.g. gzip, are very impressive.
I think there may be a potential bug in the compression of BAM files. Although the BAM file that I was originally trying has millions of records, I narrowed it down to the following. If I run genozip (v11.0.2) on a SAM file containing the following line, it works fine (genozip --threads 1 -f test.sam):
NS500125:680:HNHVYBGXG:2:11209:16805:14650 256 4 145637796 1 9M1494270N67M * 0 GAGTACGGGGAAGTCATGGAGGGAGACTAGTGCCTAGTATTTGCGGTGCCTGAAAACTTTCTTAAGAAGCAGTTGT A/AAAEEEEEEEEEEEEEAE/EAEEEEEE6AEAEEEEEEEEAEEE<EAAEEEEEEEEEEEEE/EEEAEEEEAAEAE NH:i:4 HI:i:4 AS:i:69 nM:i:1 XS:A:+
However, if I convert that SAM file to a BAM file (I'm using sambamba: sambamba view -S -f bam test.sam -o test.bam), and run genozip --threads 1 -f test.bam, I get the following output:
genozip test.bam : 0%
op_len=1 too long in vb=1494270:
[1] 28905 abort (core dumped) genozip --threads 1 -f test.bam
I think that it is complaining about the length of the number in the middle of the CIGAR string (i.e. 1494270). If I remove one digit from that number, and reconvert the SAM file to BAM, then genozip works without error.
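For reference, here is a minimal Python sketch of how the BAM format itself packs each CIGAR operation, following the SAM/BAM specification (a little-endian uint32 holding op_len << 4 | op_code). This is not genozip's code; the function names are my own and purely illustrative. It suggests that 1494270 fits comfortably in the 28 bits BAM reserves for the operation length, so the limit being hit is probably not one imposed by the BAM format.

```python
import re
import struct

BAM_CIGAR_OPS = "MIDNSHP=X"   # op codes 0..8, per the SAM/BAM specification
MAX_OP_LEN = (1 << 28) - 1    # op_len occupies the upper 28 bits of a uint32


def parse_cigar(cigar):
    """Split a textual CIGAR such as '9M1494270N67M' into (length, op) pairs."""
    pairs = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # Reject strings that contain anything besides <digits><op> groups.
    if "".join(n + op for n, op in pairs) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in pairs]


def encode_bam_cigar(cigar):
    """Pack each operation as a little-endian uint32: op_len << 4 | op_code."""
    words = []
    for op_len, op in parse_cigar(cigar):
        if op_len > MAX_OP_LEN:
            raise ValueError(f"op_len {op_len} does not fit in 28 bits")
        words.append(op_len << 4 | BAM_CIGAR_OPS.index(op))
    return struct.pack(f"<{len(words)}I", *words)


if __name__ == "__main__":
    cigar = "9M1494270N67M"   # CIGAR from the record in this report
    for op_len, op in parse_cigar(cigar):
        print(f"{op}: length {op_len} ({len(str(op_len))} digits)")
    print("BAM bytes:", encode_bam_cigar(cigar).hex())
    print("largest length BAM can store:", MAX_OP_LEN)
```

The only operation in this record whose length has seven digits is the N (skipped region) one, which is consistent with the observation that dropping one digit avoids the error.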