We discovered inconsistencies in k-mer histograms between uncompressed and gzip-compressed FASTA input files on two experimental ONT datasets*. In independent runs testing different k values (16, 18, 20, 22, 25), the gzipped FASTA read files for both ONT datasets (NA19240 [PRJEB29523] and NA12878 [SRR10965087]) yielded jagged, uninterpretable k-mer profiles. The problem is exacerbated at higher k values. We observed the issue with ntCard v1.1.1, v1.2.1 and v1.2.2.
NA12878 ONT FASTA
NA12878 ONT FASTA GZIPPED
====
NA19240 ONT FASTA
NA19240 ONT FASTA GZIPPED
*We have only observed this with FASTA files, not FASTQ files, and only when using experimental nanopore data.
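To rule out a corrupted or truncated archive as the cause, one possible sanity check is to confirm that the gzipped file decompresses to exactly the same bytes as the plain FASTA. The sketch below is only an illustration using the Python standard library. The helper names `stream_digest` and `same_content` and the file names in the usage note are hypothetical and are not part of ntCard.

```python
import gzip
import hashlib


def stream_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's (decompressed) contents.

    Files whose names end in ".gz" are read through gzip; anything else is
    read as-is. Reading in chunks keeps memory use flat for large read sets.
    """
    opener = gzip.open if path.endswith(".gz") else open
    digest = hashlib.sha256()
    with opener(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(plain_path, gz_path):
    """True if the plain file and the gzipped file hold identical bytes."""
    return stream_digest(plain_path) == stream_digest(gz_path)
```

For example, `same_content("NA12878.fa", "NA12878.fa.gz")` (hypothetical file names) returning `True` would indicate that both inputs carry identical data, so any difference in the histograms would have to arise while the compressed input is being read.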