gmarcais / Jellyfish

A fast multi-threaded k-mer counter

jellyfish 1.1.12 q-mer counts equal to 0 (0.000000e+00) #158

Open Aranka-S opened 4 years ago

Aranka-S commented 4 years ago

Hi,

I am trying to run jellyfish 1.1.12 for use with Quake. However, I noticed that my qdump files contain (many) zero counts. If I compare with the output of a normal k-mer count with jellyfish, I obtain the same k-mers, and no zero counts anywhere. If I grep for the k-mers with a zero q-mer count in my read set, I see that those k-mers are indeed present. If I calculate the quality score of those k-mers by hand, the values are never so small that rounding could explain a zero.
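For reference, the by-hand calculation can be sketched as below. This assumes the Quake definition of a q-mer count: each k-mer occurrence contributes the product, over its bases, of the probability that the base call is correct, 1 - 10^(-Q/10), with Q decoded from the FASTQ quality character using offset 33 (matching --quality-start 33). The function names are illustrative and not part of jellyfish, and jellyfish's internal computation may differ in detail.

```python
# Sketch of a by-hand q-mer count, assuming the Quake-style weight
# prod(1 - 10^(-Q/10)) over the bases of each k-mer occurrence.
# All names here are hypothetical helpers, not jellyfish APIs.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse
    complement (assumed to mirror what --both-strands reports)."""
    revcomp = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def qmer_weight(quals, offset=33):
    """Probability that every base in the window was called correctly."""
    weight = 1.0
    for ch in quals:
        q = ord(ch) - offset                 # Phred score of this base
        weight *= 1.0 - 10.0 ** (-q / 10.0)  # P(base is correct)
    return weight


def qmer_counts(seq, quals, k=21, offset=33):
    """Accumulate the weight of every k-mer occurrence in one read."""
    counts = {}
    for i in range(len(seq) - k + 1):
        key = canonical(seq[i:i + k])
        counts[key] = counts.get(key, 0.0) + qmer_weight(quals[i:i + k], offset)
    return counts
```

With qualities in the range seen in this read (roughly Q18 and above), a 21-mer weight computed this way stays well above zero, so a reported value of exactly 0.000000e+00 is not what this formula would produce.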

Any idea of what is causing this?

The commands I used:

~/Programs/jellyfish-1.1.12/bin/jellyfish count -q --quality-start 33 -c 8 -o reads10.jellyfish -m 21 -t 5 -s 10M --both-strands reads10.fastq
~/Programs/jellyfish-1.1.12/bin/jellyfish qdump -c reads10.jellyfish_0 > reads10.qcts
~/Programs/jellyfish-1.1.12/bin/jellyfish count -c 8 -o reads10k.jellyfish -m 21 -t 5 -s 10M --both-strands reads10.fastq
~/Programs/jellyfish-1.1.12/bin/jellyfish dump -c reads10k.jellyfish_0 > reads10.cts
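To list the affected k-mers, the two dumps can be compared directly. The sketch below assumes both files use the two-column "k-mer value" layout produced by the -c option, as in the output shown further down; the helper names are hypothetical.

```python
# Sketch: find k-mers whose q-mer count is zero although the plain
# k-mer count is positive. Assumes whitespace-separated "kmer value" lines.

def read_dump(lines):
    """Parse 'kmer value' lines into a dict, skipping malformed lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = float(parts[1])
    return table


def zero_qmers(qcts_lines, cts_lines):
    """Return sorted k-mers with q-mer count 0 but k-mer count > 0."""
    qcts = read_dump(qcts_lines)
    cts = read_dump(cts_lines)
    return sorted(k for k, v in qcts.items() if v == 0.0 and cts.get(k, 0.0) > 0)


# Example usage with the files produced by the commands above:
#   with open("reads10.qcts") as q, open("reads10.cts") as c:
#       for kmer in zero_qmers(q, c):
#           print(kmer)
```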

I also extracted a read that produces a lot of zero q-mer counts (both when I run jellyfish on the read alone and when I look its k-mers up in the counts for the full read set):

@_5:1:1:11594:27750/1
AGCGTACTTGATATTCTTTGGTATATTGCTTGGGTGATAGGATTTAATAGCCCTTGAAGCGCGTTTCTTATAGTATTGGCCGCATTAAAATTCCCTACGGACGTTGGTCCAGATATAAATCCCAGGATAATAACTATTCCCGTAGAATAT
+
<@@ADFADHHDHGIJIJJJJIDHEBEIGEHGHJID@F<FHEDGIIIHHIIEIJGI@HICGGIIGHHBEFCDFF:6>CA>CAB@BDDCDA>ACD>@:@A35?>@B?B?@CCDCDACEDEE@>@CCC@BCCDDEDCDCCEC@CBBBBBDD=4

head test.qcts
ACCAAAGAATATCAAGTACGC 0.000000e+00
AATAACTATTCCCGTAGAATA 9.903701e-01
GAATTTTAATGCGGCCAATAC 9.779463e-01
AGAAACGCGCTTCAAGGGCTA 0.000000e+00
CGCGTTTCTTATAGTATTGGC 9.820805e-01
TATAGTATTGGCCGCATTAAA 9.794062e-01
CCAATACTATAAGAAACGCGC 9.820804e-01
ATTCTACGGGAATAGTTATTA 9.915474e-01
AACGCGCTTCAAGGGCTATTA 0.000000e+00
CCCTACGGACGTTGGTCCAGA 9.591861e-01

head test.cts
ACCAAAGAATATCAAGTACGC 1
AATAACTATTCCCGTAGAATA 1
GAATTTTAATGCGGCCAATAC 1
AGAAACGCGCTTCAAGGGCTA 1
CGCGTTTCTTATAGTATTGGC 1
TATAGTATTGGCCGCATTAAA 1
CCAATACTATAAGAAACGCGC 1
ATTCTACGGGAATAGTTATTA 1
AACGCGCTTCAAGGGCTATTA 1
CCCTACGGACGTTGGTCCAGA 1

gmarcais commented 4 years ago

I'll admit that I have not used or looked at the q-mer feature in a long time. Would you mind sharing a small subset of the data that exhibits the issue with 0 counts?

Aranka-S commented 4 years ago

Actually, just putting the read I gave above in a .fastq file reproduces the issue for me. In case something unusual in my dataset does not survive copy-pasting, I attached a test FASTQ file (containing the same read and three others), obtained with head and tail commands from the original dataset. The output from jellyfish count -q and the qdump are included as well: testset.zip