gmarcais / Jellyfish

A fast multi-threaded k-mer counter

How does jellyfish handle query sequences longer than the k-mer size of the created database? #140

Open martijnbakker1995 opened 5 years ago

martijnbakker1995 commented 5 years ago

Hi,

I have created a canonical database of 51-mers. When I query sequences of 51 bp (25 bp-SNP-25 bp), it mostly returns counts of 0. However, if I lengthen those query sequences to 101 bp (50 bp-SNP-50 bp), it does find occurrences in the database.

So I was wondering: how does jellyfish handle query sequences that are longer than the k-mer size of the database? Does it perform some sort of sliding window over the query sequence, or does it just take the first 51 bp of the query sequence?

Thanks in advance!

Martijn

gmarcais commented 5 years ago

I am not quite sure I understand what you are doing. Can you give a more concrete example?

martijnbakker1995 commented 5 years ago

Well, I create a database of 51-mers:

jellyfish count -F 2 <(zcat file1.fastq.gz) <(zcat file2.fastq.gz) -m 51 -s 100M -t 7 -C -o mer_counts.jf

I have query sequences of 101 bp.

Then I want to find the counts of specific query sequences, which are stored in a file. However, the query sequences in this file are not 51 bp long, like the k-mers in the database, but 101 bp long.

So question 1: What does jellyfish do with query sequences that are longer than the k-mer size of the database?

The "weird" thing that brought this question up in the first place is that when I use the 101 bp queries as described above, it returns the counts as expected. However, when I shorten those sequences to 51 bp and use them as queries for jellyfish, the count is (almost) always 0.

So question 2: Why does jellyfish find counts when I query 101 bp sequences, but not when I give it a shortened (51 bp) version of the same sequence?
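
To make the "sliding window" idea in the questions above concrete, here is a minimal Python sketch of what such a window would produce. It is not Jellyfish code and does not show what `jellyfish query` actually does, which is what is being asked. It assumes that a canonical k-mer is the lexicographically smaller of a k-mer and its reverse complement, and that sequences are upper-case A/C/G/T only. The helper names `reverse_complement`, `canonical` and `sliding_kmers` are hypothetical, and the example sequences are made up.

```python
# Illustration only: hypothetical helpers, not part of Jellyfish.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Assumed canonical form: the smaller of the k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def sliding_kmers(seq: str, k: int) -> list[str]:
    """Return every length-k window of seq, left to right (empty if len(seq) < k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    k = 51
    # Made-up sequences with the layout from the thread: flank, one central base, flank.
    short_query = "A" * 25 + "G" + "C" * 25   # 51 bp
    long_query = "A" * 50 + "G" + "C" * 50    # 101 bp

    for name, seq in (("51 bp", short_query), ("101 bp", long_query)):
        windows = sliding_kmers(seq, k)
        # Each window would be looked up by its canonical form in a -C database.
        keys = {canonical(w) for w in windows}
        print(name, "windows:", len(windows), "distinct canonical k-mers:", len(keys))
```

Under this sketch, a 101 bp query contains 101 - 51 + 1 = 51 windows of 51 bp, while a 51 bp query contains exactly one. With the 50 bp-SNP-50 bp layout, every one of those 51 windows still covers the central base.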