bingmann / cobs

COBS - Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
https://panthema.net/cobs
MIT License

Query length limit #12

Open graceblackwell opened 4 years ago

graceblackwell commented 4 years ago

Would it be possible to increase the query length limit? I would like to query sequences of up to 300 kb, and it would be good to avoid having to split them into chunks.

bingmann commented 4 years ago

Yes, this is possible by copying some of the query code. Will do.

graceblackwell commented 4 years ago

Oh great! Thanks

shenwei356 commented 4 years ago

Hi @bingmann, how about canceling the length limit?

bingmann commented 4 years ago

What do you mean by cancel? The score counters can be 16-bit (max 65 Ki query length) or 32-bit (max 4 Gi query length, since 2^32 is about 4.29 billion). 64-bit would also be possible, but expensive memory-wise.
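To make the trade-off between counter width and query length concrete, here is a minimal sketch. It is not taken from the COBS source: the helper names `max_score`, `counter_bytes`, and `required_counter_bits` are hypothetical, and the sketch assumes one score counter per document that is incremented at most once per query k-mer.

```cpp
#include <cstdint>
#include <limits>

// Largest score a counter of type Counter can hold before it wraps around.
template <typename Counter>
constexpr std::uint64_t max_score() {
    return std::numeric_limits<Counter>::max();
}

// Bytes needed for the score array, assuming one counter per document.
template <typename Counter>
constexpr std::uint64_t counter_bytes(std::uint64_t num_documents) {
    return num_documents * sizeof(Counter);
}

// Hypothetical helper: smallest counter width (in bits) that can hold
// a score of num_kmers, i.e. every k-mer of the query hitting a document.
constexpr unsigned required_counter_bits(std::uint64_t num_kmers) {
    if (num_kmers <= max_score<std::uint16_t>()) return 16;
    if (num_kmers <= max_score<std::uint32_t>()) return 32;
    return 64;
}
```

Under these assumptions, a 300 kb query with k = 31 (an assumed k-mer size) has 300000 - 31 + 1 = 299970 k-mers, which exceeds the 16-bit maximum of 65535, so `required_counter_bits(299970)` selects 32-bit counters. Doubling the width doubles the score array: `counter_bytes<std::uint32_t>(n)` is twice `counter_bytes<std::uint16_t>(n)`.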

shenwei356 commented 4 years ago

I see. I just figured out that 65535 is the maximum 16-bit unsigned integer, and that you use _mm_add_epi16 to update the k-mer counts of 8 documents in parallel. So replacing _mm_add_epi16 with _mm_add_epi64 would lift the limit, at the cost of more memory (64-bit counters take four times the space of 16-bit ones).
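The lane arithmetic behind this can be sketched with SSE2 intrinsics. This is an illustration only, not the COBS implementation or the fix that was eventually committed: the function names `add_hits_epi16` and `add_hits_epi32` are hypothetical, and the code assumes an x86 target with SSE2. A 128-bit register holds eight 16-bit counters or four 32-bit counters, and `_mm_add_epi16` wraps around on overflow rather than saturating.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)
#include <array>
#include <cstdint>

// Add one hit vector to eight 16-bit score counters (one lane per document).
// Lanes wrap around modulo 2^16 when a counter exceeds 65535.
std::array<std::uint16_t, 8> add_hits_epi16(
    std::array<std::uint16_t, 8> counters,
    const std::array<std::uint16_t, 8>& hits) {
    __m128i c = _mm_loadu_si128(reinterpret_cast<const __m128i*>(counters.data()));
    __m128i h = _mm_loadu_si128(reinterpret_cast<const __m128i*>(hits.data()));
    c = _mm_add_epi16(c, h);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(counters.data()), c);
    return counters;
}

// Same idea with 32-bit lanes: only four documents per register,
// but each counter can reach 2^32 - 1 before wrapping.
std::array<std::uint32_t, 4> add_hits_epi32(
    std::array<std::uint32_t, 4> counters,
    const std::array<std::uint32_t, 4>& hits) {
    __m128i c = _mm_loadu_si128(reinterpret_cast<const __m128i*>(counters.data()));
    __m128i h = _mm_loadu_si128(reinterpret_cast<const __m128i*>(hits.data()));
    c = _mm_add_epi32(c, h);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(counters.data()), c);
    return counters;
}
```

Wider lanes mean fewer documents per instruction as well as more memory: eight per register at 16 bits, four at 32 bits, and two at 64 bits with `_mm_add_epi64`.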

bingmann commented 4 years ago

This limitation has been removed in 05588df18fee9bfdd44f6954059600a399ac2258

Please tell me if the new master version works for you.