bingmann / cobs

COBS - Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
https://panthema.net/cobs
MIT License

COBS for 600K genomes #22

Open davidmaimoun opened 2 years ago

davidmaimoun commented 2 years ago

Hi, I am in charge of finding the presence of specific genes in 600,000 Salmonella genomes. I tried COBS on a few genomes as a test, but I don't really understand the output. I copied a subsequence (55 bp) from one of my genomes and ran COBS to see if it would find it. In the output I got 24 (see below). And when I choose a longer subsequence, sometimes it doesn't find it at all.

Another issue: how can I tell whether my query matches fully or only partially?

--- end of document list (5 entries) ---
documents: 5
minimum 31-mers: 2811023
maximum 31-mers: 2874904
average 31-mers: 2834688
total 31-mers: 14173442
DIE: Output file exists, will not overwrite without --clobber @ /opt/conda/conda-bld/cobs_1646087618998/work/cobs/construction/compact_index.cpp:213
terminate called without an active exception

SRR18349609 24
SRR18349610 24
SRR18349611 24

TIMER info=search hashes=9.929e-06 io=0.000567883 total=0.000577812

Query length 55

I'd really appreciate your help

Thank you!
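
For reference, the arithmetic behind the question can be sketched as follows. A 55 bp query contains 55 - 31 + 1 = 25 overlapping 31-mers, so a per-document number can be compared against that total. This is only a minimal sketch: it assumes the number printed next to each accession is the count of query 31-mers found in that document, which is not confirmed anywhere in this thread. The helper names (`kmer_count`, `match_fraction`, `parse_scores`) are hypothetical and not part of COBS.

```python
def kmer_count(query_length: int, k: int = 31) -> int:
    """Number of overlapping k-mers in a sequence of the given length."""
    return max(query_length - k + 1, 0)


def match_fraction(score: int, query_length: int, k: int = 31) -> float:
    """Fraction of the query's k-mers reported as found.

    Assumes `score` is a count of matching k-mers (an assumption,
    see the note above).
    """
    total = kmer_count(query_length, k)
    if total == 0:
        raise ValueError("query is shorter than k, so it contains no k-mers")
    return score / total


def parse_scores(text: str) -> dict[str, int]:
    """Parse whitespace-separated 'name score' pairs into a dict."""
    tokens = text.split()
    return {name: int(score) for name, score in zip(tokens[0::2], tokens[1::2])}


if __name__ == "__main__":
    # Values copied from the output posted above.
    scores = parse_scores("SRR18349609 24 SRR18349610 24 SRR18349611 24")
    total = kmer_count(55)
    for name, score in scores.items():
        print(f"{name}: {score}/{total} = {match_fraction(score, 55):.2f}")
```

Under these assumptions, a score of 24 for a 55 bp query would mean that 24 of the 25 possible 31-mers were reported as present, i.e. a nearly complete but not full match.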

iqbal-lab commented 2 years ago

Hi there. Sorry to be a pain, but could you raise this issue here: https://github.com/iqbal-lab-org/cobs, and we can answer there tomorrow. Can you also give the command you ran? Thanks

davidmaimoun commented 2 years ago

It is very kind of you to help me. I'll do that!

Thank you for everything

davidmaimoun commented 2 years ago

Hi Mr Iqbal, sorry to disturb you again, but I submitted my question 9 days ago and haven't received an answer. I also tried the website, but no answer yet. Do you know if there is another way to reach somebody for help?

Thank you