DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

Extract specific kmer #151

Closed novitch closed 5 years ago

novitch commented 5 years ago

Hi, I am wondering if there is a way to extract the k-mer signature of a specific genome that I want to target.

I wasn't able to find a solution using centrifuge-inspect.

Thanks,

mourisl commented 5 years ago

What's the definition of kmer signature here?

novitch commented 5 years ago

I mean the k-mers that stay in the database to represent genome 1 (the 55-mers that are specific to that genome and are identified when building the database, as described in your first paper).

novitch commented 5 years ago

[Screenshot, 2018-10-03 17:22:57]

novitch commented 5 years ago

Like this. Is it possible to obtain the 53-mers unique to genome 2?

Thanks,

mourisl commented 5 years ago

Oh, this is the genome compression stage, which is independent of index building, so that information is not kept in the index. Note that during compression, "unique" only means unique so far, that is, relative to the genomes already processed. So it is possible that genome 3 has that 53-mer as well.
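To make the "unique so far" point concrete, here is a minimal toy sketch in Python. It is not Centrifuge's compression code: the function names, the three short sequences, and k = 4 are all made up for illustration. It only shows that a k-mer which is new when genome 2 is processed can still turn up in a later genome.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_so_far(genomes, k):
    """For each genome, in order, keep only the k-mers not seen in earlier genomes.

    Toy illustration only; this is an assumption about how to picture the idea,
    not the algorithm used by Centrifuge.
    """
    seen = set()
    result = []
    for genome in genomes:
        current = kmers(genome, k)
        result.append(current - seen)  # new relative to genomes processed so far
        seen |= current
    return result


# Hypothetical toy genomes (real genomes and k = 53 would be far larger).
genomes = ["ACGTAC", "GTACGG", "ACGGTT"]
novel = unique_so_far(genomes, 4)

# k-mers that were new when genome 2 was processed ...
print(sorted(novel[1]))
# ... and the subset of those that genome 3 also contains.
print(sorted(novel[1] & kmers(genomes[2], 4)))
```

The second set printed is not empty, which is why a "unique" k-mer from the compression stage cannot be read as a signature of one genome.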

novitch commented 5 years ago

OK, but in the end there must be some data in the .cf files that is characteristic of genome 2 (even if not unique), no? I would like to see it.

mourisl commented 5 years ago

If you used the compressed genomes, nothing characteristic of genome 2 is stored in the index. You can instead use an index built from the uncompressed genomes and run centrifuge-inspect on it to retrieve the sequence of genome 2.
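As a rough sketch of that last step: assuming the output of centrifuge-inspect on the uncompressed index has been saved as FASTA text, the Python below picks out one record by name and lists its k-mers. The record names, the toy sequences, and both helper functions are hypothetical; only the FASTA layout (a `>` header line followed by sequence lines) is relied on.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence.

    The record name is taken as the first whitespace-separated token
    after '>' on the header line.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def kmers_in_order(seq, k):
    """Return every length-k substring of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Stand-in for FASTA text saved from centrifuge-inspect (made-up records).
example = ">genome1 toy\nACGTAC\nGT\n>genome2 toy\nGTACGG\n"

seqs = read_fasta(example)
genome2_kmers = kmers_in_order(seqs["genome2"], 4)
print(genome2_kmers)
```

These are simply all k-mers of genome 2, not k-mers guaranteed to be absent from other genomes; finding genome-specific ones would need a comparison against every other sequence in the index.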

novitch commented 5 years ago

OK, I get it. Sorry, I had misunderstood how Centrifuge builds its index for a long time, so it took me a while to change my mind.

Cheers,