COMBINE-lab / cuttlefish

Building the compacted de Bruijn graph efficiently from references or reads.
BSD 3-Clause "New" or "Revised" License

Is there a way to get the k-mer counts of the k-mers in the produced unitigs? #27

Open shokrof opened 1 year ago

shokrof commented 1 year ago

Hi, I am wondering if I can get the kmer counts along with the unitigs fasta files.

Thanks, Moustafa

jamshed commented 1 year ago

Hello Moustafa,

We do not support this at this moment, unfortunately. It might be doable, given that we have the corresponding KMC databases; but attaching the k-mer counts to the unitigs is non-trivial in cuttlefish.

shokrof commented 1 year ago

Is there a way to keep the KMC databases? I can write a simple script to get the k-mer counts if I can get the KMC databases.

jamshed commented 1 year ago

Hi @shokrof: are you using cuttlefish 2, i.e. using the --read or --ref arguments? If so, we have valid counts only for the (k + 1)-mers: the k-mers are extracted from them, and their counts can't be obtained from the (k + 1)-mer counts in a straightforward manner. If the (k + 1)-mer counts would do, I can post some instructions here to retain the KMC databases.
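To illustrate one reason the k-mer counts don't fall out of the (k + 1)-mer counts directly, here is a toy sketch in plain Python. It is not cuttlefish or KMC code: the reads, the helper `count_kmers`, and the "sum over (k + 1)-mers that start with the k-mer" heuristic are all made up for illustration, and canonical forms are ignored. The last k-mer of each read is not a prefix of any (k + 1)-mer in that read, so the naive sum undercounts it; canonicalization and any abundance cutoff applied to the (k + 1)-mers would presumably complicate things further.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads (no canonicalization)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Hypothetical toy input, chosen only to expose the discrepancy.
reads = ["ACGTA", "CGTAC"]
k = 3

kmer_counts = count_kmers(reads, k)      # true k-mer counts
edge_counts = count_kmers(reads, k + 1)  # (k + 1)-mer counts

# Naive estimate: a k-mer's count as the total count of the
# (k + 1)-mers that have it as a prefix.
prefix_sums = Counter()
for edge, count in edge_counts.items():
    prefix_sums[edge[:k]] += count

for kmer in sorted(kmer_counts):
    print(kmer, "true:", kmer_counts[kmer], "from (k+1)-mers:", prefix_sums[kmer])
```

Here `GTA` and `TAC` each end a read, so their estimates come out lower than their true counts.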

rchikhi commented 11 months ago

Here's what to change to retain the KMC database of (k+1)-mer counts: keep-edges-counts-diff.txt
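Once the (k + 1)-mer database is retained, a script along the following lines could attach the counts to the unitigs. This is only a rough sketch under several assumptions that the thread does not confirm: that the database has been dumped to text with one `SEQUENCE<TAB>COUNT` pair per line, that the (k + 1)-mers are stored in canonical form (the lexicographically smaller of a sequence and its reverse complement), and that sequences contain only A, C, G, T. The function names are hypothetical, and the attached diff itself is not reproduced here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def canonical(seq):
    """Return the lexicographically smaller of seq and its reverse complement."""
    return min(seq, reverse_complement(seq))


def load_counts(lines):
    """Parse 'SEQUENCE<TAB>COUNT' lines (assumed dump format) into a dict."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq, count = line.split()
        counts[canonical(seq.upper())] = int(count)
    return counts


def edge_counts_along_unitig(unitig, k, counts):
    """List the count of each (k + 1)-mer along the unitig, 0 if absent."""
    size = k + 1
    unitig = unitig.upper()
    return [
        counts.get(canonical(unitig[i:i + size]), 0)
        for i in range(len(unitig) - size + 1)
    ]


# Hypothetical dump lines and unitig, for illustration only.
dump_lines = ["ACGT\t5", "CGTA\t3"]
counts = load_counts(dump_lines)
print(edge_counts_along_unitig("ACGTA", 3, counts))
```

In practice `dump_lines` would be replaced by an open file handle over the dumped database, and the unitigs would be read from the FASTA output. A unitig shorter than k + 1 bases yields an empty list.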