algbio / ggcat

Compacted and colored de Bruijn graph construction and querying
MIT License

Which kmer-counter is actually used when building the graph? #49

Closed by Blekin 2 months ago

Blekin commented 2 months ago

Hi! I'd like to know the abundance or coverage of unitigs. I noticed that GGCAT can report this information, and I'm wondering which k-mer counter it uses: a standalone programme or a built-in one?

alexandrutomescu commented 2 months ago

Hi!

The abundances are computed by GGCAT itself (our own built-in routines, run during the graph construction) and are reported in the same format as BCALM2, namely (i and f just mean int and float, respectively):

FASTA output header:

><id> LN:i:<length> KC:i:<abundance> km:f:<abundance> L:<+/->:<other id>:<+/-> [..]

Where:

The LN field is the length of the unitig.

The KC and km fields are the total abundance and the mean abundance of the k-mers inside the unitig, respectively.
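
For anyone who wants to read these values programmatically, here is a minimal sketch of a parser for a header line in the format above. It is not part of GGCAT or BCALM2: the function name `parse_unitig_header`, the `UnitigHeader` type, and the example header are all made up for illustration, and the sketch assumes the fields are separated by whitespace, as in the template above.

```python
from typing import List, NamedTuple, Tuple


class UnitigHeader(NamedTuple):
    """Hypothetical container for the fields of one unitig header."""
    unitig_id: str
    length: int             # LN: length of the unitig
    total_abundance: int    # KC: total abundance of the k-mers in the unitig
    mean_abundance: float   # km: mean abundance of the k-mers in the unitig
    links: List[Tuple[str, str, str]]  # (own orientation, other id, other orientation)


def parse_unitig_header(line: str) -> UnitigHeader:
    """Parse '><id> LN:i:<n> KC:i:<n> km:f:<x> L:<+/->:<other id>:<+/-> ...'."""
    fields = line.strip().lstrip(">").split()
    unitig_id = fields[0]
    tags = {}
    links = []
    for field in fields[1:]:
        parts = field.split(":")
        if parts[0] == "L" and len(parts) == 4:
            # Link field: L:<+/->:<other id>:<+/->
            links.append((parts[1], parts[2], parts[3]))
        elif len(parts) == 3:
            # Typed tag: <name>:<i or f>:<value>
            tags[parts[0]] = parts[2]
    return UnitigHeader(
        unitig_id=unitig_id,
        length=int(tags["LN"]),
        total_abundance=int(tags["KC"]),
        mean_abundance=float(tags["km"]),
        links=links,
    )


# Made-up header, only to show the call; real values come from the output FASTA.
header = parse_unitig_header(">0 LN:i:35 KC:i:50 km:f:10.0 L:+:1:- L:-:2:+")
print(header.length, header.total_abundance, header.mean_abundance)
```

The sketch raises a `KeyError` if any of the LN, KC or km fields is missing, which seems preferable to silently returning a default abundance.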

Blekin commented 2 months ago

Thanks a lot for your reply!