COMBINE-lab / salmon

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0

Question on k-mer counts #680

Status: Closed (alex-d13 closed this 2 years ago)

alex-d13 commented 3 years ago

Hi salmon team,

I really like your tool, and I was wondering whether it is possible to output the raw k-mer count tables that are presumably produced in some intermediate step.

Best, Alex

rob-p commented 3 years ago

Hi @alex-d13,

Do you mean the counts of k-mers in the underlying reference, the counts of k-mers within reads, or something else? During indexing, the compacted de Bruijn graph index is created, and at that point extracting counts for reference k-mers is possible, though there is currently no command for it. This might be easy to add to the underlying pufferfish index, so please feel free to raise the issue there if that is what you are interested in.
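For the reference-side case, a rough standalone sketch of what "counts for reference k-mers" means is shown below. This is not salmon or pufferfish code and does not read their index; it simply scans sequences directly. The function names (`canonical`, `count_reference_kmers`, `read_fasta`) are made up for illustration, and collapsing each k-mer with its reverse complement (canonical k-mers) is an assumption about what one would want to count.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_reference_kmers(sequences, k=31):
    """Count canonical k-mers over an iterable of reference sequences.

    k-mers never span two sequences, and windows containing anything
    other than A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    valid = set("ACGT")
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[canonical(kmer)] += 1
    return counts


def read_fasta(path):
    """Yield sequences from a FASTA file (minimal parser, headers discarded)."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks)


if __name__ == "__main__":
    # Tiny in-memory demo; real input would come from read_fasta(...).
    for kmer, n in sorted(count_reference_kmers(["ACGTAC"], k=3).items()):
        print(kmer, n)
```

With a transcriptome FASTA on disk (the file name here is hypothetical), the call would look like `count_reference_kmers(read_fasta("transcripts.fa"), k=31)`. A dedicated k-mer counter will be far faster on large references; the sketch only pins down the quantity being discussed.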

During mapping / alignment, the algorithm doesn't actually produce k-mer counts at any point. The selective-alignment algorithm is based on finding uniMEMs (maximal exact matches between reads and unitigs in the compacted dBG), chaining them together, and scoring potential alignments. So at this stage, we're not really attempting to count matched k-mers; we're working with a somewhat more sophisticated notion of mapping.
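To make the seed-and-chain idea concrete, here is a toy sketch, under heavy simplifying assumptions. It finds maximal exact matches between a read and a single reference string (real uniMEMs are matches against unitigs of the compacted dBG, which this does not model), then picks the best co-linear chain with a simple quadratic dynamic program whose score is just total matched length. The names `find_mems` and `best_chain` are hypothetical, and none of this reflects salmon's actual implementation or scoring.

```python
def find_mems(read, ref, k):
    """Return sorted (read_pos, ref_pos, length) maximal exact matches of length >= k."""
    # Index every k-mer of the reference by its start positions.
    index = {}
    for j in range(len(ref) - k + 1):
        index.setdefault(ref[j:j + k], []).append(j)

    mems = set()
    for i in range(len(read) - k + 1):
        for j in index.get(read[i:i + k], []):
            # Extend the seed to the left as far as the bases agree.
            a, b = i, j
            while a > 0 and b > 0 and read[a - 1] == ref[b - 1]:
                a -= 1
                b -= 1
            # Extend the seed to the right as far as the bases agree.
            e, f = i + k, j + k
            while e < len(read) and f < len(ref) and read[e] == ref[f]:
                e += 1
                f += 1
            # Seeds inside the same match collapse to one entry in the set.
            mems.add((a, b, e - a))
    return sorted(mems)


def best_chain(mems):
    """Pick the co-linear, non-overlapping chain of MEMs with the largest total length."""
    if not mems:
        return []
    score = [length for _, _, length in mems]
    prev = [-1] * len(mems)
    for x, (rx, fx, lx) in enumerate(mems):
        for y in range(x):
            ry, fy, ly = mems[y]
            # y may precede x only if it ends before x starts in both read and reference.
            if ry + ly <= rx and fy + ly <= fx and score[y] + lx > score[x]:
                score[x] = score[y] + lx
                prev[x] = y
    # Trace back from the highest-scoring chain end.
    end = max(range(len(mems)), key=score.__getitem__)
    chain = []
    while end != -1:
        chain.append(mems[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    ref = "GATTACAGGCTTAACCGGTA"
    read = "TTACAGGTTTAACCGG"  # ref[2:18] with one substituted base
    print(best_chain(find_mems(read, ref, k=5)))
```

The output of this stage is a set of anchored matches with positions, which would then be handed to an alignment scorer. Nothing in it resembles a table of k-mer counts, which is the point made above.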

Best, Rob