gmarcais / Jellyfish

A fast multi-threaded k-mer counter

Is it possible to output the source reads of each k-mer? #105

Status: open. bzvew opened this issue 6 years ago.

bzvew commented 6 years ago

If the names of the reads containing each k-mer could be reported, I think Jellyfish could be used for more creative work than just counting k-mers, e.g. removing reads that contain high-occurrence k-mers.
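Until something like this exists in Jellyfish, the read-removal idea can be approximated with a second pass over the reads. Below is a minimal Python sketch, not a Jellyfish feature. It assumes the counts were exported as two-column `KMER COUNT` text (for example with `jellyfish dump -c`, optionally restricted to high-count k-mers with `-L` so the table fits in memory) and that the table was built with canonical k-mers (`-C`). The function names `canonical`, `load_counts` and `filter_reads` are hypothetical, and reads are passed in as `(id, sequence)` pairs rather than parsed from FASTA/FASTQ.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Complement table for A/C/G/T (upper and lower case); other characters such as N pass through.
_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMP)[::-1])


def load_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'KMER COUNT' lines (assumed format, e.g. text from `jellyfish dump -c`)."""
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            counts[fields[0].upper()] = int(fields[1])
    return counts


def filter_reads(
    reads: Iterable[Tuple[str, str]],
    counts: Dict[str, int],
    k: int,
    max_count: int,
    canonical_mers: bool = True,
) -> Iterator[Tuple[str, str]]:
    """Yield only the (read_id, seq) pairs in which no k-mer has a count above max_count."""
    for read_id, seq in reads:
        s = seq.upper()
        keep = True
        for i in range(len(s) - k + 1):
            mer = s[i:i + k]
            if canonical_mers:
                mer = canonical(mer)  # assumed to match a table counted with -C
            if counts.get(mer, 0) > max_count:
                keep = False  # one high-occurrence k-mer is enough to drop the read
                break
        if keep:
            yield read_id, seq
```

The canonical form here is the lexicographically smaller of the k-mer and its reverse complement; if the table was counted without `-C`, pass `canonical_mers=False`.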

MaximilianStammnitz commented 6 years ago

Hi @bzvew and Guillaume @gmarcais,

Following up on the request above: we have just used subset k-mer counting and are pretty happy with the results. Being able to also pull out the (relatively few) read IDs/sequences that contain the counted k-mers would be fantastic, ideally written to a separate file. Is this information stored internally by Jellyfish at some (final?) stage, or would it require additional code to record reads once they have been identified as a hit/target during the hashing steps?
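Whatever the answer about Jellyfish's internals, the k-mer-to-read mapping for a small target set can be recovered afterwards by rescanning the reads. A minimal Python sketch, assuming the target k-mers are already known (e.g. the subset that was counted) and reads are given as `(id, sequence)` pairs; `reads_per_kmer` and `format_hits` are hypothetical names, and canonical matching is assumed to mirror counting with `-C`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Complement table for A/C/G/T (upper and lower case); other characters such as N pass through.
_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMP)[::-1])


def reads_per_kmer(
    reads: Iterable[Tuple[str, str]],
    targets: Iterable[str],
    k: int,
    canonical_mers: bool = True,
) -> Dict[str, List[str]]:
    """Map each target k-mer that occurs in the reads to the IDs of the reads containing it."""
    norm = canonical if canonical_mers else (lambda m: m)
    wanted = {norm(t.upper()) for t in targets}
    hits: Dict[str, List[str]] = defaultdict(list)
    for read_id, seq in reads:
        s = seq.upper()
        seen = set()  # report each read at most once per k-mer
        for i in range(len(s) - k + 1):
            mer = norm(s[i:i + k])
            if mer in wanted and mer not in seen:
                seen.add(mer)
                hits[mer].append(read_id)
    return dict(hits)


def format_hits(hits: Dict[str, List[str]]) -> str:
    """Render hits as tab-separated lines: KMER<TAB>read_id1,read_id2,..."""
    return "".join(f"{mer}\t{','.join(ids)}\n" for mer, ids in sorted(hits.items()))
```

The returned text can be saved to its own file (e.g. `pathlib.Path("kmer_reads.tsv").write_text(format_hits(hits))`, file name hypothetical). Holding the targets in a Python set only scales to a modest subset; for millions of k-mers a compiled tool would be the better choice.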

Thanks a lot, Max