COMBINE-lab / piscem

Rust wrapper for the next generation (still currently in C++)
BSD 3-Clause "New" or "Revised" License

Re-use preexisting KFF #19

Closed JosephLalli closed 5 months ago

JosephLalli commented 5 months ago

Hi there,

As part of a pangenomics-based workflow, I have whole-genome KFF files for each sample. (Probably important: this does not include KFF files for each transcriptome.)

Is it possible to reuse these k-mer counts with piscem? I'm guessing not, but I want to ask here before writing off the possibility.

Thanks, Joe

rob-p commented 5 months ago

This isn't possible, as piscem implements a mapping algorithm that requires its own index (the piscem index, which pairs a contig table with the sshash data structure for a k-mer index). I believe this index contains strictly more information than the underlying kff file: with the piscem index, it's possible not only to enumerate all k-mers, but also to recall the orientation and position of each k-mer within each input reference. The mapping algorithm relies on these capabilities. So, apart from piscem not accepting kff files as input, I think there is an "information" deficit between what the piscem index provides and what is present in kff files.

Best, Rob
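
The "information" deficit described above can be illustrated with a small toy sketch. It is not piscem's index (which, per the reply, pairs a contig table with sshash) and it does not read real kff files. The function names, the tiny k, and the reference strings are all hypothetical and chosen only for illustration. The first function keeps what a k-mer count table conceptually holds (which canonical k-mers occur and how often); the second also records the reference, position, and orientation of each occurrence, which is the kind of information a mapper needs.

```python
from collections import Counter, defaultdict

K = 5  # toy k-mer length, chosen only for illustration

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def kmer_counts(references, k=K):
    """Toy stand-in for a k-mer count table: canonical k-mer -> count.

    Nothing here records where a k-mer came from.
    """
    counts = Counter()
    for seq in references.values():
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return counts


def positional_index(references, k=K):
    """Toy positional index: canonical k-mer -> [(reference, position, strand)].

    Strand is "+" if the reference holds the canonical form at that
    position, and "-" if it holds the reverse complement.
    """
    index = defaultdict(list)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            canon = canonical(kmer)
            strand = "+" if kmer == canon else "-"
            index[canon].append((name, i, strand))
    return dict(index)


# Hypothetical reference sequences, for illustration only.
refs = {"ref1": "ACGTAGGCTA", "ref2": "TTAGCCTACG"}

counts = kmer_counts(refs)
index = positional_index(refs)

# The count table can be derived from the positional index...
derived = Counter({kmer: len(hits) for kmer, hits in index.items()})
# ...but the positions and strands cannot be recovered from the counts.
```

In this toy, going from the positional index to the counts is a one-line reduction, while the reverse direction would require the original references. That asymmetry is the point the sketch is meant to show.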

JosephLalli commented 5 months ago

I suspected as much, but I appreciate the reply, thank you!