COMBINE-lab / pufferfish

An efficient index for the colored, compacted, de Bruijn graph
GNU General Public License v3.0

Would Pufferfish index and align be more efficient if I merged each genome into a single contig and FASTA entry per genome? #43

Closed hermidalc closed 1 year ago

hermidalc commented 1 year ago

At NCBI, each genome FASTA can have multiple entries, generally the assembly plus additional unplaced contigs. Would it be more efficient to merge these into one FASTA entry per genome before indexing and aligning?
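For concreteness, the merge being asked about could be sketched roughly as below. This is only an illustration, not something Pufferfish provides: the function name `merge_fasta_entries`, the `genome_id` label, and the run of `N` characters used as a spacer are all hypothetical choices. The spacer is there because joining contigs directly would create sequence at each junction that does not exist in the genome.

```python
def merge_fasta_entries(fasta_text, genome_id, spacer="N" * 100, width=60):
    """Collapse every entry in one genome's FASTA text into a single entry.

    Hypothetical helper for illustration only. Contigs are joined with
    `spacer` (an assumed convention) so that no artificial sequence is
    formed across contig boundaries.
    """
    contigs = []   # one string per original FASTA entry
    current = []   # sequence lines of the entry being read

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if it had any sequence.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        contigs.append("".join(current))

    merged = spacer.join(contigs)
    # Re-wrap the merged sequence to a fixed line width.
    wrapped = [merged[i:i + width] for i in range(0, len(merged), width)]
    return "\n".join([f">{genome_id}"] + wrapped) + "\n"


if __name__ == "__main__":
    example = ">chr1\nACGT\nAC\n>unplaced1\nGGTT\n"
    print(merge_fasta_entries(example, "genomeA", spacer="NN"), end="")
```

Note that the original per-contig headers are discarded in this sketch, so any alignment reported against the merged entry would need its coordinates mapped back to the original contigs.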

hermidalc commented 1 year ago

This is very likely not a factor in performance, since multiple entries only add more "colors" to the Pufferfish index, so there is no need to answer this question.