aindj / k-SLAM

k-SLAM ultra fast alignment and taxonomic classification of metagenomic datasets
GNU General Public License v3.0
23 stars 5 forks

Database simple compression [enhancement] #18

Open g1o opened 6 years ago

g1o commented 6 years ago

The database could be stored in a format similar to the 2bit format (http://genome.ucsc.edu/FAQ/FAQformat.html#format7). A smaller database would take less time to load from disk. Also, I am not sure how this is handled internally right now, but SeqAn (https://www.seqan.de/) probably has something implemented that could help optimize reading and writing binary DNA.
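
To illustrate the idea, here is a minimal sketch of 2-bit packing in Python. It is not k-SLAM's actual database code and not a full .2bit reader or writer: it only shows the core packing step (four bases per byte, using the .2bit base codes T=00, C=01, A=10, G=11, first base in the high bits). The function names `pack_dna` and `unpack_dna` are hypothetical, and the sketch assumes upper-case input containing only A, C, G and T. The real format additionally stores headers and separate blocks for N runs and soft-masked regions, which are left out here.

```python
# Base codes as in the UCSC .2bit format: T=00, C=01, A=10, G=11.
ENCODE = {"T": 0, "C": 1, "A": 2, "G": 3}
DECODE = "TCAG"


def pack_dna(seq: str) -> bytes:
    """Pack an upper-case ACGT string into 2 bits per base (hypothetical helper).

    Raises KeyError on any other character, e.g. 'N'.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | ENCODE[base]
        # Left-align a short final chunk so the padding sits in the low bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_dna(data: bytes, length: int) -> str:
    """Inverse of pack_dna; `length` is the original number of bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(DECODE[(byte >> shift) & 3])
    # Drop the padding bases from the last byte.
    return "".join(bases[:length])
```

With this layout, a sequence of n bases takes ceil(n / 4) bytes instead of n bytes as plain text, so the sequence part of the database would be roughly a quarter of its uncompressed size. The sequence length has to be stored separately, because the padding in the last byte cannot be told apart from real T bases.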