Open apredeus opened 6 days ago
The only way to use all the rRNA k-mers and scale down the rest would be to make your own custom FASTA files for the genomes, sampling the k-mers yourself, and then build a database with that. You could include the full rRNA sequence and then choose a fraction of the k-mers for the rest.

A simpler strategy is just to use the new parameter that allows you to use a huge database with any amount of RAM, --preload-size. I've run a 420GB database on a laptop with 32GB of RAM this way, by using --preload-size 20G. KrakenUniq will then read in 20GB of the database, classify all the reads and save a temp file, then read the next 20GB and re-classify the reads, and so on. It's a bit slower, but still quite fast.
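As a concrete illustration, an invocation using --preload-size might look like this (the database path, sample name, thread count, and chunk size below are placeholders, not values from this thread):

```shell
# Classify reads against a database larger than available RAM.
# With --preload-size, krakenuniq loads the database in 20 GB chunks,
# classifies all reads against each chunk, and merges the results.
krakenuniq --db /path/to/big_db \
           --preload-size 20G \
           --threads 8 \
           --report-file sample.report.txt \
           --output sample.kraken.txt \
           sample.fastq
```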
Dear KrakenUniq team,
I have a theoretical question, and I was wondering if you could suggest something here. I am using KrakenUniq to classify bacterial reads from RNA-seq experiments. The database is a bit too large, and with --max-db-size, as I understand it, a random fraction of k-mers is selected (e.g. every 2nd or 3rd k-mer). Would it be possible to use all of the k-mers from the rRNA portions of the genome, but scale down the rest of the database? I have a RefSeq annotation, so I can split the FASTA into rRNA/non-rRNA parts, but I am not sure how one can mix the two. Thank you in advance!
-- Alex
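A minimal sketch of the custom-FASTA approach suggested in the reply (this is an illustration, not KrakenUniq's actual sampling; the k-mer length of 31, the every-2nd-k-mer stride, and the record naming scheme are all assumptions):

```python
def sample_kmers(seq, k=31, step=2):
    """Take every `step`-th k-mer from seq (deterministic down-sampling)."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]

def write_mixed_fasta(out, name, rrna_seqs, other_seqs, k=31, step=2):
    """Write full rRNA sequences plus down-sampled k-mers from the rest.

    Each sampled k-mer goes into its own FASTA record, so no artificial
    k-mers spanning two sampled fragments are created.
    """
    for i, seq in enumerate(rrna_seqs):
        out.write(f">{name}|rRNA_{i}\n{seq}\n")
    for i, seq in enumerate(other_seqs):
        for j, kmer in enumerate(sample_kmers(seq, k, step)):
            out.write(f">{name}|rest_{i}_{j}\n{kmer}\n")
```

The resulting FASTA (rRNA at full k-mer resolution, the rest thinned out) could then be used with krakenuniq-build as usual, though the new headers would still need taxonomy mappings (e.g. via the seqid2taxid map).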