DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

kraken2 on large database #729

Open fgvieira opened 1 year ago

fgvieira commented 1 year ago

I want to use kraken2 on a very large database and, due to memory constraints, I was wondering if it was possible to split the DB into several chunks. This way I could query my sample against each of the chunks and combine the results afterwards.

nicolo-tellini commented 1 year ago

Hello @fgvieira ,

Under the Minikraken section (here) you can find what you are looking for. You can also run kraken2 with the option --memory-mapping, which avoids loading the database into RAM; this could help too. Otherwise, if relevant, you can create a custom DB.
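As a sketch of the --memory-mapping suggestion above: the database directory and read file names below are placeholders, not paths from this thread.

```shell
# Classify reads without first loading the whole Kraken2 DB into RAM.
# --memory-mapping makes kraken2 mmap the .k2d files instead.
DB=/path/to/kraken2_db   # directory with hash.k2d, opts.k2d, taxo.k2d
READS=sample.fq

kraken2 --db "$DB" \
  --memory-mapping \
  --threads 8 \
  --report sample.report \
  --output sample.kraken \
  "$READS"
```

Note that with --memory-mapping the OS page cache decides what stays resident, so repeated runs against the same DB on the same machine tend to get faster as pages stay cached.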

best,

nic

slw287r commented 1 year ago

> I want to use kraken2 on a very large database and, due to memory constraints, I was wondering if it was possible to split the DB into several chunks. This way I could query my sample against each of the chunks and combine the results afterwards.

KrakenUniq has option for low-memory computers via --preload-size since version 0.7
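A hedged sketch of the KrakenUniq alternative mentioned above (version >= 0.7); the DB path, chunk size, and file names are placeholder assumptions.

```shell
# KrakenUniq low-memory mode: stream the database through RAM in
# bounded chunks instead of loading it all at once.
DB=/path/to/krakenuniq_db

krakenuniq --db "$DB" \
  --preload-size 8G \
  --threads 8 \
  --report-file sample.report \
  --output sample.krakenuniq \
  sample.fq
```

The trade-off is that each chunk is scanned against all reads, so smaller --preload-size values lower peak RAM at the cost of longer runtime.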

fgvieira commented 1 year ago

I am creating a custom DB that takes up roughly 500 GB (but might increase in the future).
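For context, a custom Kraken2 DB build typically follows the steps below; the genome file name is a placeholder, and whether --max-db-size is appropriate depends on how much classification accuracy loss is acceptable.

```shell
# Sketch of a custom Kraken2 database build.
DB=/path/to/custom_db

# 1. Fetch the NCBI taxonomy used to label k-mers.
kraken2-build --download-taxonomy --db "$DB"

# 2. Add custom FASTA sequences (headers must carry taxids,
#    e.g. ">seq1|kraken:taxid|562 ...").
kraken2-build --add-to-library my_genomes.fa --db "$DB"

# 3. Build the hash table; --max-db-size (in bytes) can cap the
#    final DB size by subsampling minimizers, at some accuracy cost.
kraken2-build --build --db "$DB" --threads 16
```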

@nicolo-tellini will --memory-mapping be slower?

nicolo-tellini commented 1 year ago

Hello @fgvieira ,

--memory-mapping doesn't affect DB creation; rather, it avoids loading the database into RAM when you run the classification step with the kraken2 command. In my case, I noticed an increase in speed (during the classification step), as loading the DB (450 GB) into RAM is a really slow process.

best,

nic