Confurious opened this issue 6 years ago
From my experience, you need a little more than 3 times as much memory as the FASTA file's size.
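The rule of thumb above can be sketched as a quick back-of-the-envelope check. Note the 3x factor is the empirical figure from this thread, not an official Centrifuge number:

```shell
# Rough RAM estimate for centrifuge-build from the "a little more than 3x" rule.
# The 3x multiplier is anecdotal (from this thread), so plan for extra headroom.
fasta_gb=500                 # uncompressed FASTA size in GB
ram_gb=$(( fasta_gb * 3 ))   # lower bound on required memory
echo "FASTA of ${fasta_gb} GB -> at least ${ram_gb} GB RAM"
```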
Thanks! I am trying to index a rather large database (>500 GB). Has anyone had experience with this type of task? I tried with Kraken, but it did not work after many weeks.
Hi, so this explains why I got stuck when trying to index the bacterial WGS database, including draft genomes (~470 GB), with 125 GB of RAM? It would be cool to be able to download pre-built indexes instead... I work in veterinary microbiology, and there are many bacterial species with only draft genomes.
I am trying to index something >500 GB with 3 TB of memory, and so far it has been 7 days and it is not done yet (maybe not even halfway?). Yes, if someone has done it, it would be great to share, although I suspect it would not be so easy to share something that big.
Did you run it with multiple threads?
Yes, I used 32. I didn't know the node had 700 CPUs; I would probably use more if it fails this time. 10 days now.
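For reference, a multi-threaded build invocation might look like the sketch below. The `-p` (thread count), `--conversion-table`, `--taxonomy-tree`, and `--name-table` options follow the Centrifuge manual, but the file names and index prefix here are placeholders, not files from this thread. The snippet only prints the command for inspection rather than launching a days-long job:

```shell
# Sketch of a multi-threaded centrifuge-build call; -p sets the thread count.
# seqid2taxid.map, nodes.dmp, names.dmp, input.fna, large_index are placeholders.
threads=32   # the poster used 32; a 700-CPU node allows more
cmd="centrifuge-build -p ${threads} --conversion-table seqid2taxid.map --taxonomy-tree nodes.dmp --name-table names.dmp input.fna large_index"
echo "$cmd"  # inspect before launching, since the build can run for days
```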
I read that one of the advantages of Centrifuge is that it requires less space and memory than Kraken. I am wondering if this is also true for the index-building step. What is the rough ratio between the size of the FASTA database (assuming no compression) and the amount of memory required? How much memory was required to build an index on the NCBI nt database? Thanks!