Hello - so, unfortunately the memory allocation issue you're running into is by design. The BWT (Burrows-Wheeler Transform) construction process ends up requiring memory equal to approximately 8x the packed reference size. I just ran an index on the NR, and the packed size ends up being about 80GB, so it would need around 644GB of RAM total to index a reference this size.

This is generally why we focused on using clustered references (like UniRef90). In the cases where we wanted to refine that last 10%, we declustered the hit subjects into their constituent sequences (using PALADIN plugins) and then ran a secondary alignment off those. But if you want to use very large non-clustered references, you'll need a significant amount of memory (and/or swap) to index.
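For anyone estimating this up front, here is a minimal Python sketch of that rule of thumb (the ~8x factor and the ~80GB packed NR size are the approximate figures from this thread; actual requirements will vary by reference):

```python
# Back-of-the-envelope RAM check for BWT index construction, based on the
# ~8x packed-reference-size rule of thumb described above (approximate).

def estimate_index_ram_gb(packed_size_gb: float, factor: float = 8.0) -> float:
    """Approximate peak RAM (GB) needed to build the index."""
    return packed_size_gb * factor

if __name__ == "__main__":
    # NCBI NR packs to roughly 80 GB, so indexing it needs on the order of
    # 8 * 80 = 640 GB of RAM and/or swap.
    print(f"NCBI NR: ~{estimate_index_ram_gb(80):.0f} GB peak RAM")
```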
Alright, thanks for the answer! I will try it again with more memory. Unfortunately, my query against the clustered reference wasn't very successful in my case: less than 1% of nanopore reads found a hit, which is substantially less than what I get with other DNA- and protein-based aligners, so I figured I could try to expand the database first.
Hi,
I'm trying to build a database from NCBI-NR, but it seems like bowtie(?) is failing:
There's 500GB of memory available; I don't see why it couldn't allocate 76GB. Any suggestions? Thanks!