charlesreid1 closed this 6 years ago
There's been a bit of a lull in the commits on this PR because testing assembly takes a really long time. But things are still going okay so far with both the megahit and metaspades workflows.
UPDATE: The metaspades test failed b/c it ran out of memory on an m5.2xlarge (8 procs, 32 GB of memory). At one point the program printed that it would require 28 GB, but I'm not sure whether it ran out of memory b/c that was just an estimate or b/c a later step increased the memory requirements. In any case, we are re-running metaspades on an m5.4xlarge (16 procs, 64 GB memory). That's pretty beefy.
UPDATE: The metaspades test failed again on an m5.4xlarge (16 procs, 64 GB memory) due to the same problem: it raised an exception about memory allocation. This time it took ~24 hours to hit the error. The confusing part is that metaspades claims it only needs about 28 GB of memory. I plan to re-run this job on a high-memory AWS instance, so we can crank up the amount of memory.
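For reference, here is a rough sketch of how the memory cap could be passed to metaspades explicitly when moving between instance sizes. It assumes `metaspades.py` is called directly with paired-end reads (the workflow on this branch may wrap it differently). The file names and the `reserve_gb` headroom are hypothetical. `-m` is the SPAdes memory limit in GB and `-t` is the thread count; the limit is only a cap, so it does not reduce what the assembly actually needs.

```python
def metaspades_command(reads_1, reads_2, out_dir, instance_mem_gb, threads, reserve_gb=4):
    """Build a metaspades.py argument list with an explicit memory cap.

    reserve_gb is a hypothetical amount of memory left for the OS and
    other processes; it is not a value taken from this PR.
    """
    mem_limit_gb = instance_mem_gb - reserve_gb
    if mem_limit_gb <= 0:
        raise ValueError("instance memory must exceed the reserved amount")
    return [
        "metaspades.py",
        "-1", reads_1,            # forward reads
        "-2", reads_2,            # reverse reads
        "-o", out_dir,            # output directory
        "-t", str(threads),       # number of threads
        "-m", str(mem_limit_gb),  # memory limit in GB
    ]


if __name__ == "__main__":
    # Hypothetical file names, sized for an m5.4xlarge (16 procs, 64 GB).
    print(" ".join(metaspades_command("reads_1.fq.gz", "reads_2.fq.gz", "assembly_out", 64, 16)))
```

Setting the cap close to the instance's memory at least makes it clear whether a failure is the tool hitting its own limit or the machine running out.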
Which data set(s) are you running this on? Complete or subset? I think we can test on the 10% subset.
As per Slack conversation with @brooksph, I was using the full reads instead of the subsampled reads. 🤦♂️
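To illustrate what a 10% subset amounts to, here is a minimal sketch of randomly subsampling an uncompressed FASTQ file. The project's subsampled reads already exist, so this is not how they were produced; the function name, paths, and seed are all hypothetical.

```python
import random


def subsample_fastq(in_path, out_path, fraction=0.10, seed=42):
    """Write roughly `fraction` of the 4-line FASTQ records to out_path.

    Returns (records_kept, records_seen).
    """
    rng = random.Random(seed)
    kept = 0
    total = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            # A FASTQ record is four lines: header, sequence, '+', qualities.
            record = [src.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            total += 1
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept, total
```

One random draw is made per record, so running this with the same seed on both files of a paired-end set keeps the mates in sync, provided the two files list their records in the same order.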
Tests (megahit and metaspades) were both successful. This PR is ready to merge.
Assembly with MEGAHIT is good to go. SPAdes is still running.
SPAdes is also ready. So, that's read filtering and assembly ready to go following successful completion using the scripts on this branch. We also have the approval to store the kaiju databases on s3 which should resolve our taxonomic classification workflow issue.
I'll go ahead and add the kaiju database to an S3 bucket now...
Charles
On Sun, Jul 1, 2018 at 2:59 PM, Phillip Brooks notifications@github.com wrote:
SPAdes is also ready. So, that's read filtering and assembly ready to go following successful completion using the scripts on this branch. We also have the approval to store the kaiju databases on s3 which should resolve our taxonomic classification workflow issue.
Closed in favor of https://github.com/dahak-metagenomics/dahak/pull/95
This pull request builds on #83 (add read filtering and taxonomic classification workflows), so it should be merged together with #83.
The changes implemented in this pull request are described in the assembly documentation, which is added in #92 (add assembly documentation). So this PR should also be merged together with #92.
Merge Checklist