bokulich-lab / q2-assembly

QIIME 2 plugin for (meta)genome assembly.
BSD 3-Clause "New" or "Revised" License
4 stars 12 forks source link

Parallelize assemble megahit #46

Closed Oddant1 closed 11 months ago

Oddant1 commented 1 year ago

Allow assemble megahit to be parallelized

gregcaporaso commented 1 year ago

I'm doing some testing of this now. Before merge, we should enable interfaces to hide actions with names that start with _, since we're planning to use that as a convention for differentiating serial versus parallel actions. Otherwise it won't be clear which action should be used:

$ qiime assembly 
...

Commands:
  -assemble-megahit     Assemble contigs using MEGAHIT.
  assemble-megahit      Assemble contigs using MEGAHIT.
...

EDIT: it's also hard (or not possible) to call -assemble-megahit, because the action name is getting interpreted as a non-existent parameter (-a).

gregcaporaso commented 1 year ago

Reminder that this PR depends on https://github.com/qiime2/q2-demux/pull/145.

Oddant1 commented 1 year ago

Also depends on qiime2/qiime2#690

Oddant1 commented 1 year ago

@gregcaporaso I added two unit tests for running assemble_megahit in parallel, but they currently do not assert anything useful. Would it be sufficient at this time for me to assert that I get the same result both in parallel and in sequence when using the test data?

Oddant1 commented 1 year ago

Blocked by qiime2/q2-demux#145 and qiime2/q2cli#306

Oddant1 commented 1 year ago

The test failures are down to some API changes. I'm not sure it's really worth making them pass until things are more nailed down. Also, _map_sample_reads is a bit heinous now and takes 4 lists because it now needs to be registered as a QIIME 2 action and we do not support nesting collection types for semantic types. . . We probably should have done that tbh.

Now it takes 4 lists of strings and is a bit ugly BUT it should NEVER be called on its own and should only be called by map_reads_to_contigs. @gregcaporaso floated the idea of making methods that start with a double underscore and are super private and never show up on the cli or anything,

misialq commented 1 year ago

Hey @Oddant1, could you please update your branch and resolve the conflicts so that the CI can run in its updated version? Thanks!

Oddant1 commented 1 year ago

@misialq done

ebolyen commented 1 year ago

Looks like you'll need to add q2-demux to the recipe (it should already be in the shotgun distro so it should just work).

codecov[bot] commented 1 year ago

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (debd89a) 98.54% compared to head (73fc328) 98.49%.

:exclamation: Current head 73fc328 differs from pull request most recent head d8aae3c. Consider uploading reports for the commit d8aae3c to get more accurate results

Files Patch % Lines
q2_assembly/bowtie2/mapping.py 97.67% 1 Missing :warning:
q2_assembly/helpers/helpers.py 97.91% 1 Missing :warning:
q2_assembly/megahit/megahit.py 96.42% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #46 +/- ## ========================================== - Coverage 98.54% 98.49% -0.05% ========================================== Files 25 27 +2 Lines 1443 1598 +155 ========================================== + Hits 1422 1574 +152 - Misses 21 24 +3 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Oddant1 commented 1 year ago

blocking bokulich-lab/q2-moshpit#62

gregcaporaso commented 11 months ago

@Oddant1, @misialq - anything I can do to help with this one?

misialq commented 11 months ago

Hey @gregcaporaso, @Oddant1 I'm planning on checking this one out until the end of this week and will get back to you with comments, if any.

@gregcaporaso it would be great if you could help me with the parallelize-kraken one though (https://github.com/bokulich-lab/q2-moshpit/pull/62), although it seems to be blocked by this one... 🤔

gregcaporaso commented 11 months ago

@gregcaporaso it would be great if you could help me with the parallelize-kraken one though

On it!

Oddant1 commented 11 months ago

@misialq, the bokulich-lab/q2-moshpit#62 PR is blocked by this PR because it requires the partition_contigs action from this PR. This is an action that takes a SampleData[Contigs] artifact and splits it into multiple smaller SampleData[Contigs] artifacts, which makes it a fairly fundamental action to parallelizing things in the shotgun distro. The only reason that action is in this PR as opposed to the other one is because I started working on this PR first, so I needed it here first. Would we rather have it exist elsewhere?

gregcaporaso commented 11 months ago

Makes sense re: the dependency. I wonder - would it make sense for these partition actions to be in the types repo that defines the type, so the partition functionality is accessible to anything that can access the type? (Not something to change for this release or PR.)

misialq commented 11 months ago

All is passing locally so I'm merging this one - thanks @Oddant1! 🚀