SamStudio8 / reticulatus

A snakemake-based pipeline for assembling and polishing long genomes from long nanopore reads
MIT License

Recent changes to pre-medaka rules have forced the simg container to be required even for CPU #49

Closed SamStudio8 closed 4 years ago

SamStudio8 commented 4 years ago

One goal of reticulatus is to be very easy to use, but we allow some leeway for users who wish to leverage the GPU, as they are likely to entertain a little extra faff to enjoy the speed boost. Previously, the pipeline could be run trivially on the CPU, but my last additions to the pipeline in February-March have made the Singularity image a prerequisite of the pipeline regardless of whether you're using the CPU or GPU.

This happened because I abstracted the mini_align step out of medaka so that I could override the generation of the BAM that medaka polishes with. This lets us interfere with the calls_to_draft BAM before medaka is called, which saves medaka from having to read the entire BAM. It also means we can pipe the mapper's output straight through a subsampler, which reduces disk I/O because the full mapping never has to be written to disk.
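To illustrate the streaming idea, here is a minimal sketch of a subsampler that could sit in a pipe between the mapper and the BAM writer. It is not the subsampler reticulatus actually uses: the function name `subsample_sam`, the `keep_fraction` parameter and the per-record random sampling are all assumptions made for the example.

```python
import random
from typing import Iterable, Iterator


def subsample_sam(lines: Iterable[str], keep_fraction: float, seed: int = 0) -> Iterator[str]:
    """Yield every SAM header line and a random subset of alignment records.

    Hypothetical helper: records are sampled independently, which is a
    simplification (it does not keep all alignments of one read together).
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so that a rerun keeps the same records
    for line in lines:
        if line.startswith("@"):
            # Header lines are always passed through untouched.
            yield line
        elif rng.random() < keep_fraction:
            # Alignment records are kept with probability keep_fraction.
            yield line
```

Because the function only ever holds one line at a time, it can be fed from `sys.stdin` and written to `sys.stdout`, so the unsampled mapping never needs to touch the disk.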

The side effect of this is that for the base case of not wanting to do any subsampling, we just call mini_align as per the medaka instructions and soft-link the result to the path medaka expects. I've configured this rule to use the mini_align already installed in the medaka container, thereby making the container required for the pipeline to work at all.
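The soft-link half of that base case might look roughly like the sketch below. The helper name `link_to_expected` and both of its arguments are hypothetical; the real rule lives in the Snakefile and may well do this differently.

```python
import os
from pathlib import Path


def link_to_expected(produced: str, expected: str) -> Path:
    """Soft-link an existing alignment file to the path a later step expects.

    Hypothetical helper: 'produced' is whatever the mapping step wrote and
    'expected' is the path the polishing step will look for.
    """
    src = Path(produced).resolve()
    dst = Path(expected)
    if not src.exists():
        raise FileNotFoundError(src)
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.is_symlink() or dst.exists():
        # Replace a stale link or file left over from an earlier run.
        dst.unlink()
    os.symlink(src, dst)
    return dst
```

Linking rather than copying keeps the base case cheap: no second copy of the BAM is written, which is in keeping with the goal of reducing disk I/O.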

Singularity is available in conda-forge but a kind tester has indicated that it doesn't seem to work, which might be an issue of its own.

SamStudio8 commented 4 years ago

I think this could be addressed by:

SamStudio8 commented 4 years ago

If medaka is installed to the base env, mini_align should be too.

SamStudio8 commented 4 years ago

Commit 01da39a5ed27c11c9d2ff046db01b39a962a3c14 removed medaka from the base environment, which is why new users don't have a base install of mini_align.

SamStudio8 commented 4 years ago

Testing the medaka in bioconda to see if it installs mini_align and such.

SamStudio8 commented 4 years ago

I've broken environments/base.yaml into CPU and GPU flavours. There is little difference between them other than that racon and medaka are not present in the GPU version. This means that the GPU flavour of the base environment will no longer have a default install of racon (causing command not found errors in cases where users have not installed a proper GPU version of racon, rather than errors that the CUDA options are invalid).

CPU users should now be able to skip the Singularity faff, as mini_align and medaka will be installed locally to the base conda environment.
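A quick way to confirm that a CPU environment has what it needs locally is to check that the tools are on the PATH. The sketch below is an assumption about how one might do that, not something reticulatus ships: `missing_tools` and `CPU_TOOLS` are hypothetical names, and the tool list is taken from the executables mentioned in this thread.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executables a CPU-only run is assumed to need from the base conda environment.
CPU_TOOLS = ("racon", "medaka", "mini_align")


def missing_tools(
    tools: Iterable[str] = CPU_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Run from inside the activated base environment, an empty list from `missing_tools()` would suggest that the container is not needed for the CPU path; any names returned are the ones to chase up.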

In true conda fashion, I cannot test this right now, as my environment fails to resolve, but a kind user is trying the fix.

SamStudio8 commented 4 years ago

Open question while we await the result

SamStudio8 commented 4 years ago

Works :)

SamStudio8 commented 4 years ago

Big thanks to @bawee for debugging this one with me!