SamStudio8 / reticulatus

A snakemake-based pipeline for assembling and polishing long genomes from long nanopore reads
MIT License

Override of medaka calls_to_draft is not working #48

Closed jagos01 closed 4 years ago

jagos01 commented 4 years ago

Hello Sam, I enjoyed your talk last week at LC2020 and it prompted me to try this pipeline. I am trying to assemble a bacterial isolate. The assembly finished, but the pipeline failed while executing rule ktkit_rollup (name 'get_ref_tids_by_group' is not defined). I am not sure if I am using ref.cfg properly. I have tried the 'ref' key as a local file and as an FTP address; both failed in the same rule. The pipeline works when I omit the 'refgroup' key from manifest.cfg. I have attached a copy of the log and ref.cfg. I am also having trouble getting medaka to work. I am not sure what to enter in the 'medaka_env' key in config.yaml. Do I need medaka installed in a singularity container? I have a working GPU version of medaka installed in a venv that is on my path. Any help would be appreciated. Thanks, Scott

2020-06-22T123001.713968.snakemake.log ref.zip

SamStudio8 commented 4 years ago

Hi Scott, Thanks for giving this a whirl and I'm sorry it isn't working. I'm guessing I haven't bundled an update to ktkit at some point so I will check that for you tomorrow. For medaka, if you've already got your own GPU version working, you can just delete the reference to singularity in the rule in Snakefile-base and make sure it's on your path. Let me know how you get on.

SamStudio8 commented 4 years ago

Can you try using Snakefile-ref instead of Snakefile-base if you have a refgroup set, and see if that works? The reference stuff was something that I was working on just before everything went terrible so it was a WIP back in March!

jagos01 commented 4 years ago

Thanks for the quick reply. It turns out that the pipeline does not run when I omit the 'refgroup' key from manifest.cfg. No errors were raised when I used the -n flag for a dry run, but polish_racon fails when the pipeline is actually run. /reticulatus/working/log/SpoIIA_test.flye25.ctg.cns.racon-ont-1.fa contains: racon: unrecognized option '--cudapoa-batches'. I have a GPU-enabled racon on my path, but it is not picked up from within the reticulatus env; the pipeline uses the version installed inside the reticulatus env instead. I have attached the log. 2020-06-22T164244.932845.snakemake.log

Thanks again, Scott

SamStudio8 commented 4 years ago

Yup, as per the docs:

polish_racon: you will need a racon binary compiled with CUDA, for your system and have it appear on your $PATH before any other installed versions of racon

I have a little bash script that activates my conda environment, then exports my gpu version of racon before the rest of the $PATH (you need to do this after you activate the env so you can take priority over the paths added by conda). By default reticulatus will otherwise use the version of racon from the conda environment instead. I've updated the README to clarify this better.
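The lookup order described here can be illustrated with a minimal Python sketch. It assumes a POSIX system and uses two throwaway directories with empty placeholder files named racon; the directory names and the helpers `make_fake_binary` and `resolve` are hypothetical and not part of reticulatus. It only shows that whichever directory comes first in the search path wins, which is why the GPU build has to be prepended after the conda environment is activated.

```python
import os
import shutil
import stat
import tempfile


def make_fake_binary(directory, name="racon"):
    """Create a placeholder executable called `name` in `directory` (hypothetical helper)."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    # Mark the file as executable so the PATH lookup accepts it.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve(name, directories):
    """Return the file a PATH lookup would pick, given the directory order."""
    return shutil.which(name, path=os.pathsep.join(directories))


with tempfile.TemporaryDirectory() as conda_bin, tempfile.TemporaryDirectory() as gpu_bin:
    conda_racon = make_fake_binary(conda_bin)
    gpu_racon = make_fake_binary(gpu_bin)

    # Conda's directory first: the conda build is found.
    picked_default = resolve("racon", [conda_bin, gpu_bin])

    # GPU directory prepended after activation: the GPU build is found.
    picked_after_prepend = resolve("racon", [gpu_bin, conda_bin])

    print("default picks conda build:", picked_default == conda_racon)
    print("prepended picks GPU build:", picked_after_prepend == gpu_racon)
```

Running `which racon` after activating the environment and adjusting the path is a quick way to check the same thing on a real system.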

jagos01 commented 4 years ago

Hello Sam, Thanks for the tip about the bash script. The pipeline now works with racon polishing but I still get the following error when I try to polish with medaka:

MissingInputException in line 436 of /home/scott/reticulatus/Snakefile: Missing input files for rule polish_medaka:

medaka-SpoIIA_test.flye25.ctg.cns.racon-ont-4.medaka-ont-1.fa/calls_to_draft.bam.bai

medaka-SpoIIA_test.flye25.ctg.cns.racon-ont-4.medaka-ont-1.fa/calls_to_draft.bam

Scott

SamStudio8 commented 4 years ago

Hi Scott! The missing file exception was reported by a keen user yesterday and the fix is already deployed to the git repo - can you do a git pull and replace the current Snakefile and see if that works?

jagos01 commented 4 years ago

Hello Sam, After doing a git pull the pipeline starts but then I get the following error.

Waiting at most 5 seconds for missing files.

MissingOutputException in line 440 of /home/scott/reticulatus/Snakefile: Missing files after 5 seconds:

medaka-SpoIIA_test.flye25.ctg.cns.racon-ont-4.medaka-ont-1.fa/calls_to_draft.bam

medaka-SpoIIA_test.flye25.ctg.cns.racon-ont-4.medaka-ont-1.fa/calls_to_draft.bam.bai

This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.

Job failed, going on with independent jobs.

Exiting because a job execution failed. Look above for error message

Thanks, Scott

SamStudio8 commented 4 years ago

Hi Scott, silly question but just to check you replaced the script too (cp Snakefile-base Snakefile)? I was sure we had solved this yesterday.

Sam

SamStudio8 commented 4 years ago

Apologies, ignore my last message. Thinking about it, that sounds like the job ran but didn't do what it was supposed to. Do you have any more information from the logs? Does pre_calls_to_draft.bam exist in that directory?

jagos01 commented 4 years ago

Hi Sam, pre_calls_to_draft.bam does not exist. I have attached the contents of the working directory and the snakemake log. Thanks, Scott

working.txt

2020-06-23T073822.372631.snakemake.log

SamStudio8 commented 4 years ago

@jagos01 Hi Scott, I've just looked at the rules and note that I've marked the pre_calls_to_draft output BAM as temporary. This works in the case where subsampling is activated, but when subsampling is disabled reticulatus symlinks the pre_calls files. I suspect what's happened here is the symlink has been successfully created, then the temporary files are destroyed, leaving a dangling symlink. I've pushed a possible fix in bb2830a8011ca49cec21e1d078fd9277fe726783. Please pull from the stable branch and try again.

Thanks for your patience!
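The suspected failure mode can be reproduced outside the pipeline with a minimal Python sketch. It assumes a POSIX filesystem, and that "temporary" means the file is deleted once the jobs that need it have finished. The file names mirror those in the thread, but the scratch directory and placeholder content are made up for illustration; this is not the reticulatus rule itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    pre_bam = os.path.join(workdir, "pre_calls_to_draft.bam")
    final_bam = os.path.join(workdir, "calls_to_draft.bam")

    # Stand-in for the real BAM produced by the earlier step.
    with open(pre_bam, "w") as handle:
        handle.write("placeholder")

    # With subsampling disabled, the final file is only a link to the pre_ file.
    # The target is given relative to the directory that holds the link.
    os.symlink("pre_calls_to_draft.bam", final_bam)
    resolves_before_cleanup = os.path.exists(final_bam)

    # Cleaning up the "temporary" output removes the link's target.
    os.remove(pre_bam)
    link_still_present = os.path.islink(final_bam)
    resolves_after_cleanup = os.path.exists(final_bam)

    print("resolves before cleanup:", resolves_before_cleanup)
    print("link entry still present:", link_still_present)
    print("resolves after cleanup:", resolves_after_cleanup)
```

The link entry survives the cleanup but no longer resolves, which is consistent with an output that is listed in the directory yet reported as missing.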

SamStudio8 commented 4 years ago

@jagos01 Thanks for the log, I think the log supports my suspicions so hopefully not marking them as temporary will fix the problem. Let me know! I'll leave your bug report open until we get to the bottom of things.

jagos01 commented 4 years ago

Hello Sam, I pulled the latest fix and unfortunately get the same error as above. pre_calls_to_draft.bam is present but calls_to_draft.bam is a broken link. Thanks, Scott

SamStudio8 commented 4 years ago

@jagos01 That's progress - at least the pre_calls_to_draft exists now. What is calls_to_draft a broken link to? Is it pointing to the wrong location?

SamStudio8 commented 4 years ago

@jagos01 I have a minimal working example to trigger this issue now. I'll let you know when I've fixed it.

jagos01 commented 4 years ago

calls_to_draft.bam is pointing to medaka-SpoIIA_test.flye25.ctg.cns.racon-ont-4.medaka-ont-1.fa/pre_calls_to_draft.bam, which looks like the right place. Thanks, Scott

SamStudio8 commented 4 years ago

Hi Scott, this was a silly mistake from me. I forgot the paths will be relative here. The symlinks are dangling because they are not created relative to the current location. I've added the -r flag to the step to create the link properly (https://github.com/SamStudio8/reticulatus/commit/639664ec41a0db717e3172ffdac82f98d716088a). The step completes on my example and successfully runs medaka.
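The relative-path pitfall can be sketched in Python. This is a minimal illustration assuming a POSIX filesystem; the directory name `medaka-out` and the helpers `link_naive` and `link_relative` are hypothetical, and `link_relative` only approximates what a relative-link flag such as GNU `ln -s -r` does (assuming that is the command used in the step). A symlink's target is resolved from the directory containing the link, not from the directory the command was run in, so a target that looks right from the working directory can still dangle.

```python
import os
import tempfile


def link_naive(target, link):
    """Store the target text verbatim, as a plain symbolic link would."""
    os.symlink(target, link)


def link_relative(target, link):
    """Rewrite the target so it is relative to the directory holding the link."""
    link_dir = os.path.dirname(link) or "."
    os.symlink(os.path.relpath(target, start=link_dir), link)


with tempfile.TemporaryDirectory() as workdir:
    previous_cwd = os.getcwd()
    os.chdir(workdir)  # act like a pipeline run from its working directory
    try:
        os.mkdir("medaka-out")
        target = os.path.join("medaka-out", "pre_calls_to_draft.bam")
        with open(target, "w") as handle:
            handle.write("placeholder")

        naive = os.path.join("medaka-out", "naive.bam")
        fixed = os.path.join("medaka-out", "fixed.bam")
        link_naive(target, naive)
        link_relative(target, fixed)

        # The naive link resolves to medaka-out/medaka-out/..., which does not exist.
        naive_target = os.readlink(naive)
        naive_resolves = os.path.exists(naive)
        fixed_target = os.readlink(fixed)
        fixed_resolves = os.path.exists(fixed)
    finally:
        os.chdir(previous_cwd)  # leave the directory before it is deleted

print("naive:", naive_target, "resolves:", naive_resolves)
print("fixed:", fixed_target, "resolves:", fixed_resolves)
```

The naive link's stored target matches what was reported earlier in the thread: it reads like the correct path from the working directory, yet it does not resolve from inside the output directory.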

jagos01 commented 4 years ago

Thanks Sam, the pipeline finished with medaka on my end as well. Scott

SamStudio8 commented 4 years ago

@jagos01 Excellent. Sorry about that - overriding the calls to draft BAM was the last thing I was working on before COVID so it took me a little bit of time to get back in the driving seat and remember what I was doing. Thanks for your patience, I hope the assembly looks good!

jagos01 commented 4 years ago

No worries. Thanks again for all your help.