broadinstitute / long-read-pipelines

Long read production pipelines
https://broadinstitute.github.io/long-read-pipelines/
BSD 3-Clause "New" or "Revised" License

Long read pipeline Canu issue with Mouse Genome reference #416

Open nikhil777shingte opened 1 year ago

nikhil777shingte commented 1 year ago

I am trying to run a long-read pipeline on ONT long-read data for mouse models using the Terra platform.

I was able to run https://github.com/broadinstitute/long-read-pipelines/blob/kvg_guppy_cpu/wdl/pipelines/ONT/Preprocessing/ONTBasecall.wdl successfully using my fast5 files.

When I try to run https://github.com/broadinstitute/long-read-pipelines/blob/3.0.1/wdl/ONTAssembleWithCanu.wdl, I run into the issue below. Can you please advise?

2023/08/12 03:56:10 Starting container setup.
2023/08/12 03:56:12 Done container setup.
2023/08/12 03:56:13 Starting localization.
2023/08/12 03:56:19 Localization script execution started...
2023/08/12 03:56:19 Localizing input gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-MergeFastqs/cacheCopy/merged.fq.gz -> /cromwell_root/fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-MergeFastqs/cacheCopy/merged.fq.gz
2023/08/12 03:56:25 Localizing input gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/script -> /cromwell_root/script
2023/08/12 03:56:27 Localization script execution complete.
2023/08/12 03:56:31 Done localization.
2023/08/12 03:56:32 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash us.gcr.io/broad-dsp-lrma/lr-canu@sha256:b116e4c74fa74e384491457fb09b6729e40138d00d7611fea912ab130386d9eb /cromwell_root/script
+ canu -correct -p 65209 -d canu_correct_output genomeSize=2731m corMaxEvidenceErate=0.15 correctedErrorRate=0.15 -nanopore /cromwell_root/fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-MergeFastqs/cacheCopy/merged.fq.gz
-- Canu 2.0
--
-- Detected Java(TM) Runtime Environment '1.8.0_252' (from '/usr/local/openjdk-8/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING: Failed to run gnuplot using command 'gnuplot'.
-- WARNING: Plots will be disabled.
-- WARNING:
--
-- Detected 32 CPUs and 31 gigabytes of memory.
-- No grid engine detected, grid and staging disabled.
--
-- ERROR
-- ERROR
-- ERROR Found 1 machine configuration:
-- ERROR class0 - 1 machines with 32 cores with 31 GB memory each.
-- ERROR
-- ERROR Task red can't run on any available machines.
-- ERROR It is requesting:
-- ERROR redMemory=32-48 memory (gigabytes)
-- ERROR redThreads=4-8 threads
-- ERROR
-- ERROR No available machine configuration can run this task.
-- ERROR
-- ERROR Possible solutions:
-- ERROR Change redMemory and/or redThreads
-- ERROR

ABORT:
ABORT: Canu 2.0
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
ABORT: task red failed to find a configuration to run on.
ABORT:
2023/08/12 03:56:34 Starting delocalization.
2023/08/12 03:56:35 Delocalization script execution started...
2023/08/12 03:56:35 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/memory_retry_rc
2023/08/12 03:56:37 Delocalizing output /cromwell_root/rc -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/rc
2023/08/12 03:56:39 Delocalizing output /cromwell_root/stdout -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/stdout
2023/08/12 03:56:40 Delocalizing output /cromwell_root/stderr -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/stderr
2023/08/12 03:56:42 Delocalizing output /cromwell_root/canu_correct_output/65209.correctedReads.fasta.gz -> gs://fc-88614ae6-5245-4a6e-ab14-5c3fc9d007a2/submissions/87ee6a97-1be5-4429-9557-817c073de5ae/ONTAssembleWithCanu/f8d2b9d9-a770-45ef-a9b5-6b2a1d0b456f/call-Canu/Canu/61aef15e-aa55-498e-9e6f-e09331f445da/call-Correct/canu_correct_output/65209.correctedReads.fasta.gz
Required file output '/cromwell_root/canu_correct_output/65209.correctedReads.fasta.gz' does not exist.
SHuang-Broad commented 1 year ago

Hi,

Based on the error message, the resources allocated for Canu aren't enough for it to run. You can adjust the WDL accordingly when running the pipeline.
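For example, the log shows the "red" stage asking for 32-48 GB and 4-8 threads on a VM with only 31 GB. The sketch below is not the repository's actual Canu task (input names, defaults, and the disk size are illustrative), but it shows the two knobs involved: the runtime memory has to cover redMemory's upper bound, or redMemory/redThreads can be lowered to fit the VM.

version 1.0

# Minimal sketch, not the repository's actual task: the runtime block must
# provision at least as much memory as Canu's redMemory upper bound, or
# redMemory/redThreads must be lowered to fit the provisioned VM.
task CanuCorrectSketch {
    input {
        File reads
        String genome_size = "2731m"   # mouse; use the targeted size for adaptive sampling
        Int cpus = 16
        Int mem_gb = 64                # >= 48 GB so the "red" stage can be scheduled
    }

    command <<<
        set -euxo pipefail
        canu -correct -p sample -d canu_correct_output \
            genomeSize=~{genome_size} \
            corMaxEvidenceErate=0.15 correctedErrorRate=0.15 \
            redMemory=32-48 redThreads=4-8 \
            -nanopore ~{reads}
    >>>

    output {
        File corrected_reads = "canu_correct_output/sample.correctedReads.fasta.gz"
    }

    runtime {
        cpu: cpus
        memory: "~{mem_gb} GiB"
        disks: "local-disk 500 SSD"
        docker: "us.gcr.io/broad-dsp-lrma/lr-canu@sha256:b116e4c74fa74e384491457fb09b6729e40138d00d7611fea912ab130386d9eb"
    }
}

Equivalently, you can keep the existing 32-core / 31 GB VM and pass a lower range, e.g. redMemory=16-28, so Canu can schedule the stage on the machine it detects.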

That being said, Canu is resource-hungry and the mouse genome is large, so the assembly could run for weeks on your data (and could be very expensive). The workflow is really written for assembling small genomes, so I'd advise planning your analysis strategy accordingly before running this pipeline.

Regards, Steve

nikhil777shingte commented 1 year ago

Hi Steve, thanks for your response. I was actually able to run this successfully at a relatively low cost (less than $10).

I should provide more context.

The sequencing data I have comes from an ONT sequencer with adaptive sampling. Because of this, I have to run a few more steps in addition to this pipeline to select the region of interest for which I have reads. I have forked the repository and made changes so that I can pass Canu an estimated genome size that reflects my adaptive-sampling reads.

Earlier, the mouse genome size used by Canu was incorrect in my case, since my data comes from adaptive sampling.
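For illustration only (the task and input names below are hypothetical, not my fork's exact changes), the genome size could be derived from the adaptive-sampling target regions instead of hard-coding the whole-genome 2731m:

version 1.0

# Hypothetical sketch: derive Canu's genomeSize from the adaptive-sampling
# target regions instead of the whole-genome default.
task EstimateTargetSize {
    input {
        File target_bed   # BED of adaptive-sampling target regions
    }

    command <<<
        # Sum the interval lengths and report the total in megabases, e.g. "48m".
        awk '{ total += $3 - $2 } END { printf "%dm\n", int(total / 1e6) + 1 }' ~{target_bed}
    >>>

    output {
        String genome_size = read_string(stdout())
    }

    runtime {
        docker: "ubuntu:20.04"
        memory: "2 GiB"
        cpu: 1
    }
}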

You can find more details here:

https://github.com/nikhil777shingte/long-read-pipelines/tree/test-long-read-canu-assembly

I still have the changes I made to the Canu resources here (from when it was using the full mouse genome size), but with the workflow changes I have made, I don't think Canu will be resource intensive, and it should be able to finish the pipeline within a couple of hours.

Link to the Dockstore-published workflow: https://dockstore.org/workflows/github.com/nikhil777shingte/long-read-pipelines/ONTAssembleWithCanuAdaptiveSampling:test-long-read-canu-assembly

Terra details:

ONTAssembleWithCanuAdaptiveSampling ID: 9aadab80-9ca6-4b89-b29b-459295d9097a

workspace-id: 88614ae6-5245-4a6e-ab14-5c3fc9d007a2
submission-id: 4c630f67-bdbe-4521-b415-4205c5828429

I am not sure if you already have adaptive sampling support in the current pipeline or have it in your backlog, but it would be good to hear your thoughts.

Thanks, Nikhil