Hello Broad Institute team,
I have been following this tutorial to run GATK Best Practices in Google Cloud Platform using Google Cloud Life Sciences:
https://cloud.google.com/life-sciences/docs/tutorials/gatk
This tutorial runs the workflow PairedEndSingleSampleWf.wdl that can be found in:
https://github.com/gatk-workflows/broad-prod-wgs-germline-snps-indels
Without any changes, the workflow finished successfully. However, I need to run the same workflow for a Whole Genome Sequencing (WGS) use case. To achieve this, I changed the key "PairedEndSingleSampleWorkflow.flowcell_unmapped_bams" in the input file "PairedEndSingleSampleWf.hg38.inputs.json" to point to the WGS read groups found in the Broad Institute public bucket:
First attempt (Failed)
I ran the workflow with the new input file while keeping the options and WDL files unchanged, and it failed in the task ValidateSamFile:
2020-12-03 10:22:42,549 cromwell-system-akka.dispatchers.backend-dispatcher-17814 INFO - PipelinesApiAsyncBackendJobExecutionActor [UUID(1abfc19a)PairedEndSingleSampleWorkflow.ValidateCram:NA:3]: Status change from Running to Success
2020-12-03 10:22:45,913 cromwell-system-akka.dispatchers.engine-dispatcher-35 INFO - WorkflowManagerActor Workflow 1abfc19a-905b-4632-b90c-4d1be258bc5b failed (during ExecutingWorkflowState): java.lang.Exception: The compute backend terminated the job. If this termination is unexpected, examine likely causes such as preemption, running out of disk or memory on the compute instance, or exceeding the backend's maximum job duration.
Debugging ValidateSamFile
After setting the preemptible attempts to zero, I ran ValidateSamFile alone inside another workflow and got the following error:
ValidateCramWorkflow.ValidateSamFile:NA:1 failed. The job was stopped before the command finished. PAPI error code 10. The assigned worker has failed to complete the operation
Then I changed the task ValidateSamFile to request more memory and to make no attempts on preemptible machines, and it ran successfully:
Second attempt (Cancelled)
I applied the same changes to the ValidateSamFile task inside the complete workflow and ran the whole workflow again.
The intermediate outputs were saved in Cloud Storage and were complete after 23 hours 15 minutes. When I checked 2 days later, the final output folder still did not contain the complete set of final output files (even the .g.vcf present among the intermediate files was missing from it), and the initial Virtual Machine instance created by Google Cloud Life Sciences with wdl_runner was still running.
I had to kill the workflow because it kept using resources (1% of the CPU of the initial Virtual Machine instance). Not even the log file had been created.
Questions
Is there any way to troubleshoot a workflow that hangs without finishing?
Do I need to make additional changes to the workflow to run it on complete WGS data?
Additional remarks
The modified workflow ran successfully on the original inputs from the tutorial.
Using the wdl-runner monitoring tool after the 2 days gave the following message:
Transitioning to next stage or copying final output
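For reference, the input change described at the start of this post amounts to replacing one list in the inputs JSON. Below is a minimal Python sketch of that kind of edit, assuming the inputs file is plain JSON. The gs:// paths, the second key's value, and the helper name with_unmapped_bams are placeholders for illustration only, not the real bucket paths or anything from the tutorial:

```python
import json

# Key taken from the inputs file discussed in this post.
KEY = "PairedEndSingleSampleWorkflow.flowcell_unmapped_bams"


def with_unmapped_bams(inputs, bam_paths):
    """Return a copy of the inputs dict with the read-group BAM list replaced.

    Hypothetical helper: it only swaps the value stored under KEY and
    leaves every other entry of the inputs untouched.
    """
    if KEY not in inputs:
        raise KeyError(f"{KEY} not found in inputs")
    updated = dict(inputs)          # shallow copy, original stays unchanged
    updated[KEY] = list(bam_paths)  # new list of unmapped BAM locations
    return updated


# Placeholder inputs standing in for PairedEndSingleSampleWf.hg38.inputs.json.
original = {
    KEY: ["gs://example-bucket/tutorial/readgroup_1.unmapped.bam"],
    "PairedEndSingleSampleWorkflow.sample_name": "EXAMPLE_SAMPLE",
}

# Placeholder WGS read groups (not the real public bucket paths).
wgs_read_groups = [
    "gs://example-bucket/wgs/readgroup_A.unmapped.bam",
    "gs://example-bucket/wgs/readgroup_B.unmapped.bam",
]

wgs_inputs = with_unmapped_bams(original, wgs_read_groups)
print(json.dumps(wgs_inputs, indent=2))
```

In practice the dict would be loaded with json.load from the inputs file and written back with json.dump; nothing else in the file was changed for the runs described above.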
Thank you for your attention.
Greetings,
Johan