TrinityCTAT / ctat-mutations

Mutation detection using GATK4 best practices and latest RNA editing filters resources. Works with both Hg38 and Hg19
https://github.com/TrinityCTAT/ctat-mutations

After "WaitingForReturnCode ", pipeline stuck #105

Open jaewon-cho opened 2 years ago

jaewon-cho commented 2 years ago

I was running the ctat-mutations pipeline with Singularity. I have run the "test" data before.

Here is my script:

singularity exec -e -B ${CTAT_GENOME_LIB}:/data/hemberg/jaewon/ctat/new5/GRCh37_gencode_v19_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ ~/ctat_mutations.v3.2.0.simg ~/ctat-mutations-CTAT-Mutations-v3.2.0/ctat_mutations --left test_data/SRR11619681_1.fastq --right test_data/SRR11619681_2.fastq --genome_lib_dir ~/sfa_jaewon/ctat/new5/GRCh37_gencode_v19_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ -O peto --cpu 6 --sample_id peto --boosting_method=none

The problem is

[2022-01-11 20:28:44,46] [info] BackgroundConfigAsyncJobExecutionActor [2aat_mutations.SplitIntervals:NA:1]: executing: /usr/bin/env bash /PHShome/jst_data/peto/cromwell-executions/ctat_mutations/2a83efc2-5d77-4cde-a0fd-38bb/call-SplitIntervals/execution/script
[2022-01-11 20:28:47,62] [info] BackgroundConfigAsyncJobExecutionActor [2aat_mutations.StarAlign:NA:1]: job id: 16935
[2022-01-11 20:28:47,63] [info] BackgroundConfigAsyncJobExecutionActor [2aat_mutations.SplitIntervals:NA:1]: job id: 16952
[2022-01-11 20:28:47,65] [info] BackgroundConfigAsyncJobExecutionActor [2aat_mutations.SplitIntervals:NA:1]: Status change from - to WaitingForRetur
[2022-01-11 20:28:47,65] [info] BackgroundConfigAsyncJobExecutionActor [2aat_mutations.StarAlign:NA:1]: Status change from - to WaitingForReturnCode

After showing this status, the pipeline got stuck. This is what I found in "stdout" in "call-StarAlign/execution":

Jan 11 15:28:44 ..... started STAR run
Jan 11 15:28:44 ..... loading genome
Jan 11 15:32:26 ..... started 1st pass mapping
Jan 11 15:35:28 ..... finished 1st pass mapping
Jan 11 15:35:32 ..... inserting junctions into the genome indices
Jan 11 15:44:03 ..... started mapping
Jan 11 15:47:17 ..... finished mapping
Jan 11 15:47:47 ..... started sorting BAM
Jan 11 15:47:49 ..... finished successfully

From my previous result, "call-MarkDuplicates" should come after the STAR alignment, but this time there is no "call-MarkDuplicates".

Thank you

brianjohnhaas commented 2 years ago

hi,

If you rerun the original command, it should resume where it left off. The workflow runner (cromwell) sometimes takes a little while to go to the next step but it usually doesn't get completely stuck. It's an industry-strength workflow runner.
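One thing that may be worth checking before rerunning: as far as I understand, the "WaitingForReturnCode" status means cromwell is waiting for an `rc` file to appear in the task's `execution` directory. The sketch below lists which execution directories under a workflow directory already have one. The function name `report_call_status` is hypothetical, and the directory layout is assumed from the paths in the log above.

```shell
#!/usr/bin/env bash
# Sketch: report which cromwell task attempts have written a return-code file.
# Assumption: each task has an "execution" directory that receives an "rc"
# file once the task's script exits (layout inferred from the log paths above).

report_call_status() {
    local workflow_dir="$1"
    local exec_dir
    # Walk every "execution" directory below the workflow root.
    find "$workflow_dir" -type d -name execution | sort | while IFS= read -r exec_dir; do
        if [ -f "$exec_dir/rc" ]; then
            # Strip whitespace so the return code prints cleanly.
            printf '%s\trc=%s\n' "$exec_dir" "$(tr -d '[:space:]' < "$exec_dir/rc")"
        else
            printf '%s\tno rc file yet\n' "$exec_dir"
        fi
    done
}

# Example (path is a placeholder for your own run directory):
# report_call_status peto/cromwell-executions/ctat_mutations
```

If a task's stdout says it finished but its `rc` file is missing or slow to appear, that would point at the file system rather than at the task itself.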

best,

~b

--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

jaewon-cho commented 2 years ago

Thank you for your comment. However, I have to run more than 10K samples, so killing and rerunning every stuck job is not feasible. Is there any way to keep the process from getting stuck? (The current job stalled for the whole of last night.)

brianjohnhaas commented 2 years ago

hi,

You might be overwhelming the system if you're trying to run a lot of samples locally. For large numbers of samples, it would be best to use the cloud, such as our Terra workflow:

https://app.terra.bio/#workspaces/ctat-firecloud/ctat-mutations

If you do attempt to run them outside the cloud, be sure to have a separate working directory for each sample. Also, keep an eye on the system resources being used. If your file system gets overwhelmed, or you start going into swap memory, it'll be trouble.
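A minimal sketch of the two points above, assuming a Linux host: one output directory per sample, and a quick look at swap usage. The helper names (`make_sample_dir`, `build_command`, `swap_used_kb`) and the `<sample>_1.fastq` / `<sample>_2.fastq` naming are hypothetical. The ctat_mutations options are only those from the command earlier in this thread, and the command is printed rather than run.

```shell
#!/usr/bin/env bash
# Sketch only: separate working directory per sample, plus a simple swap check.
# Helper names and the FASTQ naming pattern are illustrative assumptions.

# Create (if needed) and print a dedicated output directory for one sample.
make_sample_dir() {
    local base_dir="$1" sample_id="$2"
    local out_dir="$base_dir/$sample_id"
    mkdir -p "$out_dir"
    printf '%s\n' "$out_dir"
}

# Print (not run) a ctat_mutations command for one sample.
build_command() {
    local fastq_dir="$1" genome_lib="$2" out_dir="$3" sample_id="$4"
    printf 'ctat_mutations --left %s --right %s --genome_lib_dir %s -O %s --cpu 6 --sample_id %s --boosting_method=none\n' \
        "$fastq_dir/${sample_id}_1.fastq" "$fastq_dir/${sample_id}_2.fastq" \
        "$genome_lib" "$out_dir" "$sample_id"
}

# Report swap in use (kB) from a meminfo-style file; defaults to /proc/meminfo.
swap_used_kb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ {total=$2} /^SwapFree:/ {free=$2} END {print total - free}' "$meminfo"
}

# Example (paths are placeholders):
# for s in SRR11619681; do
#     build_command test_data /path/to/ctat_genome_lib_build_dir "$(make_sample_dir runs "$s")" "$s"
# done
# echo "swap in use: $(swap_used_kb) kB"
```

Printing the commands first makes it easy to confirm that no two samples share an `-O` directory before anything is launched.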

jaewon-cho commented 2 years ago

Thank you for your kind and rapid response. I really appreciate it. I am actually planning to use an LSF cluster rather than running in the local environment. Before submitting jobs to the queue, I was testing with just one sample to check that it runs properly. Thank you again.