TrinityCTAT / ctat-mutations

Mutation detection using GATK4 best practices and latest RNA editing filters resources. Works with both Hg38 and Hg19
https://github.com/TrinityCTAT/ctat-mutations

File "/opt/conda/lib/python3.7/subprocess.py", line 347, in check_call raise CalledProcessError(retcode, cmd) #106

Open yuanqingyan opened 2 years ago

yuanqingyan commented 2 years ago

Hi,

I am using Singularity (version 3.8.1), ctat_mutations.v3.2.0.simg, and the test read files to call mutations. The pipeline fails at the "Annotating VCF: Calculating ED" step. Do you have any idea how this can be fixed?

The command I used is:

singularity exec -e -B `pwd`:/data -B /work/reference/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir:/ctat_genome_lib_dir:ro /work/singularityImage/ctat_mutations.v3.2.0.simg /usr/local/src/ctat-mutations/ctat_mutations --left /data/fastq/reads_1.fastq.gz --right /data/fastq/reads_2.fastq.gz --sample_id test --output /data/out --cpu 1 --genome_lib_dir /ctat_genome_lib_dir

The error message is:

echo "########### Annotate BLAT ED #############"

/usr/local/src/ctat-mutations/src/annotate_ED.py \
    --input_vcf /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/inputs/-1226801460/test.splice_distance.vcf.gz \
    --output_vcf test.blat_ED.vcf \
    --reference /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/inputs/817306935/ref_genome.fa \
    --temp_dir $TMPDIR \
    --threads 16

bgzip -c test.blat_ED.vcf > test.blat_ED.vcf.gz
tabix test.blat_ED.vcf.gz

[2022-01-13 04:39:05,29] [info] BackgroundConfigAsyncJobExecutionActor [1561a942annotate_variants_wf.annotate_blat_ED:NA:1]: executing: /usr/bin/env bash /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/execution/script
[2022-01-13 04:39:06,67] [info] BackgroundConfigAsyncJobExecutionActor [1561a942annotate_variants_wf.annotate_blat_ED:NA:1]: job id: 148068
[2022-01-13 04:39:06,68] [info] BackgroundConfigAsyncJobExecutionActor [1561a942annotate_variants_wf.annotate_blat_ED:NA:1]: Status change from - to Done
[2022-01-13 04:39:23,67] [info] WorkflowManagerActor Workflow ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb failed (during ExecutingWorkflowState): Job annotate_variants_wf.annotate_blat_ED:NA:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details. Check the content of stderr for potential additional information: /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/execution/stderr.
[First 3000 bytes]:
+ echo '########### Annotate BLAT ED #############'

22:39:05 : INFO : Processing VCF Positions
22:39:06 : INFO : Running samtools faidx
Traceback (most recent call last):
  File "/usr/local/src/ctat-mutations/src/annotate_ED.py", line 165, in <module>
    main()
  File "/usr/local/src/ctat-mutations/src/annotate_ED.py", line 84, in main
    subprocess.check_call(cmd, shell=True)
  File "/opt/conda/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'samtools faidx /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/inputs/817306935/ref_genome.fa --region-file /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/tmp.0d731215/positions.fa > /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/tmp.0d731215/faidx_output.fa' returned non-zero exit status 1.

[2022-01-13 04:39:23,68] [info] WorkflowManagerActor WorkflowActor-ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb is in a terminal state: WorkflowFailedState
[2022-01-13 04:40:49,23] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2022-01-13 04:40:52,89] [info] SingleWorkflowRunnerActor writing metadata to /tmp/tmpmghzfeby.json
[2022-01-13 04:40:52,92] [info] Workflow polling stopped
[2022-01-13 04:40:52,93] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2022-01-13 04:40:52,93] [info] 0 workflows released by cromid-69e661a
[2022-01-13 04:40:52,93] [info] Aborting all running workflows.
[2022-01-13 04:40:52,94] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2022-01-13 04:40:52,94] [info] WorkflowStoreActor stopped
[2022-01-13 04:40:52,95] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2022-01-13 04:40:52,95] [info] WorkflowLogCopyRouter stopped
[2022-01-13 04:40:52,95] [info] JobExecutionTokenDispenser stopped
[2022-01-13 04:40:52,95] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2022-01-13 04:40:52,95] [info] WorkflowManagerActor All workflows finished
[2022-01-13 04:40:52,95] [info] WorkflowManagerActor stopped
[2022-01-13 04:40:53,23] [info] Connection pools shut down
[2022-01-13 04:40:53,23] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2022-01-13 04:40:53,23] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2022-01-13 04:40:53,24] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2022-01-13 04:40:53,24] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2022-01-13 04:40:53,24] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2022-01-13 04:40:53,24] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2022-01-13 04:40:53,27] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2022-01-13 04:40:53,27] [info] SubWorkflowStoreActor stopped
[2022-01-13 04:40:53,27] [info] JobStoreActor stopped
[2022-01-13 04:40:53,27] [info] CallCacheWriteActor stopped
[2022-01-13 04:40:53,27] [info] IoProxy stopped
[2022-01-13 04:40:53,29] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-01-13 04:40:53,30] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2022-01-13 04:40:53,32] [info] KvWriteActor Shutting down: 0 queued messages to process
[2022-01-13 04:40:53,32] [info] ServiceRegistryActor stopped
[2022-01-13 04:40:53,32] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-01-13 04:40:53,34] [info] DockerHashActor stopped
[2022-01-13 04:40:53,32] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-01-13 04:40:53,32] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-01-13 04:40:53,38] [info] Database closed
[2022-01-13 04:40:53,39] [info] Stream materializer shut down
[2022-01-13 04:40:53,40] [info] WDL HTTP import resolver closed
Workflow ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb transitioned to state Failed

Thanks so much.
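
A side note on the traceback above: `subprocess.check_call` reports only the exit status, so whatever `samtools faidx` printed to stderr is not part of the exception. The following is a minimal sketch, assuming Python 3.7+ and a hypothetical helper name `run_and_report`, of running the same kind of shell command while capturing stderr. The demo command is a stand-in, not the real samtools call.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run a shell command and return (exit_code, stderr_text) instead of raising."""
    # shell=True mirrors the check_call(cmd, shell=True) call seen in the traceback.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    # Stand-in for the failing command: a child process that writes to stderr and exits 1.
    child = "import sys; sys.stderr.write('demo failure'); sys.exit(1)"
    demo_cmd = f'"{sys.executable}" -c "{child}"'
    code, err = run_and_report(demo_cmd)
    print(f"exit status {code}: {err}")
```

Passing the full `samtools faidx ... > faidx_output.fa` string from the traceback to such a helper inside the container would show the underlying samtools message rather than only "non-zero exit status 1".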

brianjohnhaas commented 2 years ago

Hi,

If you go to the directory for the task that failed:

/data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/

do you see any stderr or stdout files that have more information about the error?

Another thing you might try is to run the Singularity image in 'shell' mode and run this command directly:

samtools faidx /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/inputs/817306935/ref_genome.fa --region-file /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/tmp.0d731215/positions.fa > /data/testOut/cromwell-executions/ctat_mutations/ffd9aa99-405b-4bc0-bf96-4e61a4f09dbb/call-AnnotateVariants/annotate_variants_wf/1561a942-9e30-4a55-a098-9e091e8fbdfd/call-annotate_blat_ED/tmp.0d731215/faidx_output.fa

and see what the error message is, if any.

That'll help us figure out the next steps.

best,

~brian


--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas
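
To go with the suggestion above to look for stderr files in the failed task directory: a minimal sketch, assuming the usual Cromwell layout in which each task's `execution/` directory holds an `rc` file (the return code) next to `stderr`. The function names are hypothetical, and the default root path is simply the one that appears in the logs in this thread.

```python
from pathlib import Path


def failed_tasks(root):
    """Yield (execution_dir, return_code) for each task whose rc file is non-zero."""
    for rc_file in sorted(Path(root).rglob("rc")):
        # Only consider rc files that sit directly inside an execution/ directory.
        if rc_file.parent.name != "execution":
            continue
        code = rc_file.read_text().strip()
        if code and code != "0":
            yield rc_file.parent, code


def stderr_tail(execution_dir, n_lines=20):
    """Return the last n_lines of the task's stderr file, or '' if it is missing."""
    stderr_file = Path(execution_dir) / "stderr"
    if not stderr_file.is_file():
        return ""
    lines = stderr_file.read_text(errors="replace").splitlines()
    return "\n".join(lines[-n_lines:])


if __name__ == "__main__":
    root = Path("/data/testOut/cromwell-executions")  # path taken from the logs above
    if root.is_dir():
        for exec_dir, code in failed_tasks(root):
            print(f"{exec_dir} exited with return code {code}")
            print(stderr_tail(exec_dir))
```

Cromwell's own message only includes the first 3000 bytes of stderr, so reading the tail of the file directly can surface the final error line that the truncated excerpt misses.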

yuanqingyan commented 2 years ago

Thanks for the information, Brian.

I have already fixed the previous issue by supplying the correct input FASTQ files. However, a new problem now occurs in the "VariantFiltration" module. Please see the messages at "[2022-01-13 20:20:41,90]", "[2022-01-13 20:20:59,58]" and "[2022-01-13 20:21:44,54]". Thanks again.


[2022-01-13 20:20:41,16] [info] WorkflowExecutionActor-9068f826-411f-49a0-90cb-ce09f29efed4 [9068f826]: Starting ctat_mutations.VariantFiltration
[2022-01-13 20:20:41,89] [info] Assigned new job execution tokens to the following groups: 9068f826: 1
[2022-01-13 20:20:41,90] [info] 9068f826-411f-49a0-90cb-ce09f29efed4-EngineJobExecutionActor-ctat_mutations.VariantFiltration:NA:1 [9068f826]: Could not copy a suitable cache hit for 9068f826:ctat_mutations.VariantFiltration:-1:1. No copy attempts were made.
[2022-01-13 20:20:41,90] [warn] BackgroundConfigAsyncJobExecutionActor [9068f826ctat_mutations.VariantFiltration:NA:1]: Unrecognized runtime attribute keys: preemptible, disks, docker, cpu, memory
[2022-01-13 20:20:41,91] [warn] Localization via hard link has failed: /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/817306935/ref_genome.fa.fai -> /ctat_genome_lib_dir/ref_genome.fa.fai: Invalid cross-device link
[2022-01-13 20:20:41,91] [warn] Localization via hard link has failed: /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/817306935/ref_genome.fa -> /ctat_genome_lib_dir/ref_genome.fa: Invalid cross-device link
[2022-01-13 20:20:41,91] [warn] Localization via hard link has failed: /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/817306935/ref_genome.dict -> /ctat_genome_lib_dir/ref_genome.dict: Invalid cross-device link
[2022-01-13 20:20:41,92] [info] BackgroundConfigAsyncJobExecutionActor [9068f826ctat_mutations.VariantFiltration:NA:1]:
set -ex

monitor_script.sh &

boosting_method="XGBoost"

if [ "$boosting_method" == "none" ]; then

gatk --java-options "-Xmx2500m" \
VariantFiltration \
--R /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/817306935/ref_genome.fa \
--V /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/328445994/test.vcf.gz \
--window 35 \
--cluster 3 \
--filter-name "FS" \
--filter "FS > 30.0" \
--filter-name "QD" \
--filter "QD < 2.0" \
--filter-name "SPLICEDIST" \
--filter "DJ < 3" \
-O tmp.vcf

gatk --java-options "-Xmx2500m" \
SelectVariants \
--R /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/817306935/ref_genome.fa \
--V tmp.vcf \
-select-type SNP \
--exclude-filtered \
-O test.XGBoost-classifier.vcf.gz

else

##############
## snps first:
/usr/local/src/ctat-mutations/src/annotated_vcf_to_feature_matrix.py \
    --vcf /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/328445994/test.vcf.gz \
    --features AC,ALT,BaseQRankSum,DJ,DP,ED,Entropy,ExcessHet,FS,Homopolymer,LEN,MLEAF,MMF,QUAL,REF,RPT,RS,ReadPosRankSum,SAO,SOR,TCR,TDM,VAF,VMMF \
    --snps \
     \
    --output XGBoost.snps.feature_matrix

/usr/local/src/ctat-mutations/src/VariantBoosting/Apply_ML.py \
    --feature_matrix XGBoost.snps.feature_matrix \
    --snps \
    --features AC,ALT,BaseQRankSum,DJ,DP,ED,Entropy,ExcessHet,FS,Homopolymer,LEN,MLEAF,MMF,QUAL,REF,RPT,RS,ReadPosRankSum,SAO,SOR,TCR,TDM,VAF,VMMF \
    --predictor classifier \
    --model XGBoost \
    --output XGBoost.classifier.snps.feature_matrix.wPreds

##############
## indels next
/usr/local/src/ctat-mutations/src/annotated_vcf_to_feature_matrix.py \
    --vcf /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/328445994/test.vcf.gz \
    --features AC,ALT,BaseQRankSum,DJ,DP,ED,Entropy,ExcessHet,FS,Homopolymer,LEN,MLEAF,MMF,QUAL,REF,RPT,RS,ReadPosRankSum,SAO,SOR,TCR,TDM,VAF,VMMF \
    --indels \
     \
    --output XGBoost.indels.feature_matrix

/usr/local/src/ctat-mutations/src/VariantBoosting/Apply_ML.py \
    --feature_matrix XGBoost.indels.feature_matrix \
    --indels \
    --features AC,ALT,BaseQRankSum,DJ,DP,ED,Entropy,ExcessHet,FS,Homopolymer,LEN,MLEAF,MMF,QUAL,REF,RPT,RS,ReadPosRankSum,SAO,SOR,TCR,TDM,VAF,VMMF \
    --predictor classifier \
    --model XGBoost \
    --output XGBoost.classifier.indels.feature_matrix.wPreds

 #########
 ## combine predictions into single output vcf

 /usr/local/src/ctat-mutations/src/extract_boosted_vcf.py \
     --vcf_in /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/328445994/test.vcf.gz \
     --boosted_variants_matrix XGBoost.classifier.snps.feature_matrix.wPreds XGBoost.classifier.indels.feature_matrix.wPreds\
     --vcf_out XGBoost.classifier.vcf

 bgzip -c XGBoost.classifier.vcf > test.XGBoost-classifier.vcf.gz

fi
[2022-01-13 20:20:41,93] [info] BackgroundConfigAsyncJobExecutionActor [9068f826ctat_mutations.VariantFiltration:NA:1]: executing: /usr/bin/env bash /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/execution/script
[2022-01-13 20:20:43,80] [info] BackgroundConfigAsyncJobExecutionActor [9068f826ctat_mutations.VariantFiltration:NA:1]: job id: 441113
[2022-01-13 20:20:43,80] [info] BackgroundConfigAsyncJobExecutionActor [9068f826ctat_mutations.VariantFiltration:NA:1]: Status change from - to WaitingForReturnCode
[2022-01-13 20:20:58,15] [info] BackgroundConfigAsyncJobExecutionActor [9068f826ctat_mutations.VariantFiltration:NA:1]: Status change from WaitingForReturnCode to Done
[2022-01-13 20:20:59,58] [info] WorkflowManagerActor Workflow 9068f826-411f-49a0-90cb-ce09f29efed4 failed (during ExecutingWorkflowState): Job ctat_mutations.VariantFiltration:NA:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details. Check the content of stderr for potential additional information: /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/execution/stderr.
[First 3000 bytes]:
+ boosting_method=XGBoost

brianjohnhaas commented 2 years ago

Hi,

It looks like there are too few variants here to do boosting on. This can happen if you're running a tiny test data set through. To turn off boosting, use the parameter --boosting_method none

The run should resume where it left off when you give it the additional option.

best,

~b
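
As a rough illustration of the "too few variants" point: a minimal sketch that counts the records in a VCF before deciding whether to pass `--boosting_method none`. The function names and the `min_variants` cutoff are hypothetical. The thread does not state how many variants boosting needs, only that the 40 variants reported in this run's stderr were too few.

```python
import gzip


def count_vcf_records(vcf_path):
    """Count non-header records in a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def suggest_boosting_method(n_variants, min_variants=1000, default="XGBoost"):
    """Return the value to pass to --boosting_method for a given variant count."""
    # min_variants is an ASSUMED cutoff for illustration only; it is not a
    # documented ctat-mutations threshold.
    return default if n_variants >= min_variants else "none"


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        n = count_vcf_records(sys.argv[1])
        print(f"{n} variants: --boosting_method {suggest_boosting_method(n)}")
```

"XGBoost" is used as the default here because that is the value of `boosting_method` in the task script shown in this thread.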

On Thu, Jan 13, 2022 at 3:56 PM Yuanqing Yan wrote:

  • '[' XGBoost == none ']'
  • /usr/local/src/ctat-mutations/src/annotated_vcf_to_feature_matrix.py --vcf /data/out/cromwell-executions/ctat_mutations/9068f826-411f-49a0-90cb-ce09f29efed4/call-VariantFiltration/inputs/328445994/test.vcf.gz --features AC,ALT,BaseQRankSum,DJ,DP,ED,Entropy,ExcessHet,FS,Homopolymer,LEN,MLEAF,MMF,QUAL,REF,RPT,RS,ReadPosRankSum,SAO,SOR,TCR,TDM,VAF,VMMF --snps --output XGBoost.snps.feature_matrix 14:20:43 : INFO : -restricting feature matrix to SNPS 14:20:43 : INFO : Loading input VCF ... 14:20:43 : INFO : Preprocess Data ... 14:20:43 : INFO : Number of variants loaded: 40 14:20:43 : INFO : -restricting to snps 14:20:43 : INFO : -number of variants now: 39 14:20:43 : INFO : -writing feature data matrix to: XGBoost.snps.feature_matrix
  • /usr/local/src/ctat-mutations/src/VariantBoosting/Apply_ML.py --feature_matrix XGBoost.snps.feature_matrix --snps --features AC,ALT,BaseQRankSum,DJ,DP,ED,Entropy,ExcessHet,FS,Homopolymer,LEN,MLEAF,MMF,QUAL,REF,RPT,RS,ReadPosRankSum,SAO,SOR,TCR,TDM,VAF,VMMF --predictor classifier --model XGBoost --output XGBoost.classifier.snps.feature_matrix.wPreds
    14:20:44 : INFO : ########################## Running CTAT Boosting ##########################
    14:20:44 : INFO : Preprocess Data ...
    14:20:44 : INFO : Removing RNAediting sites ...
    14:20:44 : INFO : Number of variants after removing RNAediting sites: 37
    14:20:44 : INFO : -examining AC
    14:20:44 : INFO : -AC has 2 uniq entries
    14:20:44 : INFO : -examining ALT
    14:20:44 : INFO : -ALT has 3 uniq entries
    14:20:44 : INFO : -examining BaseQRankSum
    14:20:44 : INFO : -BaseQRankSum has 24 uniq entries
    14:20:44 : INFO : -examining DJ
    14:20:44 : INFO : -DJ has 36 uniq entries
    14:20:44 : INFO : -examining DP
    14:20:44 : INFO : -DP has 30 uniq entries
    14:20:44 : INFO : -examining ED
    14:20:44 : INFO : -ED has 4 uniq entries
    14:20:44 : INFO : -examining Entropy
    14:20:44 : INFO : -Entropy has 9 uniq entries
    14:20:44 : INFO : -examining ExcessHet
    14:20:44 : INFO : -ExcessHet has 1 uniq entries
    14:20:44 : INFO : -pruning feature column ExcessHet as theres no complexity
    14:20:44 : INFO : -examining FS
    14:20:44 : INFO : -FS has 16 uniq entries
    14:20:44 : INFO : -examining Homopolymer
    14:20:44 : INFO : -Homopolymer has 2 uniq entries
    14:20:44 : INFO : -examining LEN
    14:20:44 : INFO : -LEN has 1 uniq entries
    14:20:44 : INFO : -pruning feature column LEN as theres no complexity
    14:20:44 : INFO : -examining MLEAF
    14:20:44 : INFO : -MLEAF has 2 uniq entries
    14:20:44 : INFO : -examining MMF
    14:20:44 : INFO : -MMF has 9 uniq entries
    14:20:44 : INFO : -examining QUAL
    14:20:44 : INFO : -QUAL has 34 uniq entries
    14:20:44 : INFO : -examining REF
    14:20:44 : INFO : -REF has 3 uniq entries
    14:20:44 : INFO : -examining RPT
    14:20:44 : INFO : -RPT has 2 uniq entries
    14:20:44 : INFO : -examining RS
    14:20:44 : INFO : -RS has 2 uniq entries
    14:20:44 : INFO : -examining ReadPosRankSum
    14:20:44 : INFO : -ReadPosRankSum has 24 uniq entries 1

[2022-01-13 20:20:59,58] [info] WorkflowManagerActor WorkflowActor-9068f826-411f-49a0-90cb-ce09f29efed4 is in a terminal state: WorkflowFailedState
[2022-01-13 20:21:44,54] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2022-01-13 20:21:49,36] [info] SingleWorkflowRunnerActor writing metadata to /tmp/tmphx1uw4l9.json
[2022-01-13 20:21:49,40] [info] Workflow polling stopped
[2022-01-13 20:21:49,42] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2022-01-13 20:21:49,42] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2022-01-13 20:21:49,42] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2022-01-13 20:21:49,42] [info] 0 workflows released by cromid-f9579d5
[2022-01-13 20:21:49,42] [info] Aborting all running workflows.
[2022-01-13 20:21:49,43] [info] JobExecutionTokenDispenser stopped
[2022-01-13 20:21:49,43] [info] WorkflowStoreActor stopped
[2022-01-13 20:21:49,43] [info] WorkflowLogCopyRouter stopped
[2022-01-13 20:21:49,43] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2022-01-13 20:21:49,43] [info] WorkflowManagerActor All workflows finished
[2022-01-13 20:21:49,43] [info] WorkflowManagerActor stopped
[2022-01-13 20:21:49,66] [info] Connection pools shut down
[2022-01-13 20:21:49,66] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2022-01-13 20:21:49,66] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2022-01-13 20:21:49,66] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2022-01-13 20:21:49,66] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2022-01-13 20:21:49,66] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2022-01-13 20:21:49,66] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2022-01-13 20:21:49,66] [info] SubWorkflowStoreActor stopped
[2022-01-13 20:21:49,66] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2022-01-13 20:21:49,66] [info] JobStoreActor stopped
[2022-01-13 20:21:49,66] [info] CallCacheWriteActor stopped
[2022-01-13 20:21:49,66] [info] KvWriteActor Shutting down: 0 queued messages to process
[2022-01-13 20:21:49,66] [info] IoProxy stopped
[2022-01-13 20:21:49,66] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2022-01-13 20:21:49,66] [info] ServiceRegistryActor stopped
[2022-01-13 20:21:49,67] [info] DockerHashActor stopped
[2022-01-13 20:21:49,71] [info] Database closed
[2022-01-13 20:21:49,71] [info] Stream materializer shut down
[2022-01-13 20:21:49,72] [info] WDL HTTP import resolver closed


--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

yuanqingyan commented 2 years ago

Thanks so much.