TrinityCTAT / ctat-mutations

Mutation detection using GATK4 best practices and the latest RNA-editing filter resources. Works with both Hg38 and Hg19.
https://github.com/TrinityCTAT/ctat-mutations

Issue with BQSR maximum cycle value #129

Open irzamsarfraz opened 8 months ago

irzamsarfraz commented 8 months ago

Hi,

Thanks for creating this useful pipeline!

I am running the pipeline via a singularity container on some long-read sequencing data. It works well with smaller input fastq files, but as soon as the files become larger, the workflow ends abruptly.

Below is the ending portion of the output:

01:22:26.587 INFO  PrintReads - ------------------------------------------------------------
01:22:26.588 INFO  PrintReads - The Genome Analysis Toolkit (GATK) v4.1.9.0
01:22:26.588 INFO  PrintReads - For support and documentation go to https://software.broadinstitute.org/gatk/
01:22:26.588 INFO  PrintReads - Executing as isarfraz@scc-gn3 on Linux v4.18.0-477.15.1.el8_8.x86_64 amd64
01:22:26.588 INFO  PrintReads - Java runtime: OpenJDK 64-Bit Server VM v11.0.20.1+1-post-Ubuntu-0ubuntu120.04
01:22:26.588 INFO  PrintReads - Start Date/Time: November 16, 2023 at 1:22:26 AM GMT
01:22:26.588 INFO  PrintReads - ------------------------------------------------------------
01:22:26.588 INFO  PrintReads - ------------------------------------------------------------
01:22:26.588 INFO  PrintReads - HTSJDK Version: 2.23.0
01:22:26.589 INFO  PrintReads - Picard Version: 2.23.3
01:22:26.589 INFO  PrintReads - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:22:26.589 INFO  PrintReads - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:22:26.589 INFO  PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:22:26.589 INFO  PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:22:26.589 INFO  PrintReads - Deflater: IntelDeflater
01:22:26.589 INFO  PrintReads - Inflater: IntelInflater
01:22:26.589 INFO  PrintReads - GCS max retries/reopens: 20
01:22:26.589 INFO  PrintReads - Requester pays: disabled
01:22:26.589 INFO  PrintReads - Initializing engine
01:22:26.671 INFO  PrintReads - Done initializing engine
01:22:26.685 INFO  ProgressMeter - Starting traversal
01:22:26.685 INFO  ProgressMeter - Current Locu
[2023-11-16 01:22:34,58] [info] WorkflowManagerActor WorkflowActor-1318276e-f6f8-460f-bd69-6fd7a1ef3d3b is in a terminal state: WorkflowFailedState
[2023-11-16 01:23:01,22] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2023-11-16 01:23:06,05] [info] SingleWorkflowRunnerActor writing metadata to /tmp/tmple2zpd52.json
[2023-11-16 01:23:06,08] [info] Workflow polling stopped
[2023-11-16 01:23:06,09] [info] 0 workflows released by cromid-0f089e9
[2023-11-16 01:23:06,09] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2023-11-16 01:23:06,10] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2023-11-16 01:23:06,10] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2023-11-16 01:23:06,10] [info] Aborting all running workflows.
[2023-11-16 01:23:06,10] [info] JobExecutionTokenDispenser stopped
[2023-11-16 01:23:06,10] [info] WorkflowStoreActor stopped
[2023-11-16 01:23:06,11] [info] WorkflowLogCopyRouter stopped
[2023-11-16 01:23:06,11] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2023-11-16 01:23:06,11] [info] WorkflowManagerActor All workflows finished
[2023-11-16 01:23:06,11] [info] WorkflowManagerActor stopped
[2023-11-16 01:23:06,27] [info] Connection pools shut down
[2023-11-16 01:23:06,28] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] SubWorkflowStoreActor stopped
[2023-11-16 01:23:06,28] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2023-11-16 01:23:06,28] [info] JobStoreActor stopped
[2023-11-16 01:23:06,28] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2023-11-16 01:23:06,28] [info] KvWriteActor Shutting down: 0 queued messages to process
[2023-11-16 01:23:06,28] [info] CallCacheWriteActor stopped
[2023-11-16 01:23:06,28] [info] IoProxy stopped
[2023-11-16 01:23:06,28] [info] ServiceRegistryActor stopped
[2023-11-16 01:23:06,29] [info] DockerHashActor stopped
[2023-11-16 01:23:06,30] [info] Database closed
[2023-11-16 01:23:06,30] [info] Stream materializer shut down
[2023-11-16 01:23:06,31] [info] WDL HTTP import resolver closed
Workflow 1318276e-f6f8-460f-bd69-6fd7a1ef3d3b transitioned to state Failed

Investigating a little further, I found that it is the BQSR call that causes the workflow to fail. Looking at the stderr of this call, it seems that as the input data becomes larger, the default value of 20000 for the --maximum-cycle-value argument is no longer sufficient. Please see the stderr output below:

01:22:29.268 INFO  ApplyBQSR - ------------------------------------------------------------
01:22:29.269 INFO  ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
01:22:29.269 INFO  ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
01:22:29.269 INFO  ApplyBQSR - Executing as isarfraz@scc-gn3 on Linux v4.18.0-477.15.1.el8_8.x86_64 amd64
01:22:29.269 INFO  ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v11.0.20.1+1-post-Ubuntu-0ubuntu120.04
01:22:29.269 INFO  ApplyBQSR - Start Date/Time: November 16, 2023 at 1:22:29 AM GMT
01:22:29.269 INFO  ApplyBQSR - ------------------------------------------------------------
01:22:29.269 INFO  ApplyBQSR - ------------------------------------------------------------
01:22:29.270 INFO  ApplyBQSR - HTSJDK Version: 2.23.0
01:22:29.270 INFO  ApplyBQSR - Picard Version: 2.23.3
01:22:29.270 INFO  ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:22:29.270 INFO  ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:22:29.270 INFO  ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:22:29.270 INFO  ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:22:29.270 INFO  ApplyBQSR - Deflater: IntelDeflater
01:22:29.270 INFO  ApplyBQSR - Inflater: IntelInflater
01:22:29.270 INFO  ApplyBQSR - GCS max retries/reopens: 20
01:22:29.270 INFO  ApplyBQSR - Requester pays: disabled
01:22:29.270 INFO  ApplyBQSR - Initializing engine
01:22:29.382 INFO  ApplyBQSR - Done initializing engine
01:22:29.395 INFO  ProgressMeter - Starting traversal
01:22:29.396 INFO  ProgressMeter - Current Locus  Elapsed Minutes  Reads Processed  Reads/Minute
01:22:31.784 INFO  ApplyBQSR - Shutting down engine
[November 16, 2023 at 1:22:31 AM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.04 minutes. Runtime.totalMemory()=2147483648


A USER ERROR has occurred: The maximum allowed value for the cycle is 20000, but a larger cycle (20001) was detected. Please use the --maximum-cycle-value argument (when creating the recalibration table in BaseRecalibrator) to increase this value (at the expense of requiring more memory to run)
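For reference, outside of the ctat-mutations wrapper this flag would normally be supplied to BaseRecalibrator directly when the recalibration table is built, roughly like the sketch below. The file names and the value 100000 are placeholders for illustration only, not actual pipeline paths or defaults:

# build the recalibration table with a larger cycle limit (illustrative value)
gatk BaseRecalibrator \
    -I aligned_reads.bam \
    -R reference.fasta \
    --known-sites known_variants.vcf.gz \
    --maximum-cycle-value 100000 \
    -O recal.table

# then apply it as usual
gatk ApplyBQSR \
    -I aligned_reads.bam \
    -R reference.fasta \
    --bqsr-recal-file recal.table \
    -O recal_reads.bam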

However, I can't find a way to pass this argument manually to specify a larger value through the pipeline. Running the singularity image with the --no_bqsr argument does eliminate the issue and produces output, but I am not sure whether skipping BQSR has a significant effect on the results. Also, smaller fastq files do not trigger this issue.
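The cycle value 20001 in the error implies at least one read longer than 20,000 bases, so it is read length rather than overall file size that matters. A throwaway one-liner (not part of the pipeline) can confirm this on the input fastq:

# report the longest read in the gzipped fastq (sequence is every 4th line, starting at line 2)
zcat ${DATA_FOLDER}/some_reads.fastq.gz | \
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print "longest read:", max }'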

Is there a potential hack/workaround to manually specify this parameter while using the singularity image?

I use the following code for execution:

singularity exec -e -B ${DATA_FOLDER}:${DATA_FOLDER} -B /tmp:/tmp \
    -B genome_path/ctat_genome_lib_build_dir:/ctat_genome_lib_dir:ro \
    ctat_mutations_latest.sif \
    /usr/local/src/ctat-mutations/ctat_mutations \
    --left ${DATA_FOLDER}/some_reads.fastq.gz \
    --is_long_reads \
    --sample_id test \
    --output ${DATA_FOLDER}/example.HC.singularity \
    --cpu 10 \
    --genome_lib_dir /ctat_genome_lib_dir
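The workaround mentioned above is the same invocation with --no_bqsr appended; only the output directory name is changed here to keep the two runs separate:

singularity exec -e -B ${DATA_FOLDER}:${DATA_FOLDER} -B /tmp:/tmp \
    -B genome_path/ctat_genome_lib_build_dir:/ctat_genome_lib_dir:ro \
    ctat_mutations_latest.sif \
    /usr/local/src/ctat-mutations/ctat_mutations \
    --left ${DATA_FOLDER}/some_reads.fastq.gz \
    --is_long_reads \
    --sample_id test \
    --output ${DATA_FOLDER}/example.HC.singularity.no_bqsr \
    --cpu 10 \
    --genome_lib_dir /ctat_genome_lib_dir \
    --no_bqsr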

Thanks for looking into this!

Best, Irzam

brianjohnhaas commented 8 months ago

Hi Irzam,

I think we might just need to disable bqsr by default when the long reads option is used. That parameter of 20kb was meant to be the longest read to examine, but it looks like you've got reads longer than that. Leave this open, though, and I'll research it a bit more before the next software release.

best,

Brian
