c-BIG / NPM-sample-qc

reference implementation of GA4GH WGS Quality Control Standards
https://c-big.github.io/NPM-sample-qc
MIT License

Error executing process > 'picard_collect_multiple_metrics_bam (NA12878-chr14-AKT1)' #67

Closed. skanwal closed this issue 1 year ago.

skanwal commented 1 year ago

Hi,

I have cloned the repository (again) to pull in the updates from 3 days ago.

I am getting the following error when running the pipeline with the test data and script under NPM-sample-qc/tests/NA12878-chr14-AKT1_1000genomes-dragen-3.7.6.

$ bash run.sh
N E X T F L O W  ~  version 22.10.0
Launching `../../main.nf` [ridiculous_dubinsky] DSL2 - revision: 1f2eb30b43
N E X T F L O W  ~  version 22.10.0 5826
NPM-sample-qc  ~  version 0.7
User name    : kanwals
Command Line : nextflow run ../../main.nf -config ../../nextflow.config -params-file params.yml -work-dir ./work --outdir ./output
Project Dir  : /Users/kanwals/UMCCR/git/NPM-sample-qc
Launch Dir   : /Users/kanwals/UMCCR/git/NPM-sample-qc/tests/NA12878-chr14-AKT1_1000genomes-dragen-3.7.6
Work Dir     : /Users/kanwals/UMCCR/git/NPM-sample-qc/tests/NA12878-chr14-AKT1_1000genomes-dragen-3.7.6/work
Results Dir  : ./output/results
Info Dir     : ./output/pipeline_info
Profile      : standard
executor >  local (3)
[47/31aef0] process > samtools_stats (NA12878-chr14-AKT1)                      [100%] 1 of 1 ✔
[fb/3ceb6f] process > picard_collect_multiple_metrics_bam (NA12878-chr14-AKT1) [  0%] 0 of 1
[76/fb3be2] process > mosdepth_bam (NA12878-chr14-AKT1)                        [  0%] 0 of 1
[-        ] process > mosdepth_datamash                                        -
[-        ] process > multiqc                                                  -
[-        ] process > compile_metrics                                          -
Error executing process > 'picard_collect_multiple_metrics_bam (NA12878-chr14-AKT1)'

Caused by:
  Process `picard_collect_multiple_metrics_bam (NA12878-chr14-AKT1)` terminated with an error exit status (1)

Command executed:

  # program CollectQualityYieldMetrics to get numbers of bases that pass a base quality score 30 threshold
  # program CollectInsertSizeMetrics to get mean insert size

  picard CollectMultipleMetrics          I=NA12878-chr14-AKT1.bam         O=NA12878-chr14-AKT1         ASSUME_SORTED=true         FILE_EXTENSION=".txt"         PROGRAM=null         PROGRAM=CollectQualityYieldMetrics         PROGRAM=CollectInsertSizeMetrics         METRIC_ACCUMULATION_LEVEL=null         METRIC_ACCUMULATION_LEVEL=ALL_READS

Command exit status:
  1

Command output:
  (empty)

Command error:
                                clear the default value. Possible values: {ERROR, WARNING, INFO, DEBUG}

  QUIET=Boolean                 Whether to suppress job-summary info on System.err.  Default value: false. This option can
                                be set to 'null' to clear the default value. Possible values: {true, false}

  VALIDATION_STRINGENCY=ValidationStringency
                                Validation stringency for all SAM files read by this program.  Setting stringency to
                                SILENT can improve performance when processing a BAM file in which variable-length data
                                (read, qualities, tags) do not otherwise need to be decoded.  Default value: STRICT. This
                                option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT,
                                SILENT}

  COMPRESSION_LEVEL=Integer     Compression level for all compressed files created (e.g. BAM and VCF).  Default value: 5.
                                This option can be set to 'null' to clear the default value.

  MAX_RECORDS_IN_RAM=Integer    When writing files that need to be sorted, this will specify the number of records stored
                                in RAM before spilling to disk. Increasing this number reduces the number of file handles
                                needed to sort the file, and increases the amount of RAM needed.  Default value: 500000.
                                This option can be set to 'null' to clear the default value.

  CREATE_INDEX=Boolean          Whether to create an index when writing VCF or coordinate sorted BAM output.  Default
                                value: false. This option can be set to 'null' to clear the default value. Possible
                                values: {true, false}

  CREATE_MD5_FILE=Boolean       Whether to create an MD5 digest for any BAM or FASTQ files created.    Default value:
                                false. This option can be set to 'null' to clear the default value. Possible values:
                                {true, false}

  REFERENCE_SEQUENCE=File
  R=File                        Reference sequence file.  Default value: null.

  GA4GH_CLIENT_SECRETS=String   Google Genomics API client_secrets.json file path.  Default value: client_secrets.json.
                                This option can be set to 'null' to clear the default value.

  USE_JDK_DEFLATER=Boolean
  USE_JDK_DEFLATER=Boolean      Use the JDK Deflater instead of the Intel Deflater for writing compressed output  Default
                                value: false. This option can be set to 'null' to clear the default value. Possible
                                values: {true, false}

  USE_JDK_INFLATER=Boolean
  USE_JDK_INFLATER=Boolean      Use the JDK Inflater instead of the Intel Inflater for reading compressed input  Default
                                value: false. This option can be set to 'null' to clear the default value. Possible
                                values: {true, false}

  OPTIONS_FILE=File             File of OPTION_NAME=value pairs.  No positional parameters allowed.  Unlike command-line
                                options, unrecognized options are ignored.  A single-valued option set in an options file
                                may be overridden by a subsequent command-line option.  A line starting with '#' is
                                considered a comment.  Required.

executor >  local (3)
[47/31aef0] process > samtools_stats (NA12878-chr14-AKT1)                      [100%] 1 of 1 ✔
[fb/3ceb6f] process > picard_collect_multiple_metrics_bam (NA12878-chr14-AKT1) [100%] 1 of 1, failed: 1 ✘
[-        ] process > mosdepth_bam (NA12878-chr14-AKT1)                        -
[-        ] process > mosdepth_datamash                                        -
[-        ] process > multiqc                                                  -
[-        ] process > compile_metrics                                          -
Workflow execution stopped with the following message:
Exit status   : 1
Error message :                               clear the default value. Possible values: {ERROR, WARNING, INFO, DEBUG}

QUIET=Boolean                 Whether to suppress job-summary info on System.err.  Default value: false. This option can
                              be set to 'null' to clear the default value. Possible values: {true, false}

VALIDATION_STRINGENCY=ValidationStringency
                              Validation stringency for all SAM files read by this program.  Setting stringency to
                              SILENT can improve performance when processing a BAM file in which variable-length data
                              (read, qualities, tags) do not otherwise need to be decoded.  Default value: STRICT. This
                              option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT,
                              SILENT}

COMPRESSION_LEVEL=Integer     Compression level for all compressed files created (e.g. BAM and VCF).  Default value: 5.
                              This option can be set to 'null' to clear the default value.

MAX_RECORDS_IN_RAM=Integer    When writing files that need to be sorted, this will specify the number of records stored
                              in RAM before spilling to disk. Increasing this number reduces the number of file handles
                              needed to sort the file, and increases the amount of RAM needed.  Default value: 500000.
                              This option can be set to 'null' to clear the default value.

CREATE_INDEX=Boolean          Whether to create an index when writing VCF or coordinate sorted BAM output.  Default
                              value: false. This option can be set to 'null' to clear the default value. Possible
                              values: {true, false}

CREATE_MD5_FILE=Boolean       Whether to create an MD5 digest for any BAM or FASTQ files created.    Default value:
                              false. This option can be set to 'null' to clear the default value. Possible values:
                              {true, false}

REFERENCE_SEQUENCE=File
R=File                        Reference sequence file.  Default value: null.

GA4GH_CLIENT_SECRETS=String   Google Genomics API client_secrets.json file path.  Default value: client_secrets.json.
                              This option can be set to 'null' to clear the default value.

USE_JDK_DEFLATER=Boolean
USE_JDK_DEFLATER=Boolean      Use the JDK Deflater instead of the Intel Deflater for writing compressed output  Default
                              value: false. This option can be set to 'null' to clear the default value. Possible
                              values: {true, false}

USE_JDK_INFLATER=Boolean
USE_JDK_INFLATER=Boolean      Use the JDK Inflater instead of the Intel Inflater for reading compressed input  Default
                              value: false. This option can be set to 'null' to clear the default value. Possible
                              values: {true, false}

OPTIONS_FILE=File             File of OPTION_NAME=value pairs.  No positional parameters allowed.  Unlike command-line
                              options, unrecognized options are ignored.  A single-valued option set in an options file
                              may be overridden by a subsequent command-line option.  A line starting with '#' is
                              considered a comment.  Required.

R is not installed on this machine. It is required for creating the chart.
Error report  : Error executing process > 'picard_collect_multiple_metrics_bam (NA12878-chr14-AKT1)'

Caused by:
  Process `picard_collect_multiple_metrics_bam (NA12878-chr14-AKT1)` terminated with an error exit status (1)

Command executed:

  # program CollectQualityYieldMetrics to get numbers of bases that pass a base quality score 30 threshold
  # program CollectInsertSizeMetrics to get mean insert size

  picard CollectMultipleMetrics          I=NA12878-chr14-AKT1.bam         O=NA12878-chr14-AKT1         ASSUME_SORTED=true         FILE_EXTENSION=".txt"         PROGRAM=null         PROGRAM=CollectQualityYieldMetrics         PROGRAM=CollectInsertSizeMetrics         METRIC_ACCUMULATION_LEVEL=null         METRIC_ACCUMULATION_LEVEL=ALL_READS

Command exit status:
  1

Command output:
  (empty)

Command error:
                                clear the default value. Possible values: {ERROR, WARNING, INFO, DEBUG}

  QUIET=Boolean                 Whether to suppress job-summary info on System.err.  Default value: false. This option can
                                be set to 'null' to clear the default value. Possible values: {true, false}

  VALIDATION_STRINGENCY=ValidationStringency
                                Validation stringency for all SAM files read by this program.  Setting stringency to
                                SILENT can improve performance when processing a BAM file in which variable-length data
                                (read, qualities, tags) do not otherwise need to be decoded.  Default value: STRICT. This
                                option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT,
                                SILENT}

  COMPRESSION_LEVEL=Integer     Compression level for all compressed files created (e.g. BAM and VCF).  Default value: 5.
                                This option can be set to 'null' to clear the default value.

  MAX_RECORDS_IN_RAM=Integer    When writing files that need to be sorted, this will specify the number of records stored
                                in RAM before spilling to disk. Increasing this number reduces the number of file handles
                                needed to sort the file, and increases the amount of RAM needed.  Default value: 500000.
                                This option can be set to 'null' to clear the default value.

  CREATE_INDEX=Boolean          Whether to create an index when writing VCF or coordinate sorted BAM output.  Default
                                value: false. This option can be set to 'null' to clear the default value. Possible
                                values: {true, false}

  CREATE_MD5_FILE=Boolean       Whether to create an MD5 digest for any BAM or FASTQ files created.    Default value:
                                false. This option can be set to 'null' to clear the default value. Possible values:
                                {true, false}

  REFERENCE_SEQUENCE=File
  R=File                        Reference sequence file.  Default value: null.

  GA4GH_CLIENT_SECRETS=String   Google Genomics API client_secrets.json file path.  Default value: client_secrets.json.
                                This option can be set to 'null' to clear the default value.

  USE_JDK_DEFLATER=Boolean
  USE_JDK_DEFLATER=Boolean      Use the JDK Deflater instead of the Intel Deflater for writing compressed output  Default
                                value: false. This option can be set to 'null' to clear the default value. Possible
                                values: {true, false}

  USE_JDK_INFLATER=Boolean
  USE_JDK_INFLATER=Boolean      Use the JDK Inflater instead of the Intel Inflater for reading compressed input  Default
                                value: false. This option can be set to 'null' to clear the default value. Possible
                                values: {true, false}

  OPTIONS_FILE=File             File of OPTION_NAME=value pairs.  No positional parameters allowed.  Unlike command-line
                                options, unrecognized options are ignored.  A single-valued option set in an options file
                                may be overridden by a subsequent command-line option.  A line starting with '#' is
                                considered a comment.  Required.

  R is not installed on this machine. It is required for creating the chart.

Work dir:
  /Users/kanwals/UMCCR/git/NPM-sample-qc/tests/NA12878-chr14-AKT1_1000genomes-dragen-3.7.6/work/fb/3ceb6f55e31066c7e7c14eea189b43

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.

I believe the relevant error in this trace is:

 R is not installed on this machine. It is required for creating the chart.

The pipeline internally uses the Docker container npm-sample-qc:latest (ID: 810f383096bb), which is expected to contain all dependencies for the pipeline, including R. Is this understanding correct, or is R expected to be handled separately?

mhebrard commented 1 year ago

Could you show the .command.log from the Picard step (in the work/fb/ directory)?

skanwal commented 1 year ago

There is no log file in the work/fb/ directory. It contains only a symlink to the input BAM file.

~/NPM-sample-qc/tests/NA12878-chr14-AKT1_1000genomes-dragen-3.7.6/work/fb/3ceb6f55e31066c7e7c14eea189b43$ l
total 0
lrwxr-xr-x 1 kanwals 111 Nov 14 11:18 NA12878-chr14-AKT1.bam -> /Users/kanwals/UMCCR/git/NPM-sample-qc/tests/NA12878-chr14-AKT1_1000genomes-dragen-3.7.6/NA12878-chr14-AKT1.bam

mhebrard commented 1 year ago

Sorry, I didn't pick the right subfolder. Could you look in your work directory and find the log for the Picard step? That would help with troubleshooting.
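Nextflow keeps per-task files such as `.command.sh`, `.command.log` and `.exitcode` in each task's work directory. They are dot-files, so a plain directory listing does not show them. A minimal sketch for locating the logs of failed tasks, assuming the default `./work` layout used in this run and that each task directory contains an `.exitcode` file (the function name `failed_task_logs` is made up for illustration):

```shell
# Hypothetical helper: print the path of the .command.log for every task
# under a Nextflow work directory whose .exitcode is not 0.
failed_task_logs() {
    work_dir="${1:-work}"
    find "$work_dir" -type f -name .exitcode | while IFS= read -r code_file; do
        task_dir=$(dirname "$code_file")
        code=$(cat "$code_file")
        # Report only tasks that failed and actually wrote a log file.
        if [ "$code" != "0" ] && [ -f "$task_dir/.command.log" ]; then
            printf '%s\n' "$task_dir/.command.log"
        fi
    done
}
```

Running `failed_task_logs work` from the launch directory prints one log path per failed task, and each path can then be inspected with `cat`.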

skanwal commented 1 year ago

There are errors in three command log files inside work:

  1. Platform issue

```
$ cat ./work/47/31aef07a0c47825d2ecb4edfa7cc5e/.command.log
Unable to find image 'cbig/npm-sample-qc:v0.7' locally
v0.7: Pulling from cbig/npm-sample-qc
2b55860d4c66: Pulling fs layer
82152bc7efd5: Pulling fs layer
26b628520914: Pulling fs layer
e3c13becd0f9: Pulling fs layer
bc54b0d39099: Pulling fs layer
4f4fb700ef54: Pulling fs layer
98cab1afeca2: Pulling fs layer
b46e8c05b3d9: Pulling fs layer
7bb358eb3113: Pulling fs layer
bc54b0d39099: Waiting
e3c13becd0f9: Waiting
4f4fb700ef54: Waiting
98cab1afeca2: Waiting
b46e8c05b3d9: Waiting
98772fe4dfd9: Pulling fs layer
44c9d55adc1d: Pulling fs layer
823088ebdecc: Pulling fs layer
7bb358eb3113: Waiting
b82b35a573be: Pulling fs layer
b82b35a573be: Waiting
98772fe4dfd9: Waiting
44c9d55adc1d: Waiting
823088ebdecc: Waiting
82152bc7efd5: Download complete
2b55860d4c66: Verifying Checksum
2b55860d4c66: Download complete
2b55860d4c66: Pull complete
82152bc7efd5: Pull complete
e3c13becd0f9: Verifying Checksum
e3c13becd0f9: Download complete
4f4fb700ef54: Download complete
98cab1afeca2: Verifying Checksum
98cab1afeca2: Download complete
b46e8c05b3d9: Verifying Checksum
b46e8c05b3d9: Download complete
bc54b0d39099: Download complete
7bb358eb3113: Verifying Checksum
7bb358eb3113: Download complete
44c9d55adc1d: Verifying Checksum
44c9d55adc1d: Download complete
26b628520914: Download complete
26b628520914: Pull complete
e3c13becd0f9: Pull complete
bc54b0d39099: Pull complete
4f4fb700ef54: Pull complete
98cab1afeca2: Pull complete
b46e8c05b3d9: Pull complete
7bb358eb3113: Pull complete
b82b35a573be: Verifying Checksum
b82b35a573be: Download complete
823088ebdecc: Verifying Checksum
823088ebdecc: Download complete
98772fe4dfd9: Verifying Checksum
98772fe4dfd9: Download complete
98772fe4dfd9: Pull complete
44c9d55adc1d: Pull complete
823088ebdecc: Pull complete
b82b35a573be: Pull complete
Digest: sha256:4ba6df5e5128e025985717c0abd5ecb77e2155d90fddc802c922315b455c1491
Status: Downloaded newer image for cbig/npm-sample-qc:v0.7
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
```
  2. Also platform issue

```
$ cat ./work/76/fb3be293fc07c9b6134f17df26f8de/.command.log
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
nxf-w58BETnyaA2QUD8yVacrcM7Z
```
  3. Picard issue (related to above issues, I believe).

```
$ cat ./work/fb/3ceb6f55e31066c7e7c14eea189b43/.command.log
Unable to find image 'cbig/npm-sample-qc:v0.7' locally
v0.7: Pulling from cbig/npm-sample-qc
2b55860d4c66: Pulling fs layer
82152bc7efd5: Pulling fs layer
26b628520914: Pulling fs layer
e3c13becd0f9: Pulling fs layer
bc54b0d39099: Pulling fs layer
4f4fb700ef54: Pulling fs layer
98cab1afeca2: Pulling fs layer
b46e8c05b3d9: Pulling fs layer
7bb358eb3113: Pulling fs layer
98772fe4dfd9: Pulling fs layer
44c9d55adc1d: Pulling fs layer
823088ebdecc: Pulling fs layer
b82b35a573be: Pulling fs layer
e3c13becd0f9: Waiting
bc54b0d39099: Waiting
4f4fb700ef54: Waiting
98cab1afeca2: Waiting
823088ebdecc: Waiting
b82b35a573be: Waiting
44c9d55adc1d: Waiting
98772fe4dfd9: Waiting
7bb358eb3113: Waiting
82152bc7efd5: Download complete
2b55860d4c66: Verifying Checksum
2b55860d4c66: Download complete
2b55860d4c66: Pull complete
82152bc7efd5: Pull complete
e3c13becd0f9: Verifying Checksum
e3c13becd0f9: Download complete
4f4fb700ef54: Download complete
98cab1afeca2: Verifying Checksum
98cab1afeca2: Download complete
b46e8c05b3d9: Download complete
bc54b0d39099: Verifying Checksum
bc54b0d39099: Download complete
7bb358eb3113: Verifying Checksum
7bb358eb3113: Download complete
44c9d55adc1d: Verifying Checksum
44c9d55adc1d: Download complete
26b628520914: Download complete
26b628520914: Pull complete
e3c13becd0f9: Pull complete
bc54b0d39099: Pull complete
4f4fb700ef54: Pull complete
98cab1afeca2: Pull complete
b46e8c05b3d9: Pull complete
7bb358eb3113: Pull complete
b82b35a573be: Verifying Checksum
b82b35a573be: Download complete
823088ebdecc: Verifying Checksum
823088ebdecc: Download complete
98772fe4dfd9: Verifying Checksum
98772fe4dfd9: Download complete
98772fe4dfd9: Pull complete
44c9d55adc1d: Pull complete
823088ebdecc: Pull complete
b82b35a573be: Pull complete
Digest: sha256:4ba6df5e5128e025985717c0abd5ecb77e2155d90fddc802c922315b455c1491
Status: Downloaded newer image for cbig/npm-sample-qc:v0.7
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
/usr/local/conda/envs/npm-sample-qc/share/picard-2.27.0-0/picard: line 5: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory
INFO 2022-11-14 00:25:05 CollectMultipleMetrics

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    CollectMultipleMetrics -I NA12878-chr14-AKT1.bam -O NA12878-chr14-AKT1 -ASSUME_SORTED true -FILE_EXTENSION .txt -PROGRAM null -PROGRAM CollectQualityYieldMetrics -PROGRAM CollectInsertSizeMetrics -METRIC_ACCUMULATION_LEVEL null -METRIC_ACCUMULATION_LEVEL ALL_READS
**********

INFO 2022-11-14 00:25:09 RExecutor Executing R script via command: Rscript /tmp/script5008846063550469075.R
00:25:09.371 WARN LegacyCommandLineArgumentParser - Hidden arguments are always printed in LegacyCommandLineArgumentParser
USAGE: CollectMultipleMetrics [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics

Collect multiple classes of metrics. This 'meta-metrics' tool runs one or more of the metrics collection modules at the same time to cut down on the time spent reading in data from input files. For each PROGRAM, the tool produces outputs. The valid values for PROGRAM and the output that would be generated by it are listed in the documentation of the PROGRAM argument. Currently all programs are run with default options and fixed output extensions, but this may become more flexible in future. Specifying a reference sequence file is required.

Note: Metrics labeled as percentages (PCT_*) are actually expressed as fractions!

Usage example (all modules on by default):
java -jar picard.jar CollectMultipleMetrics \
      I=input.bam \
      O=multiple_metrics \
      R=reference_sequence.fasta

Usage example (two modules only):
java -jar picard.jar CollectMultipleMetrics \
      I=input.bam \
      O=multiple_metrics \
      R=reference_sequence.fasta \
      PROGRAM=null \
      PROGRAM=QualityScoreDistribution \
      PROGRAM=MeanQualityByCycle

Version: 2.27.0

Options:

--help
-h
    Displays options specific to this tool.

--stdhelp
-H
    Displays options specific to this tool AND options common to all Picard command line tools.

--version
    Displays program version.

INPUT=File
I=File
    Input SAM or BAM file. Required.

ASSUME_SORTED=Boolean
AS=Boolean
    If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}

STOP_AFTER=Integer
    Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value.

OUTPUT=String
O=String
    Base name of output files. Required.

METRIC_ACCUMULATION_LEVEL=MetricAccumulationLevel
LEVEL=MetricAccumulationLevel
    The level(s) at which to accumulate metrics. Default value: [ALL_READS]. This option can be set to 'null' to clear the default value. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.

FILE_EXTENSION=String
EXT=String
    Append the given file extension to all metric file names (ex. OUTPUT.insert_size_metrics.EXT). None if null Default value: null.

PROGRAM=Program
    Set of metrics programs to apply during the pass through the SAM file. Default value: [CollectAlignmentSummaryMetrics, CollectBaseDistributionByCycle, CollectInsertSizeMetrics, MeanQualityByCycle, QualityScoreDistribution]. This option can be set to 'null' to clear the default value. Possible values: {
        CollectAlignmentSummaryMetrics (Produces a summary of alignment metrics from a SAM or BAM file. Creates output with ".alignment_summary_metrics, .read_length_histogram.pdf" appended to OUTPUT.)
        CollectInsertSizeMetrics (Collect metrics about the insert size distribution of a paired-end library. Creates output with ".insert_size_metrics, .insert_size_histogram.pdf" appended to OUTPUT.)
        QualityScoreDistribution (Chart the distribution of quality scores. Creates output with ".quality_distribution_metrics, .quality_distribution.pdf" appended to OUTPUT.)
        MeanQualityByCycle (Collect mean quality by cycle.Creates output with ".quality_by_cycle_metrics, .quality_by_cycle.pdf" appended to OUTPUT.)
        CollectBaseDistributionByCycle (Chart the nucleotide distribution per cycle in a SAM or BAM fileCreates output with ".base_distribution_by_cycle_metrics, .base_distribution_by_cycle.pdf" appended to OUTPUT.)
        CollectGcBiasMetrics (Collect metrics regarding GC bias. Creates output with ".gc_bias.detail_metrics, .gc_bias.summary_metrics, .gc_bias.pdf" appended to OUTPUT.)
        RnaSeqMetrics (Produces RNA alignment metrics for a SAM or BAM file. Creates output with ".rna_metrics, .rna_coverage.pdf" appended to OUTPUT.)
        CollectSequencingArtifactMetrics (Collect metrics to quantify single-base sequencing artifacts. Creates output with ".bait_bias_detail_metrics, .bait_bias_summary_metrics, .pre_adapter_detail_metrics, .pre_adapter_summary_metrics, .error_summary_metrics" appended to OUTPUT.)
        CollectQualityYieldMetrics (Collect metrics about reads that pass quality thresholds and Illumina-specific filters. Creates output with ".quality_yield_metrics" appended to OUTPUT.)
    } This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.

INTERVALS=File
    An optional list of intervals to restrict analysis to. Only pertains to some of the PROGRAMs. Programs whose stand-alone CLP does not have an INTERVALS argument will silently ignore this argument. Default value: null.

DB_SNP=File
    VCF format dbSNP file, used to exclude regions around known polymorphisms from analysis by some PROGRAMs; PROGRAMs whose CLP doesn't allow for this argument will quietly ignore it. Default value: null.

REF_FLAT=File
    Gene annotations in refFlat form. Format described here: http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat Default value: null.

IGNORE_SEQUENCE=String
    If a read maps to a sequence specified with this option, all the bases in the read are counted as ignored bases. Default value: null. This option may be specified 0 or more times.

INCLUDE_UNPAIRED=Boolean
UNPAIRED=Boolean
    Include unpaired reads in CollectSequencingArtifactMetrics. If set to true then all paired reads will be included as well - MINIMUM_INSERT_SIZE and MAXIMUM_INSERT_SIZE will be ignored in CollectSequencingArtifactMetrics. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

EXTRA_ARGUMENT=String
    extra arguments to the various tools can be specified using the following format::: where is one of the programs specified in PROGRAM, and are the argument and value that you'd like to specify as you would on the command line. For example, to change the HISTOGRAM_WIDTH in CollectInsertSizeMetrics to 200, use: "EXTRA_ARGUMENT=CollectInsertSizeMetrics::HISTOGRAM_WIDTH=200" or, in the new parser:--EXTRA_ARGUMENT "CollectInsertSizeMetrics::--HISTOGRAM_WIDTH 200" (Quotes are required to avoid the shell from separating this into two arguments.) Note that the following arguments cannot be modified on a per-program level: INPUT, REFERENCE_SEQUENCE, ASSUME_SORTED, and STOP_AFTER. Providing them in an EXTRA_ARGUMENT will _not_ result in an error, but they will be silently ignored. Default value: null. This option may be specified 0 or more times.

TMP_DIR=File
    One or more directories with space available to be used by this program for temporary storage of working files Default value: null. This option may be specified 0 or more times.

VERBOSITY=LogLevel
    Control verbosity of logging. Default value: INFO. This option can be set to 'null' to clear the default value. Possible values: {ERROR, WARNING, INFO, DEBUG}

QUIET=Boolean
    Whether to suppress job-summary info on System.err. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

VALIDATION_STRINGENCY=ValidationStringency
    Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. This option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT, SILENT}

COMPRESSION_LEVEL=Integer
    Compression level for all compressed files created (e.g. BAM and VCF). Default value: 5. This option can be set to 'null' to clear the default value.

MAX_RECORDS_IN_RAM=Integer
    When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed. Default value: 500000. This option can be set to 'null' to clear the default value.

CREATE_INDEX=Boolean
    Whether to create an index when writing VCF or coordinate sorted BAM output. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

CREATE_MD5_FILE=Boolean
    Whether to create an MD5 digest for any BAM or FASTQ files created. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

REFERENCE_SEQUENCE=File
R=File
    Reference sequence file. Default value: null.

GA4GH_CLIENT_SECRETS=String
    Google Genomics API client_secrets.json file path. Default value: client_secrets.json. This option can be set to 'null' to clear the default value.

USE_JDK_DEFLATER=Boolean
USE_JDK_DEFLATER=Boolean
    Use the JDK Deflater instead of the Intel Deflater for writing compressed output Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

USE_JDK_INFLATER=Boolean
USE_JDK_INFLATER=Boolean
    Use the JDK Inflater instead of the Intel Inflater for reading compressed input Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

OPTIONS_FILE=File
    File of OPTION_NAME=value pairs. No positional parameters allowed. Unlike command-line options, unrecognized options are ignored. A single-valued option set in an options file may be overridden by a subsequent command-line option. A line starting with '#' is considered a comment. Required.

R is not installed on this machine. It is required for creating the chart.
```
skanwal commented 1 year ago

@mhebrard - did you have a chance to look into this?

mhebrard commented 1 year ago

@skanwal - a few notes

It looks to me like an issue with your Docker installation or configuration. May I know why you had a RAM limitation in a previous issue, and do you have any disk limitation for Docker on your machine?

skanwal commented 1 year ago

> May I know why you had RAM limitation in a previous issue

I wouldn't call it a limitation. By default, Docker assigns limited resources (CPU, memory) to its processes. It seemed that 2GB (the default Docker memory) wasn't enough for the pipeline, so I bumped it up a bit.
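
For reference, a minimal sketch of how the memory available to Docker could be compared against a threshold before launching the pipeline. The helper name `has_enough_memory` and the 4 GiB threshold are illustrative assumptions, not values taken from the pipeline.

```shell
# Hypothetical helper: succeeds when a byte count is at least MIN_GIB gibibytes.
has_enough_memory() {
  bytes="$1"
  min_gib="$2"
  [ "$bytes" -ge "$(( min_gib * 1024 * 1024 * 1024 ))" ]
}

# Example with a fixed value (2 GiB) so it runs without a Docker daemon.
if has_enough_memory 2147483648 4; then
  echo "enough memory"
else
  echo "increase Docker's memory limit"
fi
```

With a running Docker daemon, the first argument could instead be `"$(docker info --format '{{.MemTotal}}')"`, which reports the daemon's total memory in bytes.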

> do you have any disk limitation for docker on your machine

No. Also, I use docker regularly on this machine for other work projects and haven't experienced any issues.

skanwal commented 1 year ago

Hi @mhebrard

I wanted to let you know that I tested the latest release code, and the workflow completes successfully with the test data. I'm not sure what was causing the issue before, but I'm closing this since we don't get the same error with the updated pipeline.