aws-quickstart / quickstart-illumina-dragen

AWS Quick Start Team
Apache License 2.0

Default docker container entrypoint conflict with nextflow awsbatch executor #50

Open ericpanyc opened 2 years ago

ericpanyc commented 2 years ago

When submitting a job to AWS Batch with Nextflow, the default entrypoint ["python3","/root/quickstart/dragen_qs.py"] prevents the command defined in the Nextflow script from being parsed properly by the DRAGEN instance, because Nextflow prepends a Bash launcher script to the command. Would it be possible to provide another version of the Docker image that uses /usr/bin/bash as its entrypoint? Thanks!

Example Nextflow process:

process dragen {

    executor 'awsbatch'
    queue 'dragen-queue'
    container 'job-definition://dragen'

    """
    -f \
    -r s3:///Dragen_aws_test/dragen_ref/hs37d5/ \
    -1 s3://Dragen_aws_test/fastqs/CPM00004614-BM-D_20220105_S1_R1_001.fastq.gz \
    -2 s3://Dragen_aws_test/fastqs/CPM00004614-BM-D_20220105_S1_R2_001.fastq.gz \
    --RGID CPM00004614-BM-D_20220105 \
    --RGSM CPM00004614-BM-D_20220105 \
    --enable-bam-indexing true \
    --enable-map-align true \
    --enable-map-align-output true \
    --output-format BAM \
    --enable-sort true \
    --output-file-prefix CPM00004614-BM-D_20220105-dragen \
    --output-directory s3://Dragen_aws_test/nf/output/ \
    --enable-variant-caller true
    """

}

Example error log when Nextflow runs the previous process:

nextflow version 21.10.0.5640

Error executing process > 'dragen'

Caused by: Essential container in task exited

Command executed:

-f nd -r s3://cpmpublic/Dragen_aws_test/dragen_ref/hs37d5/ -1 s3://cpmpublic/Dragen_aws_test/fastqs/CPM00004614-BM-D_20220105_S1_R1_001.fastq.gz -2 s3://cpmpublic/Dragen_aws_test/fastqs/CPM00004614-BM-D_20220105_S1_R2_001.fastq.gz --RGID CPM00004614-BM-D_20220105 --RGSM CPM00004614-BM-D_20220105 --enable-bam-indexing true --enable-map-align true --enable-map-align-output true --output-format BAM --enable-sort true --output-file-prefix CPM00004614-BM-D_20220105-dragen --output-directory s3://cpmpublic/Dragen_aws_test/nf/output/ --enable-variant-caller true

Command exit status:

Command output: (empty)

Command error:

(more omitted..)
--qc-coverage-reports-1 arg Coverage reports requested for qc-coverage-region-1
--qc-coverage-reports-2 arg Coverage reports requested for qc-coverage-region-2
--qc-coverage-reports-3 arg Coverage reports requested for qc-coverage-region-3
--qc-coverage-filters-1 arg Filters requested for qc-coverage-region-1
--qc-coverage-filters-2 arg Filters requested for qc-coverage-region-2
--qc-coverage-filters-3 arg Filters requested for qc-coverage-region-3
--qc-coverage-ignore-overlaps arg Spend extra time to avoid double-counting overlapping mates
--qc-coverage-count-soft-clipped-bases arg Consider soft clipped bases towards coverage
--qc-cross-cont-vcf arg Variant file (.vcf/.vcf.gz) with population allele frequencies to estimate sample contamination
--qc-somatic-contam-vcf arg Variant file (.vcf/.vcf.gz) with population allele frequencies to estimate somatic sample contamination
--qc-somatic-contam-normal-pileup arg Pileup file (.tallies) with pre-calculated pileup summaries to estimate somatic sample contamination
--qc-somatic-contam-tumor-pileup arg Pileup file (.tallies) with pre-calculated pileup summaries to estimate somatic sample contamination
--gc-metrics-enable arg Enable calculation of GC bias metrics
--gc-metrics-window-size arg Window size for GC bias calculation (Default=100)
--gc-metrics-num-bins arg Number of (histogram) bins for GC bias summary metrics (Default=5, quintiles)
--enable-metrics-compression arg Enable compression of large metric files (default=false)
--pe-coverage-factors arg Chromosomes coverage factors for use by ploidy estimator
--vc-systematic-noise-raw-input-list arg List of files to be processed, one file per line
--vc-systematic-noise-germline-vaf-threshold arg Minimum variant allele frequency threshold to define germline variants; Only works with VCF format; Specify either '--vc-systematic-noise-germline-vaf-threshold' or '--vc-systematic-noise-use-germline-tag' to remove germlines. Default: none
--vc-systematic-noise-use-germline-tag arg Whether to use DRAGEN germline tagging to remove germlines; Only works with DRAGEN VCF format; Specify either '--vc-systematic-noise-germline-vaf-threshold' or '--vc-systematic-noise-use-germline-tag' to remove germlines. Default: false
--vc-systematic-noise-method arg Method to compute noise across samples, either 'mean' or 'max' or 'aggregate', default: mean
-f [ --force ] Force overwrite of existing output file
-l [ --force-load-reference ] Load the reference, even if it appears to already be loaded
--skip-load-reference Skip loading the reference
-i [ --interleaved ] Interleaved paired-end reads in single FASTQ
-h [ --help ] Print this help message
-v [ --verbose ] Be talkative
-V [ --version ] Print the version and exit
--lic-no-print Do not print license status at the end of a run
-T [ --type ] Print the dragen type and exit
ERROR: unrecognised option '-o'
/bin/bash: /opt/conda/bin/aws: No such file or directory
Error: Output S3 location not specified!
Removing Output dir /ephemeral/80f1e77e-4d6e-42bf-8164-efba5e20dd6c
Job is exiting with code 1
Caught SystemExit: Exiting with status 1

Work dir: s3://cpmpublic/oki_output/tmp/28/c3eed9ff6037ba92714b3c3822faa8
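
The "unrecognised option '-o'" line in the log above would be consistent with the entrypoint receiving the Bash wrapper as its own arguments. Here is a minimal Python sketch of how Docker combines an exec-form ENTRYPOINT with the command a job supplies. The shape of the submitted command (bash -o pipefail -c ...) is an assumption about what the awsbatch executor sends, and the launcher string is only a placeholder.

```python
# Docker's exec-form rule: the container's argv is ENTRYPOINT followed by
# the command supplied at run time.
ENTRYPOINT = ["python3", "/root/quickstart/dragen_qs.py"]  # quoted in the issue

# ASSUMPTION: approximate shape of the command the awsbatch executor submits.
# "<launcher script>" is a placeholder, not Nextflow's real launcher.
submitted_command = ["bash", "-o", "pipefail", "-c", "<launcher script>"]


def container_argv(entrypoint, command):
    """Return the argv a container runs when both parts are in exec form."""
    return list(entrypoint) + list(command)


argv = container_argv(ENTRYPOINT, submitted_command)

# Everything after the script path is what dragen_qs.py sees as its arguments.
script_args = argv[2:]
print("dragen_qs.py receives:", script_args)

# With an empty entrypoint the submitted command runs unchanged.
print("no entrypoint runs:", container_argv([], submitted_command))
```

Under these assumptions the Python script is handed "bash", "-o", "pipefail" and so on as arguments instead of the DRAGEN options, so the options written in the process block never reach it in the expected form.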

valeandri commented 2 years ago

I think this error is caused by awscli missing from the dragen image. Nextflow requires an AMI with both Docker and awscli installed.

Did you manage to solve the issue?
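
One way to check the awscli point is to run a small script inside the container (or on the instance) and see which tools resolve. This is only a sketch: the path /opt/conda/bin/aws is taken from the error log above, and the helper name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools):
    """Return the entries of `tools` that cannot be resolved to an executable.

    shutil.which accepts either a bare command name (looked up on PATH)
    or a path with a directory component (checked directly).
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # /opt/conda/bin/aws is the path the log reports as missing.
    required = ("/opt/conda/bin/aws", "aws", "bash")
    missing = missing_tools(required)
    if missing:
        print("not found:", ", ".join(missing))
    else:
        print("all required tools found")
```

If the explicit path is reported as missing while the bare name aws resolves, the CLI is installed but not at the location the job expects.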