aws-samples / amazon-omics-tutorials

Manifest containing multiple versions of same image fails #30

Closed: Tmacphee13 closed this issue 10 months ago

Tmacphee13 commented 10 months ago

When migrating images from nfcore/rnaseq, I hit a Step Functions exception saying "The repository with the name 'biocontainers/samtools' already exists in the registry". I believe this is caused by the helper script identifying multiple versions of samtools, which end up as separate entries for the same repository. The manifest is shown below:

{
    "manifest": [
        "biocontainers/bbmap:39.01--h5c4e2a8_0",
        "biocontainers/bedtools:2.30.0--hc088bd4_0",
        "biocontainers/bioconductor-dupradar:1.28.0--r42hdfd78af_0",
        "biocontainers/bioconductor-summarizedexperiment:1.24.0--r41hdfd78af_0",
        "biocontainers/bioconductor-tximeta:1.12.0--r41hdfd78af_0",
        "biocontainers/fastp:0.23.4--h5f740d0_0",
        "biocontainers/fastqc:0.11.9--0",
        "biocontainers/fq:0.9.1--h9ee0642_0",
        "biocontainers/gffread:0.12.1--h8b12597_0",
        "biocontainers/hisat2:2.2.1--h1b792b2_3",
        "biocontainers/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:1df389393721fc66f3fd8778ad938ac711951107-0",
        "biocontainers/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:59cdd445419f14abac76b31dd0d71217994cbcc9-0",
        "biocontainers/mulled-v2-8849acf39a43cdd6c839a369a74c0adc823e2f91:ab110436faf952a33575c64dd74615a84011450b-0",
        "biocontainers/mulled-v2-a97e90b3b802d1da3d6958e0867610c718cb5eb1:2cdf6bf1e92acbeb9b2834b1c58754167173a410-0",
        "biocontainers/mulled-v2-cf0123ef83b3c38c13e3b0696a3f285d3f20f15b:64aad4a4e144878400649e71f42105311be7ed87-0",
        "biocontainers/multiqc:1.14--pyhdfd78af_0",
        "biocontainers/perl:5.26.2",
        "biocontainers/picard:3.0.0--hdfd78af_1",
        "biocontainers/preseq:3.1.2--h445547b_2",
        "biocontainers/python:3.9--1",
        "biocontainers/qualimap:2.2.2d--1",
        "biocontainers/rseqc:3.0.1--py37h516909a_1",
        "biocontainers/salmon:1.10.1--h7e5ed60_0",
        "biocontainers/samtools:1.16.1--h6899075_1",
        "biocontainers/samtools:1.17--h00cdaf9_0",
        "biocontainers/sortmerna:4.3.4--h9ee0642_0",
        "biocontainers/stringtie:2.2.1--hecb563c_2",
        "biocontainers/subread:2.0.1--hed695b0_0",
        "biocontainers/trim-galore:0.6.7--hdfd78af_0",
        "biocontainers/ucsc-bedclip:377--h0b8a92a_2",
        "biocontainers/ucsc-bedgraphtobigwig:377--h446ed27_1",
        "biocontainers/umi_tools:1.1.4--py38hbff2b2d_1",
        "nf-core/ubuntu:20.04"
    ]
}

wleepang commented 10 months ago

Thanks for submitting this issue and providing an example case. I was able to replicate the problem. It looks like a race condition between Map iterations executing in parallel. For example:

  1. Iterations 1 and 2 both detect that repository foo does not exist and proceed to the ECR Create Repository state.
  2. Iteration 1 creates the target repository before Iteration 2, causing Iteration 2 to fail with a "Repository already exists" error.
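To see which manifest entries can collide this way, here is a minimal Python sketch that groups image references by repository name. It assumes each entry has the form `repository:tag`, as in the manifest above; the function name `repos_with_multiple_tags` is made up for illustration and is not part of the helper script.

```python
from collections import defaultdict


def repos_with_multiple_tags(manifest):
    """Return {repository: [tags]} for repositories listed more than once."""
    tags_by_repo = defaultdict(list)
    for image in manifest:
        # Split "repository:tag" at the first colon.
        repo, _, tag = image.partition(":")
        tags_by_repo[repo].append(tag)
    return {repo: tags for repo, tags in tags_by_repo.items() if len(tags) > 1}


# A small excerpt of the manifest from the report above.
manifest = [
    "biocontainers/salmon:1.10.1--h7e5ed60_0",
    "biocontainers/samtools:1.16.1--h6899075_1",
    "biocontainers/samtools:1.17--h00cdaf9_0",
]

print(repos_with_multiple_tags(manifest))
```

Any repository returned here is targeted by more than one Map iteration. In the full manifest, `biocontainers/mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2` also appears with two tags, so samtools is not the only candidate for the race.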

There are a couple of possible ways to address this:

  1. explicitly catch the "Repository already exists" error from the Create Repository task
  2. add a retry with delay to the ECR Describe Repository task

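The former is probably the better option: treating "Repository already exists" as success makes the create step safe no matter which iteration gets there first. The latter only narrows the window and may still yield a race condition if the state machine is operated at higher scale.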
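As a rough illustration of the first option, the sketch below reproduces the race with two threads and shows that catching the "already exists" error lets both iterations finish. It is a simulation only: `FakeRegistry`, `RepositoryAlreadyExists`, `ensure_repository`, and `run_two_iterations` are hypothetical stand-ins, not the state machine's actual states or the ECR API. With boto3, the corresponding error would be `client.exceptions.RepositoryAlreadyExistsException` raised by `create_repository`.

```python
import threading


class RepositoryAlreadyExists(Exception):
    """Hypothetical stand-in for the registry's 'already exists' error."""


class FakeRegistry:
    """In-memory stand-in for a container registry (illustration only)."""

    def __init__(self):
        self._repos = set()
        self._lock = threading.Lock()

    def describe(self, name):
        with self._lock:
            return name in self._repos

    def create(self, name):
        with self._lock:
            if name in self._repos:
                raise RepositoryAlreadyExists(name)
            self._repos.add(name)


def ensure_repository(registry, name, barrier=None):
    """Create the repository if missing; treat 'already exists' as success."""
    if registry.describe(name):
        return "exists"
    if barrier is not None:
        # Force every worker past the existence check before any of them
        # creates, so the race is reproduced deterministically.
        barrier.wait()
    try:
        registry.create(name)
        return "created"
    except RepositoryAlreadyExists:
        # Another iteration won the race; the repository is there, so continue.
        return "exists"


def run_two_iterations(name):
    registry = FakeRegistry()
    barrier = threading.Barrier(2)
    results = []

    def worker():
        results.append(ensure_repository(registry, name, barrier))

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


print(run_two_iterations("biocontainers/samtools"))  # ['created', 'exists']
```

Both workers pass the existence check before either creates the repository, which is the failing sequence described above. Without the `except` branch, the second worker would raise instead of returning `"exists"`.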