broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Cromwell 75 complains about GCS output file not found when delocalizing directories #6677

Open freeseek opened 2 years ago

freeseek commented 2 years ago
$ echo 'version development

workflow main {
  call main { input: s1 = "x", s2 = "y" }
  output { Array[File] f = main.f }
}

task main {
  input {
    String s1
    String s2
  }

  command <<<
    set -euo pipefail
    mkdir d
    touch "d/~{s1}"
    touch "d/~{s2}"
    echo -e "d/~{s1}\nd/~{s2}"
  >>>

  output {
    Directory d = "d"
    Array[File] f = read_lines(stdout())
  }

  runtime {
    docker: "debian:stable-slim"
  }
}' > main.wdl

This workflow when run on Google Cloud using Cromwell 74:

$ java -Dconfig.file=PAPIv2.conf -jar cromwell-74.jar run main.wdl

will succeed.

When run on Google Cloud using Cromwell 75:

$ java -Dconfig.file=PAPIv2.conf -jar cromwell-75.jar run main.wdl

the workflow will fail with the message:

GCS output file not found: gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d

However, the directory is correctly delocalized:

$ gsutil ls -l gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d
         0  2022-02-13T00:00:00Z  gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d/x
         0  2022-02-13T00:00:00Z  gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d/y
TOTAL: 2 objects, 0 bytes (0 B)

The delocalization script is aware that d is a directory:

$ gsutil cat gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/gcs_delocalization.sh
source '/cromwell_root/gcs_transfer.sh'

timestamped_message 'Delocalization script execution started...'

# xxx
delocalize_6c578056c74a8d9a80724855ddac131c=(
  "mccarroll-mocha"       # project
  "3"   # max attempts
  "150M" # parallel composite upload threshold, will not be used for directory types
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/memory_retry_rc"
  "/cromwell_root/memory_retry_rc"
  "optional"
  "text/plain; charset=UTF-8"
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/rc"
  "/cromwell_root/rc"
  "required"
  "text/plain; charset=UTF-8"
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/monitoring.log"
  "/cromwell_root/monitoring.log"
  "required"
  "text/plain; charset=UTF-8"
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/stdout"
  "/cromwell_root/stdout"
  "required"
  "text/plain; charset=UTF-8"
  "file"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/stderr"
  "/cromwell_root/stderr"
  "required"
  "text/plain; charset=UTF-8"
  "directory"
  "gs://xxx/cromwell-executions/main/01234567-89ab-cdef-0123-456789abcdef/call-main/d"
  "/cromwell_root/d"
  "required"
  ""
)

delocalize "${delocalize_6c578056c74a8d9a80724855ddac131c[@]}"

timestamped_message 'Delocalization script execution complete.'
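
To make the layout of that array easier to read, here is a minimal sketch that walks a shortened stand-in for it. It assumes the layout visible in the script above: three leading fields (project, max attempts, upload threshold) followed by groups of five fields (kind, cloud path, container path, required/optional, content type). The project name and the shortened gs:// paths are placeholders, and the loop is only an illustration; it is not the actual delocalize function from gcs_transfer.sh, which is not shown in this issue.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Shortened, hypothetical stand-in for the array in the script above:
# three header fields, then groups of five fields per output.
args=(
  "my-project" "3" "150M"
  "file" "gs://xxx/call-main/rc" "/cromwell_root/rc" "required" "text/plain; charset=UTF-8"
  "directory" "gs://xxx/call-main/d" "/cromwell_root/d" "required" ""
)

summary=""
# Skip the three header fields, then step through the five-field groups.
for ((i = 3; i < ${#args[@]}; i += 5)); do
  kind="${args[i]}"                 # "file" or "directory"
  container_path="${args[i + 2]}"   # path inside the container
  summary+="${kind} ${container_path}"$'\n'
done

# One line per output: its kind and its container path.
printf '%s' "$summary"
```

Read this way, the last group in the real script marks /cromwell_root/d as a "directory" entry, which is the point being made here.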

However, Cromwell 75 appears to include a new check that expects d to be a file, even though it is delocalized as a directory.
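
One plausible reading of the error, not confirmed anywhere in this issue, is that the check looks for an object named exactly .../call-main/d. In GCS a directory is only a name prefix, so the listing above contains d/x and d/y but no object called d. The sketch below simulates that difference with a plain bash array; the object names are shortened and the two helper functions are hypothetical, not Cromwell or gsutil code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulated bucket listing, modeled on the gsutil ls output above (names shortened).
objects=("call-main/d/x" "call-main/d/y" "call-main/rc" "call-main/stdout")

# Exact-name lookup: succeeds only if an object with this exact name exists.
object_exists() {
  local target="$1" o
  for o in "${objects[@]}"; do
    [[ "$o" == "$target" ]] && return 0
  done
  return 1
}

# Prefix lookup: succeeds if any object lives under "<target>/".
prefix_exists() {
  local target="$1" o
  for o in "${objects[@]}"; do
    [[ "$o" == "$target"/* ]] && return 0
  done
  return 1
}

if object_exists "call-main/d"; then file_check="found"; else file_check="not found"; fi
if prefix_exists "call-main/d"; then dir_check="found"; else dir_check="not found"; fi

echo "file-style check: $file_check"
echo "directory-style check: $dir_check"
```

In this simulation the file-style check reports "not found" while the directory-style check reports "found", which matches the symptom: the data is in the bucket, but a lookup that treats d as a single file cannot see it.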

This breaks the only workaround available in Cromwell for delocalizing a list of files that is not known before the task starts. Note that glob() is not an acceptable alternative, because glob() gives no control over the order of the output files.
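
The ordering problem can be illustrated outside Cromwell with a small shell sketch. It assumes that a glob comes back in the shell's sorted pathname order (how glob() is evaluated inside Cromwell is not shown in this issue), and the file names and the temporary directory are made up for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so the demo leaves nothing behind in the cwd.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir d

# The task decides the order of its outputs: "y" first, then "x".
ordered=("y" "x")
for name in "${ordered[@]}"; do
  touch "d/$name"
done

# What the task would print for read_lines(stdout()): its own chosen order.
task_order="$(printf 'd/%s\n' "${ordered[@]}")"

# What pathname expansion yields: the shell sorts the matches.
glob_order="$(printf '%s\n' d/*)"

echo "task order:"
echo "$task_order"
echo "glob order:"
echo "$glob_order"
```

Here the task order is d/y then d/x, while the glob order is d/x then d/y. Writing the paths to stdout is what lets the task author keep the intended order, and that only works if the files under d are delocalized through the Directory output.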

lmtani commented 2 years ago

Hello,

I think the problem is solved in Cromwell release 78. I had this problem when running the mocha workflow on a Cromwell 74 server. After updating to 78, the workflow completed the problematic tasks.

+--------------------+---------+------------+---------------------+
|        TASK        | ATTEMPT |  ELAPSED   |       STATUS        |
+--------------------+---------+------------+---------------------+
| batch_id_lines     | 1       | 5m34.003s  | Done                |
| batch_sorted_tsv   | 1       | 4m45.648s  | Done                |
| csv2bam (Scatter)  | -       | 10m51.838s | 1/1 Done | 0 Failed |
| green_idat_lines   | 1       | 5m34.003s  | Done                |
| gtc                | 1       | 5m27.897s  | Done                |
| gtc_reheader       | 1       | 5m26.257s  | Failed              |
| idat               | 1       | 5m27.897s  | Done                |
| idat2gtc (Scatter) | -       | 10m58.206s | 0/1 Done | 1 Failed |
| red_idat_lines     | 1       | 5m34.002s  | Done                |
| ref_scatter        | 1       | 4m39.394s  | Done                |
| sample_id_lines    | 1       | 5m34.003s  | Done                |
| sample_sorted_tsv  | 1       | 4m42.453s  | Done                |
+--------------------+---------+------------+---------------------+
❗You have 1 issue:

 - Workflow failed
 - GCS output file not found: gs://bioinfo-dev-temp/mocha/a224bb3e-fc20-4b0a-8846-ee2b4b603933/call-gtc_reheader/maps
 - GCS output file not found: gs://bioinfo-dev-temp/mocha/a224bb3e-fc20-4b0a-8846-ee2b4b603933/call-idat2gtc/shard-0/gtcs
+----------------------------+---------+-----------------+-----------------------+
|            TASK            | ATTEMPT |     ELAPSED     |        STATUS         |
+----------------------------+---------+-----------------+-----------------------+
| batch_id_lines             | 1       | 16.37s          | Done                  |
| batch_sorted_tsv           | 1       | 15.288s         | Done                  |
| call_rate_lines            | 1       | 5m34.525s       | Done                  |
| computed_gender_lines      | 1       | 5m34.523s       | Done                  |
| csv2bam (Scatter)          | -       | 49.958s         | 1/1 Done | 0 Failed   |
| flatten_sample_id_lines    | 1       | 5m29.56s        | Done                  |
| get_max_nrecords (Scatter) | -       | 5m32.076s       | 1/1 Done | 0 Failed   |
| green_idat_lines           | 1       | 16.38s          | Done                  |
| green_idat_tsv             | 1       | 5m33.602s       | Done                  |
| gtc                        | 1       | 10.602s         | Done                  |
| gtc2vcf (Scatter)          | -       | 8m15.392s       | 1/1 Done | 0 Failed   |
| gtc_reheader               | 1       | 4m16.907s       | Done                  |
| gtc_tsv                    | 1       | 5m30.578s       | Done                  |
| idat                       | 1       | 7.606s          | Done                  |
| idat2gtc (Scatter)         | -       | 9m46.928s       | 1/1 Done | 0 Failed   |
| mocha_calls_tsv            | 1       | 5m19.305941005s | Running               |
| mocha_stats_tsv            | 1       | 5m19.304938136s | Running               |
| red_idat_lines             | 1       | 16.386s         | Done                  |
| red_idat_tsv               | 1       | 5m33.603s       | Done                  |
| ref_scatter                | 1       | 17.728s         | Done                  |
| sample_id_lines            | 1       | 16.383s         | Done                  |
| sample_id_split_tsv        | 1       | 5m31.462s       | Done                  |
| sample_sorted_tsv          | 1       | 11.924s         | Done                  |
| sample_tsv                 | 1       | 5m26.14s        | Done                  |
| vcf_concat (Scatter)       | -       | 5m32.467s       | 1/1 Done | 0 Failed   |
| vcf_import (Scatter)       | -       | 8m16.609s       | 1/1 Done | 0 Failed   |
| vcf_merge (Scatter)        | -       | 2h6m53.926s     | 23/23 Done | 0 Failed |
| vcf_mocha (Scatter)        | -       | 8m19.96s        | 1/1 Done | 0 Failed   |
| vcf_phase (Scatter)        | -       | 3h7m39.033s     | 23/23 Done | 0 Failed |
| vcf_qc (Scatter)           | -       | 2h8m6.051s      | 23/23 Done | 0 Failed |
| vcf_scatter (Scatter)      | -       | 5m25.444s       | 1/1 Done | 0 Failed   |
| vcf_split (Scatter)        | -       | 2h7m37.183s     | 23/23 Done | 0 Failed |
| write_tsv                  | 1       | 5m10.124926865s | Running               |
| xcl_vcf_concat             | 1       | 5m28.883s       | Done                  |
+----------------------------+---------+-----------------+-----------------------+

Note: some tasks have a duration of only a few seconds because I'm using call caching.