ENCODE-DCC / croo

Cromwell output organizer
MIT License

Detected a cyclic link in DAG when task inputs are written to the output map #40

Status: open. motorny opened this issue 3 years ago

motorny commented 3 years ago

Dear croo package developers,

Please consider the following WDL workflow:

version 1.0
workflow myWorkflow {
    input {
        String file_in= "/a/b/c"
    }
    call myTask {
        input:
         file_in = file_in
    }
}

task myTask {
    input {
        String file_in
    }
    command <<<
        cat << EOF > map.json
        {
          "key": ["~{file_in}"]
        }
        EOF
    >>>
    output {
        Map[String, Array[String]] map_out = read_json("map.json")
    }
}

In practice, the task does a lot more complex work with paths and writes them to the JSON. That JSON is then read into WDL variables and processed by a scatter.

This workflow produced the following calls metadata (truncated):

  "calls": {
    "myWorkflow.myTask": [{
      "executionStatus": "Done",
      "backendStatus": "Done",
      "commandLine": "cat << EOF > map.json\n{\n  \"key\": [\"/a/b/c\"]\n}\nEOF",
      "shardIndex": -1,
      "outputs": {
        "map_out": {
          "key": ["/a/b/c"]
        }
      },
      "inputs": {
        "file_in": "/a/b/c"
      },
      "returnCode": 0,
      "end": "2021-06-17T17:58:13.092Z",
      "attempt": 1,
      "start": "2021-06-17T17:56:44.651Z"
    }]
  },

Afterwards, I try to use croo to copy the results:

    croo metadata-out.json --out-def-json out_def.json --out-dir ./

The output definition can even be an empty JSON. However, the execution fails with the following error:

    raise ValueError('Detected a cyclic link in DAG.')

Could you please look into it?

P.S. This was somewhat urgent for me, so I looked into the code. It seems to me that an additional check here, that the task names differ (n1.task_name != n2.task_name), does the job. I have not fully tested it; it is just an idea.
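To make the suggested check concrete, here is a minimal Python sketch of an assumed DAG model; it is not croo's code. It assumes that each task's inputs are parents of its outputs, and that an output node is linked as a parent of any input node carrying the same value. The names Node, build_edges, has_cycle and the skip_same_task flag are hypothetical. Under this model, a task whose output echoes one of its own input values creates the cycle input → output → input, and the proposed task-name guard removes that self-link.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical node model; croo's real node class and fields may differ.
@dataclass(frozen=True)
class Node:
    task_name: str
    kind: str  # "in" for a task input, "out" for a task output
    name: str
    value: str


def build_edges(nodes, skip_same_task=False):
    """Build parent -> children edges under the assumed linking rules."""
    edges = defaultdict(set)
    for n1 in nodes:
        for n2 in nodes:
            # Assumed rule 1: within one task, each input is a parent of each output.
            if n1.task_name == n2.task_name and n1.kind == "in" and n2.kind == "out":
                edges[n1].add(n2)
            # Assumed rule 2: an output is a parent of any input with the same value.
            if n1.kind == "out" and n2.kind == "in" and n1.value == n2.value:
                # Proposed guard from this issue: never link a task's output to its own input.
                if skip_same_task and n1.task_name == n2.task_name:
                    continue
                edges[n1].add(n2)
    return edges


def has_cycle(nodes, edges):
    """Three-color depth-first search; reaching a GRAY node again means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GRAY
        for m in edges.get(n, ()):
            if color[m] == GRAY:
                return True
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


# The situation from this issue: myTask's output contains its own input value.
task = "myWorkflow.myTask"
nodes = [
    Node(task, "in", "file_in", "/a/b/c"),
    Node(task, "out", "map_out.key[0]", "/a/b/c"),  # hypothetical flattened output name
]
```

Under these assumptions, has_cycle(nodes, build_edges(nodes)) is True and has_cycle(nodes, build_edges(nodes, skip_same_task=True)) is False, while links between different tasks are unaffected. The sketch does not model scatter shards or sub-workflows, so it cannot show whether the guard is safe for those cases in croo itself.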

RGBEN commented 2 years ago

Hi, did you find a way to troubleshoot this?

leepc12 commented 2 years ago

@RGBEN: Sorry about the long delay. Can you please upload a full metadata.json and also the Croo output definition JSON for your WDL (if it's not an ENCODE pipeline)?

xquek commented 11 months ago

@leepc12 and Croo developers, here is a small example (not my actual use case) that reproduces this issue.

If an output and an input of the same task have the same value, croo thinks there is a cyclic link in the DAG.
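To check a metadata file for this condition before running croo, a small Python sketch can scan every call in the Cromwell metadata for string values that appear both in its inputs and in its outputs. It relies only on the "calls", "inputs", "outputs" and "shardIndex" fields visible in the metadata excerpt earlier in this issue. The function names are hypothetical, and croo's own matching rules (for example, which values it treats as file paths) may differ.

```python
import json


def leaf_strings(value):
    """Yield every string leaf of a nested JSON value (dicts, lists, scalars)."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from leaf_strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from leaf_strings(v)


def find_input_output_overlaps(metadata):
    """List (call name, shard index, shared values) for calls whose outputs echo an input value."""
    overlaps = []
    for call_name, attempts in metadata.get("calls", {}).items():
        for call in attempts:
            inputs = set(leaf_strings(call.get("inputs", {})))
            outputs = set(leaf_strings(call.get("outputs", {})))
            shared = inputs & outputs
            if shared:
                overlaps.append((call_name, call.get("shardIndex"), sorted(shared)))
    return overlaps


def load_metadata(path):
    """Read a Cromwell metadata JSON file such as metadata-out.json."""
    with open(path) as f:
        return json.load(f)
```

For the workflow above, find_input_output_overlaps(load_metadata("metadata-out.json")) should report myWorkflow.myTask (shard -1) with the shared value /a/b/c, which is the call that triggers the error.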

As @motorny suggested, I have confirmed that adding n1.task_name != n2.task_name fixes the issue, but I am not 100% sure whether this will break anything else.

Here are the simple WDL and meta.json so you can reproduce the issue.

example.zip

xquek commented 11 months ago

Happy to put in a PR if that is welcome! Thanks!