broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
996 stars 361 forks source link

Failure to find imported WDLs using relative paths #4515

Closed tlangs closed 5 years ago

tlangs commented 5 years ago

When using relative path-based imports in a WDL, and imported WDL which also imports a WDL does not look in the right places for relative imports. For example, given the directory structure:

import_issue
|   Issue.wdl
|
a___
|   |___A.wdl
|
|___b
    |___B.wdl
    |
    |___c
        |___C.wdl

where Issue.wdl imports "a/A.wdl" as WorkflowA and "b/B.wdl" as WorkflowB and B.wdl imports import "b/c/C.wdl" as CTasks, Cromwell is able to process and run the WDL in v34, but not in v36. It fails both submission to a Cromwell server and validation in Womtool.

Attached is a zip file containing this toy example. import_issue.zip

aednichols commented 5 years ago

Have you tried having B.wdl do import "./c/C.wdl" as CTasks or import "c/C.wdl" as CTasks?

It does make some sense to me that if you're in directory b, you can no longer refer to it by the name b anymore.

tlangs commented 5 years ago

We're expecting the resolver to look at the root of the submitted zip file when resolving files. We treat all imports as full paths from the root of the dependencies file structure.

aednichols commented 5 years ago

That would make it an absolute path instead of a relative path, no?

I'm going to have to study this a bit more.

tlangs commented 5 years ago

In the documentation, it says:

To use a file-based import resource, you must provide a ZIP bundle of your resources and then use a path relative to that ZIP in your import statement. For example: import "my/wdl/sub/directory/example.wdl" as file_import2

aednichols commented 5 years ago

I think that documentation may be implicitly (and misleadingly!) assuming that the import statement is in a WDL at the root.

That said, it definitely used to work, so I will find out whether that changed deliberately or accidentally.

aednichols commented 5 years ago

FYI, I verified that both of these options successfully work around the problem

Have you tried having B.wdl do import "./c/C.wdl" as CTasks or import "c/C.wdl" as CTasks?

aednichols commented 5 years ago

Developer note: this was already broken in 35

tlangs commented 5 years ago

Ah. We skipped 35 and went straight to 36. We're back to running 34 now, so its not the highest priority. Its just a nice-to-have so we don't have to edit all of our WDLs.

geoffjentry commented 5 years ago

I have a vague recollection that this was something which was changed in response to different people having different assumptions and an executive decision was made. I could be very wrong here. @aednichols it's worth checking in with @cjllanwarne when he returns

aednichols commented 5 years ago

I'm doing a bisect between 34 and 35 (only 6 steps) and when I find the PR, we should be able to tell whether it was deliberate

geoffjentry commented 5 years ago

It's also possible that it was an accidental byproduct of the wiring to get things working with CWL and WES (both CWL & WDL) as they have a very different behavior than traditional Cromwell/WDL.

In the latter case the main WF is always placed at the root and everything stems from that, but in the former cases (and IMO a far, far, far, far more sane policy) the set of files are provided and one provides a pointer to the logical root

aednichols commented 5 years ago

It was https://github.com/broadinstitute/cromwell/pull/3916

aednichols commented 5 years ago

According to the PR description, the WDL 1.0 spec did not have an opinion here, so we let our 1.0 implementation change.

The development version of WDL says

In the event that there is no protocol the import is resolved relative to the location of the current document. If a protocol-less import starts with / it will be interpreted as starting from the root of the host in the resolved URL.

so I think it would be most pragmatic to close this issue with a documentation-update PR.

@tlangs does this sound fair to you?

tlangs commented 5 years ago

Does this mean we can put a file:// in front of all our imports and be good?

aednichols commented 5 years ago

~Hmm, wouldn't the location file:// refer to the root of the server and require you to follow it with file://where_cromwell_runs/where_we_unzipped_the_zip/b/c/C.wdl?~ Nvm, that would be file:///

In any case I tried import "file://b/c/C.wdl" and it does not work in Cromwell's WDL 1.0 implementation.

geoffjentry commented 5 years ago

IIRC we discussed explicitly disallowing file:// URLs for exactly this reason in a cromwell server. Not sure the state of that, perhaps it's failing intentionally here :)

tlangs commented 5 years ago

It would be nice if the resolver looked for paths both relative to the WDL and relative to the root of the provided ZIP imports.

geoffjentry commented 5 years ago

Ultimately the question of how to resolve imports needs to be part of the WDL spec itself. NB that's not "zip imports" but rather just if everything is relative to the root workflow document or if everything is relative to the WDL at hand. Allowing for the behavior to be user defined makes WDL documents be non-portable.

IMO it doesn't make much sense to add a switch into Cromwell to add support for the discarded angle of what was already a confusing grey area considering that the correct behavior has already been clarified for future spec versions.

Updating the WDLs in question is the correct course of action here.