broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
988 stars 357 forks source link

Optional struct in outputs are not allowed #4111

Open ffinfo opened 6 years ago

ffinfo commented 6 years ago

When having a tool that sometimes creates a group of files or not (depending on the data sometimes even) a optional struct in the output would be nice to have.

As a user This is nice to have but as a developer I also see some issues here. For me best combination would be to to create Some(file) when all files are existing, a None when all files are not existing and still raise an error when 1 File is found but an other is not found. Of course this only count for required files in the structs, optional files inside the structs already works.

Example below does not work on version d9b9262

version 1.0

workflow Test {
  input {
  }

  call Echo as echo

  output {
    Out? bla2 = echo.s
  }
}

task Echo {
  input {
  }

  command {
    echo bla
  }

  output {
    Out? s = object {
      file: "some_none_existing_file",
      index: "some_none_existing_index"
    }
  }
}

struct Out {
  File f
}

error:

[2018-09-18 11:11:06,16] [error] WorkflowManagerActor Workflow 1beb17d5-1c5d-489b-9750-6c54d17a77de failed (during ExecutingWorkflowState): java.io.FileNotFoundException: Could not process output, file not found: /home/pjvan_thof/src/all-pipelines/cromwell-executions/Test/1beb17d5-1c5d-489b-9750-6c54d17a77de/call-echo/execution/some_none_existing_file
cjllanwarne commented 6 years ago

This feels like another ➕ for https://github.com/broadinstitute/cromwell/issues/1838

Specifically your situation feels like the "magic optional proliferation" in the second example there - and I'm not 100% sure that I like that - but I do think optional File? outputs are valuable and you could certainly come up with an in-WDL scheme to convert various File? outputs into a combined Out? output

geoffjentry commented 6 years ago

Am I misunderstanding or is this really a WDL issue and not a Cromwell issue?

ffinfo commented 6 years ago

@geoffjentry not sure here, the spec it not ruling this out but also not say explicitly it's possible.

The idea is to keep the same kind of struct for the optional and non optional outputs like this:

struct IndexedFile {
  File file
  File index
}

task Test {
  output {
    IndexedFile file1 = "some file"
    IndexedFile? file2 = "some other file"
  }
}

In this case once having a struct both field should be there. Also the other option should require 2 structs to be there, this only require 1 schema/struct.

cjllanwarne commented 6 years ago

We should double check but I think "a File? output is unfilled if the file doesn't exist" is neither explicitly stated nor explicitly disallowed in the spec. My gut feeling is "I would expect it to just work" but Cromwell doesn't support it and it's tricky on PAPIv1

I think allowing a Struct with unfilled File? values to be coerced into an unfilled Struct? would need a spec change.

tlangs commented 5 years ago

Adding onto this, we would find it very helpful if we could define any optional output at all. We have tools that produce files optionally based on input, and its a big pain point to have to branch to two separate tasks that contain almost exactly the same thing based on a value. Being able to define a File? output would be super-duper useful for our use cases.