dnanexus-archive / dx-cwl

Import and run CWL workflows on DNAnexus (alpha)
Apache License 2.0

Retrieve input/staging disk specifications #14

Closed chapmanb closed 6 years ago

chapmanb commented 6 years ago

CWL ResourceRequirements are designed to cover only temporary space (used during processing) and output space (for generated files). Input file staging is meant to be allocated by the platform. Since dx-cwl allocates machines without knowing the input requirements, this adds a dx:InputResourceRequirement to allocate input staging space explicitly.

Longer term, this can also be used for stages which don't need input files, like record creation steps: setting this value to 0 tells dx-cwl not to stage down the files.
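To make the idea concrete, here is a minimal Python sketch of how a step's disk needs might be totalled from its requirements. It is an illustration under stated assumptions, not dx-cwl's actual code: `tmpdirMin` and `outdirMin` are standard CWL ResourceRequirement fields (in mebibytes), but the field name `indirMin`, the function `disk_plan`, and the default of staging inputs when no input requirement is present are all assumptions made for this example.

```python
from typing import Any, Dict, List, Tuple

INPUT_CLASS = "dx:InputResourceRequirement"  # class name taken from the PR description
INPUT_FIELD = "indirMin"  # ASSUMED field name; not confirmed anywhere in this thread


def disk_plan(requirements: List[Dict[str, Any]]) -> Tuple[int, bool]:
    """Return (total disk in MiB, whether to stage inputs) for one step.

    Hypothetical helper: sums temporary, output, and input staging space.
    """
    total_mib = 0
    stage_inputs = True  # ASSUMED default when no input requirement is given
    for req in requirements:
        cls = req.get("class")
        if cls == "ResourceRequirement":
            # Standard CWL fields: temporary and output space only.
            total_mib += int(req.get("tmpdirMin", 0)) + int(req.get("outdirMin", 0))
        elif cls == INPUT_CLASS:
            indir = int(req.get(INPUT_FIELD, 0))
            total_mib += indir
            # A value of 0 signals that inputs should not be staged down.
            stage_inputs = indir > 0
    return total_mib, stage_inputs


# A processing step that needs its inputs on disk.
align_step = [
    {"class": "ResourceRequirement", "tmpdirMin": 1024, "outdirMin": 2048},
    {"class": INPUT_CLASS, INPUT_FIELD: 4096},
]
# A record creation step that never reads its input files.
record_step = [
    {"class": "ResourceRequirement", "outdirMin": 10},
    {"class": INPUT_CLASS, INPUT_FIELD: 0},
]

print(disk_plan(align_step))
print(disk_plan(record_step))
```

The second call shows the case described above: with the value set to 0, the step reserves no input space and is flagged as not needing its files staged.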

mr-c commented 6 years ago

Question: can't the input resource requirements be inferred from the input object?

chapmanb commented 6 years ago

In dx-cwl the machine types get allocated as part of the compilation process from CWL to DNAnexus apps, so the inputs aren't available at that point. Allocating dynamically would be a nice improvement, but it's a longer-term fix. Right now we're just estimating input sizes from the files we know about (those available at initial compilation time) and the expected size of other inputs relative to those.
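As a rough sketch of that compile-time estimate, the Python below scales the sizes of the files known at compilation by an expected relative size, then picks the smallest machine whose disk fits. Everything named here is hypothetical: the instance names and disk sizes in `INSTANCE_DISK_GIB` are made up for illustration and are not real DNAnexus instance types, and `estimate_input_mib` and `pick_instance` are not functions from dx-cwl.

```python
import math
from typing import Dict, List, Optional

# HYPOTHETICAL instance table: name -> local disk in GiB (not real DNAnexus values).
INSTANCE_DISK_GIB: Dict[str, int] = {"small": 80, "medium": 320, "large": 640}


def estimate_input_mib(known_file_bytes: List[int], relative_size: float) -> int:
    """Estimate input staging space in MiB from files known at compile time.

    relative_size is the expected size of the step's inputs relative to the
    known files (for example 2.0 if inputs are expected to be twice as large).
    """
    total_bytes = sum(known_file_bytes) * relative_size
    return math.ceil(total_bytes / (1024 * 1024))


def pick_instance(required_mib: int,
                  table: Dict[str, int] = INSTANCE_DISK_GIB) -> Optional[str]:
    """Return the smallest instance whose disk covers required_mib, or None."""
    for name, disk_gib in sorted(table.items(), key=lambda kv: kv[1]):
        if disk_gib * 1024 >= required_mib:
            return name
    return None


# One 10 GiB file known at compile time; inputs expected to be twice that size.
needed = estimate_input_mib([10 * 1024**3], relative_size=2.0)
print(needed, pick_instance(needed))
```

Because the choice is made once at compilation, an estimate that is too low cannot be corrected at run time, which is why dynamic allocation from the actual inputs would be the more robust long-term approach.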

mr-c commented 6 years ago

@chapmanb Thanks for the information. Sounds like this isn't a user-facing extension, cool.

geetduggal commented 6 years ago

Yes, that's a good point. In theory we could allocate instance types dynamically based on input size, and that would be a nice follow-up. @chapmanb sorry I didn't notice the PR before, I think it got buried in my email. Merging it now and looking into our CI. Also feel free to send me a Slack message about a PR as a way of making sure I attend to it sooner :)