Open katevoss opened 7 years ago
@vdauwera Can you explain the situation? I'm not clear what the exact feature request is. It won't make it into this release but we can see about the next one (Cromwell 28).
The tasks we run tend to have variable memory and storage requirements depending on the dataset we're processing in any given run. It would be nice to be able to set just minimum values and have Cromwell calculate what it should actually request based on "some logic" relating to the size of the input, where the "some logic" is the difficult bit of course. For some tasks we have pretty good expectations of how the needs will relate to inputs, e.g. if I'm just copying over the same data with minor changes, but for others it could be hairy.
Frankly I don't think this should be made a priority, because my naive impression is that it will be really hard to do well, and the result will be a convenience, but nothing earth-shattering. There's a lot of other stuff I would want to have first.
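One way to read "set just minimum values" is as a floor applied to a size-derived estimate. The Python sketch below is hypothetical and is not Cromwell code: the function name `resource_request_gb` and the linear `scale` factor are assumptions standing in for the "some logic" part, which in practice would differ per task.

```python
def resource_request_gb(input_gb: float, minimum_gb: float, scale: float = 1.0) -> float:
    """Return the amount to request: a linear estimate, never below the minimum.

    `scale` is a hypothetical per-task factor (e.g. about 1.0 for a task that
    copies its input with minor changes).
    """
    if input_gb < 0 or minimum_gb < 0 or scale < 0:
        raise ValueError("sizes and scale must be non-negative")
    estimate_gb = input_gb * scale
    # The user-supplied minimum acts as a floor on the computed estimate.
    return max(minimum_gb, estimate_gb)
```

A small input falls back to the minimum, while a large input scales with the data; the hard part described above is choosing `scale` (or a non-linear rule) for tasks whose needs are less predictable.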
We can already accept expressions in the runtime attributes so this is definitely feasible. A size_of(file): Int method would be easy to implement and give us this feature.
It sounded like they wanted something more automagical but yeah
A suggestion I have would be to get people using expressions for memory and disk size. I believe we already have a size expression for files, don't we? Then learn what people are doing commonly and make that easy!
Magic is hard to do well; discovering some common patterns and making those easy is... easier.
Kristian Cibulskis Engineering Director, Data Sciences & Data Engineering Broad Institute of MIT and Harvard kcibul@broadinstitute.org
I think we all (outside of perhaps the original request) are in vehement agreement
We can already accept expressions in the runtime attributes so this is definitely feasible. A size_of(file): Int method would be easy to implement and give us this feature.
This already exists and is called size.
This looks more like a request for an autoSize syntactic sugar function that would be the sum of all size(inputFile)?
WDL tasks localize during JES runs using WDL like so:
Also, WDL tasks can specify the disk size for the attached disk to help ensure that there is sufficient disk space for files, like so (in the example, a user-configurable setting):
It would seem like a good idea to have a special variable for automatically allocating disk size based on the localized files. For example, instead of the call being like
or
or
maybe the variable could be like:
where "autoSize" automatically calculates the sizes of the localized files and sums them, and the "+*" part adds a factor on top of that to leave space for the run's output files (in this case, 20% of the disk for the output).
I wonder about people's thoughts on this?
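The arithmetic behind the proposed "autoSize" can be sketched as follows. This is a hypothetical Python illustration, not Cromwell's implementation: the name `auto_size_gb`, the `headroom_percent` parameter, and the `minimum_gb` floor are assumptions, and the 20% is read here as extra space added on top of the summed input sizes, which is only one possible interpretation of the proposal.

```python
GIB = 1024 ** 3  # bytes per GiB


def auto_size_gb(input_sizes_bytes, headroom_percent=20, minimum_gb=1):
    """Sum the localized input sizes, add headroom for outputs, round up to whole GiB."""
    total_bytes = sum(input_sizes_bytes)
    # Integer arithmetic keeps the rounding exact: scale by (100 + percent) / 100.
    padded_numerator = total_bytes * (100 + headroom_percent)
    denominator = 100 * GIB
    # Ceiling division, so a partial GiB still gets a full GiB of disk.
    padded_gb = -(-padded_numerator // denominator)
    return max(minimum_gb, padded_gb)
```

For example, inputs of 10 GiB and 5 GiB give 15 GiB plus 20%, i.e. a request of 18 GiB. Rounding up and keeping a minimum avoids requesting a zero-sized disk for tiny inputs.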
This Issue was generated from your forums
@lbergelson commented on Wed Apr 05 2017
👍 It would be nice to be able to specify memory requirements as a function of input filesize as well...
It would be nice to have fine grained access to each input size as well as just the total.
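To illustrate why per-input sizes matter as well as the total, here is a hypothetical Python sketch in which memory depends on only some of the inputs. The function `memory_gb`, the `weights` mapping, and the `base_gb` constant are assumptions made for illustration; nothing here reflects an existing Cromwell or WDL feature.

```python
from typing import Mapping


def memory_gb(input_sizes_gb: Mapping[str, float],
              weights: Mapping[str, float],
              base_gb: float = 2.0) -> float:
    """Estimate memory as a base amount plus a weighted sum of selected input sizes."""
    missing = set(weights) - set(input_sizes_gb)
    if missing:
        raise KeyError(f"no size known for inputs: {sorted(missing)}")
    # Only inputs named in `weights` contribute; a streamed file can be left out.
    return base_gb + sum(weights[name] * input_sizes_gb[name] for name in weights)


def total_gb(input_sizes_gb: Mapping[str, float]) -> float:
    """The plain total, i.e. what a single summed size would expose."""
    return sum(input_sizes_gb.values())
```

With sizes `{"bam": 8.0, "ref": 3.0}` and weights `{"ref": 2.0}`, the estimate is 2.0 + 2.0 × 3.0 = 8.0 GB even though the total input size is 11.0 GB, which is the kind of distinction a total-only function could not express.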