Closed mr-c closed 7 years ago
A braindump from @brainstorm and me :-)
We've narrowed down the problem: the job object made available to CWL expressions during the evaluation of any ResourceRequirement for a step in a CWL Workflow is in fact the job object (a.k.a. the input object) of the CWL Workflow itself, and not the inputs for this step.
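To make the mismatch concrete, here is a minimal sketch in plain Python. It does not use cwltool or Toil: `ram_min_expression` is a hypothetical stand-in for a CWL expression in a ResourceRequirement, and the input ids (`sample_fastq`, `reads`) and the file location are made up for illustration.

```python
# Hypothetical stand-in for a CWL expression such as
# $(inputs.reads.size / 1048576 + 512) used as ramMin on a step.
def ram_min_expression(inputs):
    return inputs["reads"]["size"] // (1024 * 1024) + 512

# Job object of the whole workflow: keyed by the *workflow's* input ids.
workflow_job = {
    "sample_fastq": {
        "class": "File",
        "location": "file:///data/s1.fq",  # illustrative path
        "size": 2 * 1024 ** 3,
    },
}

# Job object for one step: keyed by the *step's* own input ids.
step_job = {"reads": workflow_job["sample_fastq"]}

def evaluate(expression, job):
    """Evaluate the expression, reporting any input id the job object lacks."""
    try:
        return expression(job)
    except KeyError as missing:
        return "unresolved input: {}".format(missing.args[0])

wrong = evaluate(ram_min_expression, workflow_job)  # what happens today
right = evaluate(ram_min_expression, step_job)      # what we want
print(wrong)
print(right)
```

With the workflow-level object the step's `reads` input simply is not there, so the expression cannot be resolved; with the step-level object it evaluates as intended.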
What we don't know is how to create or find the correct job object. If we had it, we would make sure the builder at https://github.com/BD2KGenomics/toil/blob/7b1f22ecd3f8aea6d0a86ab56344d7d80bade4ab/src/toil/cwl/cwltoil.py#L247 was created using it.
@tetron Any suggestions?
Our plan:
- Review what cwltool does with respect to parsing a ResourceRequirement and compare it to the current Toil codepath(s).
- Have cwltoil generate a valid CWL input job object, inclusive of the size attribute for File objects (obtained using AbstractJobStore.getSize()), without retrieving/copying files from Toil's jobstore.
- Provide a way to declare that the size attribute on File objects is required for a particular CWL description (for example cwltool:FileSizeRequiredRequirement). This extension will be implemented as a namespaced and flag-protected feature for cwltool so that Toil can know about it as well.

cwltoil needs to know the resource requirements when building Toil's Job graph, though no jobs have run yet, so we don't have any information on the outputs of the previous run.
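The second plan item could look roughly like the sketch below. It is plain Python and makes several assumptions: `FakeJobStore` and its `get_size` method are hypothetical stand-ins for whatever size lookup the job store offers (the real `AbstractJobStore.getSize()` signature is not reproduced here), and the `toilfs:abc` location is illustrative.

```python
class FakeJobStore:
    """Hypothetical stand-in for a job store; only models a size lookup."""

    def __init__(self, sizes):
        self._sizes = sizes

    def get_size(self, location):
        # Returns a size in bytes without fetching the file contents.
        return self._sizes[location]


def add_sizes(value, job_store):
    """Return a copy of a job-object value with 'size' set on every File."""
    if isinstance(value, dict):
        result = {key: add_sizes(item, job_store) for key, item in value.items()}
        if value.get("class") == "File" and "size" not in value:
            result["size"] = job_store.get_size(value["location"])
        return result
    if isinstance(value, list):
        return [add_sizes(item, job_store) for item in value]
    return value  # scalars pass through unchanged


store = FakeJobStore({"toilfs:abc": 1234})
job = {"reads": {"class": "File", "location": "toilfs:abc"}, "threads": 4}
sized_job = add_sizes(job, store)
print(sized_job)
```

The function recurses through dicts and lists so nested File objects (for example under secondaryFiles) would be covered too, and it builds a copy rather than mutating the original job object.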
See https://github.com/BD2KGenomics/toil/blob/7b1f22ecd3f8aea6d0a86ab56344d7d80bade4ab/src/toil/cwl/cwltoil.py#L264 https://github.com/BD2KGenomics/toil/blob/7b1f22ecd3f8aea6d0a86ab56344d7d80bade4ab/src/toil/job.py#L272 https://github.com/BD2KGenomics/toil/blob/7b1f22ecd3f8aea6d0a86ab56344d7d80bade4ab/src/toil/job.py#L56
Now our question is: for any given job, can we inject "fresher" resource requirements into the Toil Job object after the time the ancestor jobs are finished but before those resource requirements are used to schedule/reserve compute?
See https://github.com/BD2KGenomics/toil/pull/1810 for a first pass of this (!!!)
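For discussion, the idea of "fresher" requirements can be sketched as a requirement that is either a fixed number or a callable resolved only once the parent jobs have finished. This is a toy model in plain Python, not Toil's API and not a description of what the PR does; `LazyJob`, `resolve_memory`, and `run_in_order` are all hypothetical names.

```python
class LazyJob:
    """Toy job whose memory requirement may depend on parent outputs."""

    def __init__(self, name, memory, parents=()):
        self.name = name
        self._memory = memory  # an int, or a callable taking parent outputs
        self.parents = list(parents)
        self.output = None

    def resolve_memory(self):
        # Called right before scheduling, after all parents have run.
        if callable(self._memory):
            parent_outputs = {p.name: p.output for p in self.parents}
            return self._memory(parent_outputs)
        return self._memory


def run_in_order(jobs, work):
    """Run jobs in the given (already topologically sorted) order."""
    reserved = {}
    for job in jobs:
        reserved[job.name] = job.resolve_memory()  # reserve, then run
        job.output = work[job.name]()
    return reserved


align = LazyJob("align", 1024)
sort = LazyJob("sort", lambda outs: outs["align"]["size"] * 2, parents=[align])
reserved = run_in_order(
    [align, sort],
    {"align": lambda: {"size": 3000}, "sort": lambda: {"size": 3000}},
)
print(reserved)
```

The key point is the ordering inside `run_in_order`: the requirement for `sort` is computed only after `align` has produced its output, but still before anything is reserved for `sort`.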
as noted in #1638
This is a follow on to #1540 #1621