We will need at least the following:
* code to split an existing recipe into smaller sub-recipes
* code to check that the ...
And code (+ data) to decide which machine should process which data ....
To find out which machine hosts which data, we could use either esgf-pyclient or a collection of intake catalogs, e.g. those hosted at https://github.com/NCAR/intake-esm-datastore/tree/master/catalogs
Using ESGF infrastructure alone seems more robust. ESGF compute logic would tend to favour sending compute tasks preferentially to the data nodes that host the original version of the data (rather than a replica hosted elsewhere), but of course only if those nodes expose the compute function. This also makes it possible to reach data that is not replicated. The fallback solution would be to select a compute node that is close to the data node in terms of network distance.
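To illustrate the data-location part, a query through esgf-pyclient might look roughly like the sketch below. The index node URL and search facets are only examples, and the exact record fields can vary; the point is that the `data_node` field of each search result is what would drive the decision of where to run a preprocessing task.

```python
# Rough sketch (not ESMValCore code): ask an ESGF index which data nodes
# host the datasets matching some facets, using esgf-pyclient.
from pyesgf.search import SearchConnection

# Any ESGF index node can be used; distrib=True searches the whole federation.
conn = SearchConnection("https://esgf-node.llnl.gov/esg-search", distrib=True)

ctx = conn.new_context(
    project="CMIP6",
    experiment_id="historical",
    variable="tas",
)

# Group the data nodes that host each matching dataset.
hosts = {}
for result in ctx.search():
    dataset_id = result.dataset_id.split("|")[0]
    data_node = result.json.get("data_node", "unknown")
    hosts.setdefault(dataset_id, set()).add(data_node)

for dataset_id, nodes in hosts.items():
    print(dataset_id, "->", sorted(nodes))
```

Starting from a mapping like this, the scheduling logic could prefer the node hosting the original copy, or fall back to a compute node close to one of the listed data nodes.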
This seems too vague at the moment to be included in v2.4.0. @bouweandela, are you OK with bumping this to v2.5.0?
I think so. It is deliverable 9.3 of the IS-ENES3 project, which is due by the end of the year, so it would have been nice if it had been ready in time for v2.4, because then it would be in a released version. But I guess they'll just have to accept that it is in main and will be included in the next release. I will try, but it seems unlikely that I'll be able to implement this completely in time.
hey @bouweandela I'd like to help with this in the new year, man! :beer:
This is mostly done, except for the automatic splitting of recipes. Manual splitting has been supported since #1264. I will try to do something about automatic splitting for v2.6, but I'm not sure how important this feature is.
Closing this issue as it is mostly done. Automatically splitting a recipe based on which machine hosts what data has so far not been requested by any users, so it is probably best to postpone developing such a feature until there is actual demand.
In the IS-ENES3 project, one of the goals is to allow ESMValTool to run distributed across machines attached to multiple different ESGF nodes.
An idea by @nielsdrost to accomplish this would be to split an existing recipe into multiple smaller sub-recipes (each containing one or more preprocessing tasks); these parts could then be run remotely and the results combined in a final recipe that runs the diagnostic tasks. Running remotely could be done conveniently by using a WPS, but initially we aim for manually running the various sub-recipes from the command line.
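A very rough illustration of the splitting idea, in case it helps the discussion: the hypothetical helper below splits a recipe into one sub-recipe per diagnostic (the actual proposal splits at the level of preprocessing tasks, and a real implementation would also prune unused preprocessors and datasets and handle recombining the output). It only assumes the usual recipe layout with optional `documentation`, `datasets` and `preprocessors` sections plus a `diagnostics` section.

```python
# Hypothetical sketch, not ESMValCore code: write one sub-recipe per diagnostic.
from pathlib import Path

import yaml


def split_recipe(recipe_file, output_dir):
    """Split a recipe into sub-recipes, one per diagnostic."""
    recipe = yaml.safe_load(Path(recipe_file).read_text())
    # Sections shared by all sub-recipes.
    shared = {
        key: recipe[key]
        for key in ("documentation", "datasets", "preprocessors")
        if key in recipe
    }
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    sub_recipes = []
    for name, diagnostic in recipe.get("diagnostics", {}).items():
        sub_recipe = dict(shared, diagnostics={name: diagnostic})
        target = output_dir / f"{Path(recipe_file).stem}_{name}.yml"
        target.write_text(yaml.safe_dump(sub_recipe, sort_keys=False))
        sub_recipes.append(target)
    return sub_recipes
```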
This issue is to collect ideas on how this could be implemented. We will need at least the code and data components listed at the top of this thread.
As part of this distributed compute task, we would also like to implement improved support for intake-esm (#31), if needed augmented with support for OpenDAP access (#1131) and download support using esgf-pyclient (#1130).
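For the intake-esm part, locating the data needed by a sub-recipe through one of the catalogs mentioned above could look roughly like this; the catalog path and the search facets are placeholders, and the columns of the resulting table depend on the catalog.

```python
# Rough sketch (requires intake and intake-esm): search an ESM catalog for data.
import intake

# Placeholder path; any catalog from the NCAR intake-esm-datastore repository
# (or a locally generated one) could be used here.
catalog = intake.open_esm_datastore("path/to/catalog.json")

subset = catalog.search(
    experiment_id="historical",
    table_id="Amon",
    variable_id="tas",
)
print(subset.df.head())

# The matching data could then be opened via xarray (locally or over OpenDAP):
# datasets = subset.to_dataset_dict()
```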