MetOffice / dagrunner

⛔[EXPERIMENTAL] Directed acyclic graph (DAG) runner and tools
BSD 3-Clause "New" or "Revised" License

INV: Real-time resource recording, usage and feedback mechanism #5

Status: Open. Opened by cpelley 6 months ago.

cpelley commented 6 months ago

Dask can manage resources at run time and autoscale. However, it is unknown how effective or efficient this will be in a real-world scenario with large workflows. We therefore have a fall-back plan: tell Dask accurately how much memory each processing step uses, while keeping this information entirely separate from the configuration/recipe being executed.

The new framework will provide a way to tell the Python graph the resource requirement (memory footprint) of each step. Each execution step will then write its observed footprint, via the logging and monitoring capability (https://github.com/MetOffice/pp_systems_framework/issues/4), to a database or similar store, so that the next cycle can use a more accurate estimate. In other words, this is a feedback mechanism that lets memory footprint requirements evolve as circumstances change with the weather.
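A minimal sketch of the record-then-estimate loop, using only the Python standard library. Everything here is an assumption made for illustration and is not part of dagrunner. That includes the footprint table, the run_and_record and estimate_bytes helpers, and the use of tracemalloc, which only sees Python-level allocations rather than the full process RSS. The estimation policy is also assumed: take the largest of the last few recorded peaks and add a 20% margin.

```python
import sqlite3
import tracemalloc

# Hypothetical schema: one row per (cycle, step) holding the observed peak in bytes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS footprint (
    cycle TEXT NOT NULL,
    step TEXT NOT NULL,
    peak_bytes INTEGER NOT NULL
)
"""


def run_and_record(conn, cycle, step, func, *args, **kwargs):
    """Run func, store its peak traced Python allocation, and return its result."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    conn.execute(
        "INSERT INTO footprint (cycle, step, peak_bytes) VALUES (?, ?, ?)",
        (cycle, step, peak),
    )
    conn.commit()
    return result


def estimate_bytes(conn, step, history=5, margin=1.2, default=None):
    """Estimate the next footprint: largest of the last `history` peaks times `margin`."""
    rows = conn.execute(
        "SELECT peak_bytes FROM footprint WHERE step = ? ORDER BY rowid DESC LIMIT ?",
        (step, history),
    ).fetchall()
    if not rows:
        return default  # no history yet for this step
    return int(max(peak for (peak,) in rows) * margin)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # A stand-in "step" that allocates a list of one million references.
    run_and_record(conn, "cycle-1", "regrid", lambda n: [0] * n, 1_000_000)
    print(estimate_bytes(conn, "regrid"))
```

Taking the maximum over recent cycles plus a margin errs towards over-allocation. A moving average would track change faster, but it risks under-provisioning on an unusually heavy day.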

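One option is to use size hints:

Implement a custom size method (called _sizeof in the snippet below) on the Dask collection class, so that Dask has explicit size hints to use during memory estimation. The method should return an estimate of the size of a single element of the custom collection. Note that _sizeof is this issue's own naming. The hooks Dask itself documents are an object's __sizeof__ method and dask.sizeof.sizeof.register for custom types. The __sizeof__ route works because Dask's default dask.sizeof.sizeof falls back to sys.getsizeof, which calls __sizeof__.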

import sys

def _sizeof(self, key):
    # Estimated size in bytes of the element under key (shallow: referenced objects not counted)
    return sys.getsizeof(self.data[key])
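A hedged, standard-library sketch of the same idea, applied to a wrapped processing step rather than to a collection element. SizeHinted is a hypothetical name. It overrides __sizeof__, which sys.getsizeof calls (adding a small garbage-collector overhead), and Dask's default sizeof relies on sys.getsizeof, so it would pick up the hint as well. Dask applies sizeof to the data it holds, however, so it is still to be worked out how a hint like this would reach scheduling decisions before a step runs.

```python
import sys


class SizeHinted:
    """Hypothetical wrapper pairing a callable with an estimated footprint in bytes."""

    def __init__(self, func, size_hint):
        self.func = func
        self.size_hint = int(size_hint)

    def __call__(self, *args, **kwargs):
        # Behave exactly like the wrapped step.
        return self.func(*args, **kwargs)

    def __sizeof__(self):
        # Report the estimated footprint instead of the tiny size of the wrapper itself.
        return self.size_hint


if __name__ == "__main__":
    step = SizeHinted(lambda x: x + 1, size_hint=5_000_000)
    print(step(1), sys.getsizeof(step))
```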


cpelley commented 1 month ago

Recording the memory footprint of plugin execution is handled by https://github.com/MetOffice/dagrunner/issues/5. What remains of this issue is the feedback mechanism referenced above: reading from the SQLite database and wrapping execution in objects whose 'size' reflects their likely footprint.
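A sketch of that remaining piece under stated assumptions. It assumes a hypothetical SQLite table footprint(step, peak_bytes), populated by the recording side, and a hypothetical SizedStep wrapper that exposes a size attribute. The sizing policy is also assumed: use the largest footprint ever recorded for a step, and fall back to a fixed default for steps with no history yet.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SizedStep:
    """Hypothetical wrapper pairing a processing step with its expected footprint."""

    name: str
    func: Callable[..., Any]
    size: int  # expected peak memory in bytes

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


def wrap_with_sizes(conn, steps, default_size):
    """Wrap each step with its largest recorded footprint, or default_size if unseen."""
    # One query for all steps; each row is (step, max_peak_bytes).
    recorded = dict(
        conn.execute("SELECT step, MAX(peak_bytes) FROM footprint GROUP BY step")
    )
    return {
        name: SizedStep(name, func, int(recorded.get(name, default_size)))
        for name, func in steps.items()
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE footprint (step TEXT, peak_bytes INTEGER)")
    conn.executemany(
        "INSERT INTO footprint VALUES (?, ?)",
        [("regrid", 800), ("regrid", 1200), ("blend", 300)],
    )
    wrapped = wrap_with_sizes(
        conn, {"regrid": abs, "blend": abs, "new_step": abs}, default_size=10_000
    )
    for name, step in wrapped.items():
        print(name, step.size)
```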