Open EsperanzaCuartero opened 1 year ago
Hi team – thanks for putting together this challenge. Before I prepare my proposal I had a few questions:
Many thanks for your help!
Hi @simonmoulds,
Thanks for your interest! Here are some answers to your questions:
- Do you have a specific ML architecture in mind? There are many examples of emulators using conventional ML (e.g. SVM, RF), but more recently examples of deep learning approaches (e.g. CNN, LSTM) – e.g. to emulate ParFlow.
We don’t have a fixed architecture in mind, but we suspect that a deep learning approach will be best suited to capturing the complexity of the model while still running quickly. We encourage proposals to specify which architectures the participants think will be suitable.
- (Related to above) Is it your intention to emulate all of LISFLOOD's model states/fluxes, or only a subset?
The intention is to emulate all LISFLOOD states/fluxes that are necessary to restart the model (35 in total), plus river discharge.
- Will the emulator also perform river routing, or is the intention to focus only on the grid water balance?
The goal of the emulator is to emulate the LISFLOOD model as a whole, which includes river routing.
- Will it be important to constrain the emulator with any physical laws (e.g. mass conservation)?
Mass conservation would be nice to have. Feel free to make suggestions in your proposal.
- What is the target spatial resolution? At this resolution will the model still represent subgrid heterogeneity (e.g. through fractional land cover)?
The target resolution is 1 arcmin (~1.5 km). Subgrid heterogeneity is represented using fractions.
- Reading the challenge, it seems that having a stochastic weather generator could be valuable to increase the size of the training data. Is developing such a tool part of the challenge?
ECMWF will provide the training datasets. Augmentations to the dataset could be an interesting tool, but we would prioritise getting a model trained on the existing data to assess whether this is sufficient. If there is time we could explore a stochastic weather generator.
Don't hesitate to ask if you have more questions!
Thanks @corentincarton - this is really helpful. I will start to prepare my submission and get in touch with any other queries as they arise.
Hi @corentincarton - just wanted to apologise for not submitting an application in the end. I was recently offered a lectureship in hydrology at the U of Edinburgh and I decided that, along with my current position at Oxford, I wouldn't have the time to do this project justice.
Did you get any other applicants? If not, and if this is something you would like to continue to pursue, I may be able to devote some time to it (albeit at a slower pace than the Code4Earth timeline). Let me know!
Best wishes, Simon
Thanks for this answer @simonmoulds, we would be happy to further discuss this! We'll contact you in private :)
Congrats on your position in Edinburgh! Corentin
Challenge 23 - FloodMule: a machine learning emulator of the LISFLOOD hydrological model
Goal
Emulate LISFLOOD to significantly reduce the running time of the model for a given configuration
Mentors and skills
Challenge description
LISFLOOD is a spatially distributed (gridded) hydrological rainfall-runoff model that can simulate the main hydrological processes occurring in a catchment. LISFLOOD explicitly considers the spatial distribution of physical properties across the catchments to provide estimates of river discharge and other hydrological variables such as snow accumulation, soil moisture, etc. Driven by meteorological forcing data (precipitation, temperature and evaporation), it calculates a complete water balance for every grid cell of the computational domain.
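To make the per-cell water balance concrete, here is a minimal sketch of a balance check for a single grid cell and time step. It assumes a heavily simplified balance (change in storage = precipitation - evaporation - runoff); the `CellFluxes` class, the function names and the numbers are hypothetical and are not taken from LISFLOOD, which tracks many more stores and fluxes.

```python
from dataclasses import dataclass


@dataclass
class CellFluxes:
    """Fluxes for one grid cell over one time step, in mm (hypothetical simplification)."""
    precipitation: float
    evaporation: float  # water leaving the cell to the atmosphere
    runoff: float       # water leaving the cell towards the river network


def storage_change(fluxes: CellFluxes) -> float:
    """Change in stored water implied by the fluxes: inputs minus outputs."""
    return fluxes.precipitation - fluxes.evaporation - fluxes.runoff


def balance_residual(storage_before: float, storage_after: float, fluxes: CellFluxes) -> float:
    """Water created (positive) or lost (negative) relative to a closed balance."""
    return (storage_after - storage_before) - storage_change(fluxes)


fluxes = CellFluxes(precipitation=10.0, evaporation=2.5, runoff=3.0)
before = 120.0
after = before + storage_change(fluxes)  # a step that conserves mass exactly
residual = balance_residual(before, after, fluxes)
```

A residual of this kind, computed on emulator outputs, is one possible way to measure how far a prediction drifts from mass conservation.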
Running the LISFLOOD hydrological model at high resolution and global (or pan-European) scale, as will be done in the next versions of EFAS and GloFAS, becomes a challenge because the running time of the model is too long for an operational context. Instead of optimising the current model, which would only give incremental improvement, emulating the hydrological model using machine learning could speed it up by orders of magnitude, hopefully with little or no degradation of the results.
The emulator would mimic the hydrological model for a given configuration, meaning:
This would result in a simple workflow for the emulator with the following inputs:
The emulator, as the hydrological model, would provide the following outputs:
This very well-defined problem offers a multitude of areas of exploration for training the model, as we could build a training dataset by feeding any set of initial conditions and forcings into the hydrological model and using the outputs to train the emulator. For instance, the ML training could be based on one of the following approaches:
These two approaches would already give us thousands of data points (i.e. time slices) to train the model, even millions if the stochastic approach is successful.
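As an illustration of how one model run turns into training samples, the sketch below steps a model through a sequence of forcings and records one (state, forcing, next state) triple per time slice. The `toy_model_step` function is a hypothetical leaky-bucket stand-in, not LISFLOOD, and `build_samples` is an assumed helper name rather than part of any provided utility.

```python
from typing import Callable, List, Tuple

State = List[float]    # one value per grid cell
Forcing = List[float]  # one value per grid cell
Sample = Tuple[State, Forcing, State]


def toy_model_step(state: State, forcing: Forcing) -> State:
    """Hypothetical stand-in for one model time step: each cell keeps 90% of
    its storage and receives the forcing as input."""
    return [0.9 * s + p for s, p in zip(state, forcing)]


def build_samples(
    step: Callable[[State, Forcing], State],
    initial_state: State,
    forcings: List[Forcing],
) -> List[Sample]:
    """Run the model forward and keep one training sample per time slice."""
    samples: List[Sample] = []
    state = initial_state
    for forcing in forcings:
        next_state = step(state, forcing)
        samples.append((state, forcing, next_state))
        state = next_state  # the output of one step is the input of the next
    return samples


forcings = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
samples = build_samples(toy_model_step, [10.0, 20.0], forcings)
```

Each additional simulated time step adds one sample, so long runs or many sets of initial conditions and forcings scale the dataset accordingly.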
As a continental domain is composed of thousands of hydrological catchments, the approach could first experiment on small basins, then scale up to larger basins and finally to the full EFAS or GloFAS computational domain.
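A small sketch of how such a progression could be organised, assuming every grid cell carries an integer basin identifier (a hypothetical input; the actual catchment data format is not specified here). It orders basins from the fewest cells to the most, so experiments can start with the smallest ones.

```python
from collections import Counter
from typing import List


def basins_smallest_first(basin_ids: List[int]) -> List[int]:
    """Return basin identifiers ordered by cell count, smallest basin first.

    Ties are broken by the identifier so the order is deterministic.
    """
    counts = Counter(basin_ids)
    return sorted(counts, key=lambda basin: (counts[basin], basin))


# One (hypothetical) basin identifier per grid cell of a flattened domain.
basin_ids = [3, 3, 3, 3, 1, 1, 2, 2, 2]
order = basins_smallest_first(basin_ids)
```

Here basin 1 has two cells, basin 2 has three and basin 3 has four, so `order` is `[1, 2, 3]`.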
The details of the implementation, such as the data flow or the ML approach and libraries, will be discussed during the project. The candidates will be provided with a utility, interfacing with the LISFLOOD hydrological model, that will generate training datasets for the ML kernels.
Training/evaluation workflow: