Open RubenRT7 opened 4 months ago
Hello, we are interested in this challenge and have a few questions:
Hi @RonT23 , Thank you for your interest and questions about our challenge! Regarding questions 1 and 3: I'm checking with my fellow mentors about this and will let you know ASAP. Regarding question 2: We’re looking for proposals with clear steps and milestones rather than abstract solutions. A more detailed plan will help ensure a tangible outcome by the end of the challenge. We are happy with some flexibility, as there will inevitably be some unexpected issues along the way, and some trial and error with some methodological issues. But within that flexibility, the more specifics you can provide, the better.
Hi @RonT23, I can now confirm that we will be able to provide you with daily river flow observations at GRDC sites, and catchment-averaged ERA5 precipitation for these sites. I haven't checked the data, but I think we have a few thousand sites across the world, with variable lengths of record. We will prepare the data before the start of the challenge. We don't have any remote sensing data, though. Therefore, if you are planning to use such data in your project, you would need to source it yourselves.
I could prepare some sample data for a few of these sites for you to explore, but it would take me a couple of days. Could you please confirm you would like me to prepare the sample data for you?
Best wishes,
Maliko
Hi Maliko,
It would be very helpful for us if you could prepare that data.
Thank you, Daniel
Hi @ecMaliko, We would appreciate it if you could provide us with a sample dataset! Thank you, Ronaldo T.
Hi @RonT23 and @daniel-obrien, I have attached some sample data (100 stations) so you can explore the format and type of data that will be provided. The full dataset will have a few thousand stations. There is one NetCDF file with observed discharge data, and another with catchment-averaged precipitation data from ERA5. In addition, I have included a CSV file with some additional metadata. The 'statid' variable in the NetCDF files corresponds to the 'station_id_num' column in the metadata file. Please note that the reference date in the precipitation file is different from the one in the discharge file! Let me know if you have any questions. Regards, Maliko
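A minimal sketch of why the differing reference dates matter, assuming both files store time as "days since <reference date>". The reference dates and offsets below are made up for illustration and are not taken from the sample files. Libraries such as xarray normally decode these time units when a file is opened; the snippet only shows the manual equivalent, so that the two records are compared on calendar dates rather than on raw offsets.

```python
import numpy as np

def to_dates(offsets_days, reference_date):
    """Convert 'days since <reference_date>' offsets to calendar dates."""
    ref = np.datetime64(reference_date, "D")
    return ref + np.asarray(offsets_days).astype("timedelta64[D]")

# Hypothetical reference dates and offsets, for illustration only.
discharge_dates = to_dates([0, 1, 2, 3], "2000-01-01")
precip_dates = to_dates([36525, 36526, 36527], "1900-01-01")

# Days present in both records, found by comparing real dates, not raw offsets.
common = np.intersect1d(discharge_dates, precip_dates)
print(common)
```

The same care applies when joining on stations: match 'statid' from the NetCDF files against 'station_id_num' in the metadata file, not against row positions.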
Thank you, that is really helpful! R. T.
Dear @ecMaliko 1) Are there going to be any gauge stations that belong to the same catchment (water basin)? 2) Is the average ERA5 precipitation derived from the same catchment? 3) Can we use the distributed ERA5 precipitation data? 4) Will we have the distinction between rainfall and snow and hail? 5) Can we add more input data that affect the precipitation-streamflow relationship in our models?
Thanks in advance.
K. P.
Dear @KonstantinosPl ,
Thank you for your interest in this challenge!
Here are the answers to your questions:
While all the data you mention would surely contribute to improve the final product, don’t forget that the challenge is only 4 months. Therefore, make sure your proposed work is realistic within that timeframe.
Let me know if you have further questions!
Maliko
Hi @ecMaliko,
I have some questions following this discussion:
Thank you! Hieu
Hi @danghieutrung ,
My colleagues are on Easter break, so I will reply to the best of my knowledge, and I will get back to you with updated information as soon as I hear from them.
Maliko
Hello, we are interested in this challenge, and I have a question:
Dear @BargavReddyM ,
Thank you for your interest in this challenge. Unfortunately, for this year’s Code4Earth challenges, the call is only open to candidates who are citizens of ECMWF Member States and Co-operating States. You can find the list here: https://www.ecmwf.int/en/about/who-we-are/member-states We wish we could be more open, but this is restricted by the conditions set by our funders. Regarding the second question: we are more interested in the methodology developed than in the specific area used to develop the method. Therefore, it doesn’t necessarily need to be based in Europe. However, Europe is one of the most data-rich areas (in terms of river flow), and therefore it could be a good starting point.
Kind regards,
Maliko
Thank you for the reply
Hi @danghieutrung, hi @ecMaliko
- I think you would have access to GPUs, but I am not 100% sure. Let me come back on this point once I manage to talk to my colleagues.
AT: Yes, that is correct. Thanks @ecMaliko for answering! If the selected proposals need access to computing resources you can access the European Weather Cloud or WEkEO.
Bye, Athina
@BargavReddyM @ecMaliko
Thank you for the reply
AT: Indeed, as funding comes from different (European) sources, we have to follow certain rules for eligibility. You have to be a citizen or resident of an ECMWF Member State or Co-operating State or EU Member State, or from a country associated with EU’s Space Programme (currently Iceland, Norway and United Kingdom) or a country associated with EU’s Digital Europe Programme (currently Albania, Iceland, Liechtenstein, Montenegro, North Macedonia, Norway, Serbia and Türkiye).
For more details please check the Code for Earth Terms & Conditions (mainly Article 3).
Thanks @ecMaliko for getting back to Bargav!
Bye, Athina
Hello. I could not submit my proposal because the link to the submission form returned a "refused to connect" error. May I have some help, please? Here is the link from the website: https://codeforearth.commpla.com/ecmwf-code-for-earth-2024-submission-form
Thank you, the link is now okay.
Hi @wsyip85 I am glad the problem is now solved. Kind regards, Maliko
Challenge 20 - Bridge the Gap: Bridging Gaps in Streamflow Observations with ML-driven Solutions
Goal
Develop machine learning solutions to bridge gaps in streamflow observations, enhancing the accuracy and reliability of hydrological data analysis and forecasting.
Mentors and skills
Essential:
Python (numpy, pandas, xarray, ...)
Machine learning (scikit-learn, Pytorch/Tensorflow)
Visualisation (mapping, graphs)
Desirable:
Time series analysis
Open-source collaboration (Git)
Advantageous:
Ability to create clear documentation / communication
Basic hydrology understanding
Challenge description
Introduction
Operational flood forecasting systems like EFAS and GloFAS, part of the Copernicus Emergency Management Service (CEMS), play a pivotal role in providing advance warnings for devastating flood events, significantly impacting societies worldwide. These systems must be reliable and accurate, making the assessment of forecast skill a critical aspect in gauging their trustworthiness and utility.
A major limitation in calibrating and evaluating these forecasting systems is the scarcity, quality, and incompleteness of observational data, particularly in areas where flood impacts are most severe. In addition, the calculation of some forecasting skill scores such as the Continuous Ranked Probability Skill Score (CRPSS) necessitates continuous time series, posing a challenge when data is unavailable or incomplete. Extending the time series also allows for the provision of reference or climatology values against which to compare forecasts, enhancing the robustness of the evaluation process.
Building upon existing literature (e.g. [1,2,3]), various ML methods, such as Random Forests and LSTM models, have shown promise in gap-filling river flow data. However, a comprehensive understanding of their strengths and limitations is essential for informed implementation.
Project objectives
The primary objective is to explore different approaches to gap-fill observed daily streamflow time series, comparing their performance and determining the maximum length of gap that can be reliably filled. The project aims to implement these methods into an open-source software package based on Python, providing a user-friendly solution for filling gaps in observational datasets.
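As a rough illustration of what a gap-filling option with a maximum gap length could look like, here is a minimal sketch. It uses plain linear interpolation as a stand-in baseline, not one of the ML methods named in the challenge, and the function name `fill_gaps_linear` and its `max_gap` parameter are hypothetical.

```python
import numpy as np

def fill_gaps_linear(flow, max_gap):
    """Linearly interpolate runs of NaN no longer than max_gap days.

    Longer runs, and gaps touching either end of the series, are left as NaN.
    """
    flow = np.asarray(flow, dtype=float)
    filled = flow.copy()
    isnan = np.isnan(flow)
    n = len(flow)
    i = 0
    while i < n:
        if not isnan[i]:
            i += 1
            continue
        start = i
        while i < n and isnan[i]:
            i += 1
        end = i  # first valid index after the gap (exclusive bound)
        interior = start > 0 and end < n
        if interior and (end - start) <= max_gap:
            filled[start:end] = np.interp(
                np.arange(start, end),
                [start - 1, end],
                [flow[start - 1], flow[end]],
            )
    return filled

# Toy daily series with a 2-day gap and a 4-day gap.
series = [10.0, np.nan, np.nan, 16.0, np.nan, np.nan, np.nan, np.nan, 6.0]
result = fill_gaps_linear(series, max_gap=2)
print(result)
```

With `max_gap=2`, only the short gap is filled and the 4-day gap stays missing, which mirrors the idea of refusing to fill gaps longer than a method can handle reliably.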
Methodology
Expected outcome
The project’s final outcome will be well-documented, user-friendly Python code available on GitHub, featuring one or several gap-filling options. Accompanying this code will be information on method performance, including the maximum reliable gap size and a degradation table detailing performance with increasing gap size, which will help users to select the best method for their data.
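One possible way to build such a degradation table is to hide artificial gaps of increasing length in a complete record, refill them, and score the result against the hidden values. The sketch below makes several assumptions that are not part of the challenge text: a synthetic series instead of real discharge, linear interpolation as the method under test, RMSE as the score, and a single fixed gap position.

```python
import numpy as np

def rmse_for_gap(flow, start, gap_len):
    """Hide flow[start:start + gap_len], refill it by linear interpolation,
    and return the RMSE on the hidden values."""
    idx = np.arange(len(flow))
    hidden = np.zeros(len(flow), dtype=bool)
    hidden[start:start + gap_len] = True
    estimate = np.interp(idx[hidden], idx[~hidden], flow[~hidden])
    return float(np.sqrt(np.mean((estimate - flow[hidden]) ** 2)))

# Synthetic, gap-free "discharge" series (an assumption, for illustration only).
days = np.arange(365)
flow = (
    50.0
    + 30.0 * np.sin(2 * np.pi * days / 365)
    + 10.0 * np.sin(2 * np.pi * days / 30)
)

gap_lengths = [1, 3, 7, 15, 30]
table = [(g, rmse_for_gap(flow, start=100, gap_len=g)) for g in gap_lengths]

for gap_len, rmse in table:
    print(f"{gap_len:>3d} days  RMSE = {rmse:.2f}")
```

A real evaluation would repeat this over many gap positions, stations, and methods, and would likely report hydrological scores alongside RMSE; the maximum reliable gap size could then be read off as the longest gap whose score stays within a chosen tolerance.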
Stretch goals (optional)
Ready for an extra challenge? For those eager to push their limits, we offer optional stretch goals:
References
[1] Arriagada et al. (2021)
[2] Dariane & Borhan (2024)
[3] Ren et al. (2022)