EsperanzaCuartero commented 3 years ago

Challenge 22 - ML4Land

Stream 2 - Machine Learning for weather, climate and atmosphere applications

Goal

Improve understanding of land surface cover characteristics and how these map into reanalysis variables such as surface temperature, using climate reanalysis such as ERA5 and ad-hoc exploratory 1km simulations.

Mentors and skills

Mentors: @dueben @gpbalsamo @joemcnorton
Skills required:
- Previous experience with high-resolution land surface imagery (Copernicus Sentinel-3 and similar satellite platforms) showing physical properties of the land surface that can be detected from space (snow cover, vegetation cover, urban cover) would be advantageous.
- Knowledge of machine learning software (PyTorch or similar) and machine learning tools would also be an advantage.

Challenge description

Improve understanding of land surface cover:

How do medium-resolution modelling products such as ERA5 (31km or 1/4 degree) compare with aggregated satellite images for snow, vegetation and urban cover?
For snow cover at 1km: how do models and EO satellite image data differ?
For urban cover at 1km: how do models and EO satellite images compare?


Het-Shah commented 3 years ago

Hello, I am Het Shah, a final year Computer Science Undergrad, specializing in Machine Learning. I would like to contribute to this project. I have previously worked on segmenting out vegetation cover (horticulture) from satellite images (CARTOSAT-1), using a deep learning-based model.

Could you give a short description of the data and elaborate on the challenge description so that we can start with some basic analysis?

Thanks!

EsperanzaCuartero commented 3 years ago

Dear Het Shah, many thanks for your interest. ML4Land's mentors will be in touch as soon as possible.

gpbalsamo commented 3 years ago

Dear Het Shah, We are thinking about Copernicus Sentinel-3 (or NASA MODIS) data at resolutions from 250 m to 1 km for surface temperature and snow cover. For urban cover and water cover there are global datasets at higher resolutions (e.g. check the JRC global datasets for the Human Settlement Layer and the Surface Water Explorer) that we process at 1 km. All global model simulations, at 1 km (about 43200x21600 grid points on a regular lat-lon grid) and at coarser spatial resolutions (down to about 25 km, about 1440x720 grid points on a regular lat-lon grid), will target a summer and a winter month (e.g. January and July 2020) to detect patterns typical of the season. Hope this info helps the evaluation.
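
The grid sizes quoted here follow directly from the grid spacing, and the ratio between them gives the block size needed to aggregate 1 km fields onto the coarser grid. A minimal sketch in Python with NumPy, assuming a regular lat-lon grid and a simple unweighted block mean; the function name `block_mean` and the toy array are illustrative and not part of any ECMWF tool:

```python
import numpy as np

# Grid spacing implied by the grid sizes quoted in the thread.
nlon_fine, nlat_fine = 43200, 21600      # "1 km" regular lat-lon grid
nlon_coarse, nlat_coarse = 1440, 720     # "25 km" regular lat-lon grid

dlon_fine = 360.0 / nlon_fine            # 1/120 degree
dlon_coarse = 360.0 / nlon_coarse        # 0.25 degree
factor = nlon_fine // nlon_coarse        # fine cells per coarse cell, per axis


def block_mean(field, factor):
    """Average non-overlapping factor x factor blocks of a 2-D field.

    NaNs (e.g. missing satellite pixels) are ignored within each block.
    This is an unweighted mean: it ignores the change of cell area
    with latitude inside a block.
    """
    nlat, nlon = field.shape
    if nlat % factor or nlon % factor:
        raise ValueError("field shape must be divisible by factor")
    blocks = field.reshape(nlat // factor, factor, nlon // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))


# Toy example: a 60 x 120 "fine" tile aggregated by the same factor.
rng = np.random.default_rng(0)
fine_tile = rng.uniform(0.0, 1.0, size=(60, 120))   # e.g. fractional snow cover
coarse_tile = block_mean(fine_tile, factor)
print(dlon_fine, dlon_coarse, factor, coarse_tile.shape)
```

For a real global comparison an area-weighted mean would probably be preferable, since cells on a regular lat-lon grid shrink towards the poles.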

jwagemann commented 3 years ago

Hi, join us for the ECMWF Summer of Weather Code Ask Me Anything sessions and learn all things ESoWC.

When:

17 March 2021 at 4 pm GMT and
24 March 2021 at 4 pm GMT

What:

learn everything about ESoWC - how it works, the challenges this year, some tips for your proposal and listen to ESoWC experiences from previous participants

How: register here.

TanmayKhot commented 3 years ago

Hello, do we need to work on the task from scratch or build on top of some existing model? Also, could you please provide some information or examples of the expected outcome? Thank you!

joemcnorton commented 3 years ago

Hi @TanmayKhot, Many thanks for your interest. We have recently enhanced the resolution of our land-surface model to 1 km, so the model already exists. We can provide model output variables, which would then be used in either an original or an existing optimisation framework to best fit an observed variable (e.g. Tskin). The framework for optimisation would be up to you, but we can suggest CliMetLab as a possible starting point, and you could use the relevant Python/Julia/Jupyter code based on your preference. We can also provide the current model output of reanalysis variables (e.g. Tskin) for comparison. All of these can be packaged into a palatable format, such as NetCDF. A practical example of the work to be carried out could be a deep learning tool to map from satellite to modelled skin temperature on a structured grid at very high resolution. This tool could then be used to find areas where the observations and models disagree.
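
The final step described above, finding where observations and model disagree, could start as simply as thresholding the difference between two co-located fields. A minimal sketch, assuming both fields are already on the same grid and in kelvin; the 5 K threshold, the function name `disagreement_mask` and the synthetic arrays are illustrative assumptions, not values from the mentors:

```python
import numpy as np


def disagreement_mask(t_obs, t_model, threshold_k=5.0):
    """Return (difference, mask); mask flags |obs - model| > threshold_k.

    Cells where either field is NaN (no valid observation) are never flagged.
    """
    diff = t_obs - t_model
    valid = np.isfinite(diff)
    mask = np.zeros(diff.shape, dtype=bool)
    mask[valid] = np.abs(diff[valid]) > threshold_k
    return diff, mask


# Synthetic 4 x 4 example: the model is uniform, the observations have one
# warm patch (e.g. an urban area the model does not represent) and one
# missing pixel.
t_model = np.full((4, 4), 290.0)
t_obs = t_model.copy()
t_obs[1:3, 1:3] += 8.0      # warm patch
t_obs[0, 0] = np.nan        # pixel with no valid observation

diff, mask = disagreement_mask(t_obs, t_model)
print(int(mask.sum()))      # number of flagged cells
```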

jwagemann commented 3 years ago

Hi, join us for the ECMWF Summer of Weather Code Ask Me Anything session and learn all things ESoWC.

When: Wednesday, 24 March 2021 at 4 pm GMT

What: learn everything about ESoWC - how it works, the challenges this year, some tips for your proposal and listen to ESoWC experiences from previous participants

How: register here.

avishreekh commented 3 years ago

Hi! I am Avishree Khare, and @Het-Shah and I would love to work on this project.


Thank you for such a detailed explanation. We have a few questions about it:

When you mention "modelled skin temperature", are we talking about the temperature from ERA-5, the ground truth value or the temperature obtained from the optimisation model?
If we understand this correctly, the comparison needs to be made between the outputs from the proposed deep learning model and the ERA-5 values. Please correct us if this is wrong.
Is there a project (from ESOWC in previous years maybe) along the lines of this one that we could refer to in order to understand the problem statement better?

Thank you!

joemcnorton commented 3 years ago

Hi @avishreekh, Many thanks for your interest.

I will try and answer your questions, but if anything is unclear please say.

When you mention "modelled skin temperature", are we talking about the temperature from ERA-5, the ground truth value or the temperature obtained from the optimisation model?

Initially the intention would be to use the modelled ERA-5 reanalysis product, which contains variables such as skin temperature. We would like to compare these model variables with satellite observations. The ERA-5 product is in a very similar format to observation datasets, making it a suitable candidate for model evaluation. However, it will only be available at coarse resolution and may have errors resulting from the reanalysis step.

There would be an optional extension to a higher-resolution model output, where more detailed features, such as urban environments, could be analysed. However, that product is not yet in a suitable format, so it would require more work.

If we understand this correctly, the comparison needs to be made between the outputs from the proposed deep learning model and the ERA-5 values. Please correct us if this is wrong.

We want to use deep learning to learn a mapping between the outputs of a conventional high-resolution land surface model and observations. As input to the machine learning process you would use variables from ERA-5, and the machine learning model would be trained against satellite observations. The intention is to train the model to better represent surface processes and, if possible, to highlight any areas where current model processes are lacking or could be improved.
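
Before any deep learning, the mapping described here can be prototyped with a linear baseline: model variables in, satellite observation out, with the fitted coefficients and residuals hinting at where the model struggles. A minimal sketch on synthetic data; the predictor names, the coefficients used to generate the fake observations, and the choice of ordinary least squares are all illustrative assumptions, not the mentors' method:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500  # number of co-located samples (grid cells / times)

# Hypothetical model predictors (stand-ins for ERA5 variables).
t_skin_model = rng.normal(285.0, 10.0, n)   # modelled skin temperature [K]
veg_cover = rng.uniform(0.0, 1.0, n)        # vegetation fraction [0-1]

# Synthetic "satellite" skin temperature: the model value plus a bias that
# depends on vegetation cover, plus observation noise.
t_skin_obs = t_skin_model + 3.0 - 4.0 * veg_cover + rng.normal(0.0, 0.5, n)

# Design matrix with an intercept column; fit obs ~ X @ coeffs.
X = np.column_stack([np.ones(n), t_skin_model, veg_cover])
coeffs, *_ = np.linalg.lstsq(X, t_skin_obs, rcond=None)

pred = X @ coeffs
rmse_raw = np.sqrt(np.mean((t_skin_obs - t_skin_model) ** 2))  # model vs obs
rmse_fit = np.sqrt(np.mean((t_skin_obs - pred) ** 2))          # after mapping
print(coeffs, rmse_raw, rmse_fit)
```

A clearly non-zero coefficient on a land-cover predictor, as with `veg_cover` in this toy setup, is the kind of signal that would suggest a cover-dependent model bias worth investigating. A real experiment would also hold out data for validation rather than scoring on the training samples.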

Is there a project (from ESOWC in previous years maybe) along the lines of this one that we could refer to in order to understand the problem statement better?

There have been machine learning projects in previous ESOWCs, but all of them are slightly different. The closest comparisons are probably down-scaling papers using deep learning. See for example: Deep learning for post-processing ensemble weather forecasts | Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences (royalsocietypublishing.org). However, we would not aim for a machine learning solution of similar complexity in this project. Perhaps a more suitable example would be the FLUXCOM product (https://www.nature.com/articles/s41597-019-0076-8), though again, such complexity may not be required for the purpose of this project.

I hope this all makes sense, but please message again if you have any more questions.

All the best, and good luck with the proposal!

avishreekh commented 3 years ago

Thank you for the clarification @joemcnorton. This definitely helped us understand the problem statement better.

msa856 commented 3 years ago

Hello Mentors, @benattix and @carstonhernke and I are eager to work on this project.

We were curious if and how cloud cover is factored into reanalysis data. We also see a variety of reanalysis datasets, is there one we should focus on? Finally, is it up to us to determine whether to compare surface-level reanalysis data? Or up through the 80km height the ERA5 data provides?

Thank you for your time!

dueben commented 3 years ago

Great that you are interested! Here are the answers:

We were curious if and how cloud cover is factored into reanalysis data.

Cloud cover is generated as a diagnostic field within the model that is used to generate the reanalysis. We will focus on ERA5 and on clear-sky conditions (so cloud cover can be useful to isolate the clear-sky days). EO satellite remote sensing data for surface temperature are also available on clear-sky days from infra-red sensors, and these data will be usable for comparison.
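
Using the model's cloud-cover field to isolate clear-sky cases amounts to masking the temperature field wherever cloud cover exceeds a small tolerance. A minimal sketch, assuming total cloud cover is given as a fraction between 0 and 1 on the same grid as skin temperature; the 0.05 tolerance and the function name `clear_sky_only` are illustrative assumptions:

```python
import numpy as np


def clear_sky_only(t_skin, cloud_cover, max_cloud=0.05):
    """Return a copy of t_skin with cloudy cells set to NaN."""
    if t_skin.shape != cloud_cover.shape:
        raise ValueError("fields must share the same grid")
    return np.where(cloud_cover <= max_cloud, t_skin, np.nan)


t_skin = np.array([[280.0, 282.0], [285.0, 290.0]])     # skin temperature [K]
cloud_cover = np.array([[0.0, 0.6], [0.02, 1.0]])       # cloud fraction [0-1]

clear = clear_sky_only(t_skin, cloud_cover)
print(clear)
print(np.nanmean(clear))   # mean over clear-sky cells only
```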

We also see a variety of reanalysis datasets, is there one we should focus on?

We would recommend starting with ERA5, as it uses the same underlying model. In any case, we can provide detailed input once the project has started.

Finally, is it up to us to determine whether to compare surface-level reanalysis data? Or up through the 80km height the ERA5 data provides?

While we lean towards ERA5 for the start of the project, both would be interesting. Again, we can discuss this in detail once the project has started.

We hope this helps. Please let us know if you have further questions.
