ECMWFCode4Earth / challenges_2023

Discover the ECMWF Code for Earth 2023 challenges

Challenge 11 - Atmospheric Composition Dataset Explorer #2

EsperanzaCuartero opened this issue 1 year ago

EsperanzaCuartero commented 1 year ago

Challenge 11 - Atmospheric Composition Dataset Explorer

Stream 1 - Software Development for Earth Sciences

Goal

Develop an application capable of creating atmospheric composition diagnostics plots on demand. The minimum outcome would be an application that is able to generate some of the plots in the table below. A more ambitious target is to develop a generic framework that would allow rapid prototyping of new products. Such a system would comprise data selection, post-processing, aggregation and visualization elements. We have some ideas on how to build such an application (see Skills required), but we invite candidates to propose their own ideas on the implementation details.

Mentors and skills


Note: Only nationals from European Union (EU) Member States and countries associated with EU’s Space Programme (currently Iceland and Norway) are eligible to participate (see Terms and Conditions).


Challenge description

This challenge builds on the developments and experience gained during last year's ESoWC project, Wildfire Emission Explorer, whose aim was to create an application for generating wildfire emission plots on demand.

You can watch the final presentation here (skip to 8:50 if you just want to see the demo).

The project code is here.

Now, we would like to extend the same idea to other CAMS atmospheric composition datasets, primarily the CAMS global greenhouse gas fluxes dataset and the CAMS atmospheric composition reanalysis, both of which are available from the Atmosphere Data Store (ADS): https://ads.atmosphere.copernicus.eu/#!/home

The data access method and data format will differ from last year's project, but some of the plots we would like to create are similar.

Expected outcomes

Examples of current plots

The aim of this project is to create an application that simplifies and speeds up the creation of various atmospheric composition diagnostics plots based on a subset of a dataset.
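Several of the plots below reduce a subset of a gridded dataset to one value per time step "over a selected geographical region". One common way to do this on a regular latitude-longitude grid is a cos(latitude)-weighted mean over a bounding box, so that small high-latitude cells do not dominate. This is a minimal pure-Python sketch under stated assumptions: `Field2D` and `regional_mean` are hypothetical names, longitudes are assumed to be in [-180, 180), and boxes crossing the antimeridian are not handled.

```python
import math
from dataclasses import dataclass


@dataclass
class Field2D:
    """One time step of a regular lat-lon field (hypothetical container)."""
    lats: list[float]          # degrees north, one entry per row
    lons: list[float]          # degrees east, assumed in [-180, 180)
    values: list[list[float]]  # values[i][j] is the value at (lats[i], lons[j])


def regional_mean(field: Field2D, south: float, north: float,
                  west: float, east: float) -> float:
    """Area-weighted mean over a lat-lon box, weighting each cell by cos(latitude).

    Assumes west <= east, i.e. the box does not cross the antimeridian.
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(field.lats):
        if not south <= lat <= north:
            continue
        weight = math.cos(math.radians(lat))  # cell area shrinks towards the poles
        for j, lon in enumerate(field.lons):
            if west <= lon <= east:
                total += weight * field.values[i][j]
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("the selected box contains no grid points")
    return total / weight_sum
```

Applying `regional_mean` to each time step gives the regional time series. With xarray, the same idea is usually expressed by selecting the box with `sel` and calling `weighted(...)` with cos(latitude) weights before `mean`.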

Plot example: C3S_indicators_GHG_fluxes_Fig4_apr22_branded
Caption: Annual CO2 flux (MtCO2/year) from the ‘agriculture, forestry and other land use’ (AFOLU) sector in ten large parties to the United Nations Framework Convention on Climate Change (UNFCCC), estimated by two CAMS inversions: in-situ-driven (blue) and satellite-driven (orange), with uncertainty[2] for each flux (light shading). Note that the scale of the y-axis varies by party. Positive values indicate that the party is a source and negative values indicate that the party is a sink for CO2. Data source: CAMS greenhouse gas flux data. Credit: CAMS/ECMWF/LSCE
Dataset: cams-global-greenhouse-gas-inversion
Processing steps:
  1. Select the target countries and the inversion types to visualize.
  2. Retrieve the global CAMS inversion data.
  3. Select the fraction of pixels corresponding to the managed lands of the target countries, aggregate the CAMS values within each country and compute the annual totals.
  4. Get the associated time-varying uncertainty from a separate database.
  5. Plot the time series.
  6. Optionally, superimpose the time series of the official national reports (OECD countries only) or the fossil fuel emissions.

Plot examples: CAMS_tcno2_ts, image-2023-2-14_19-17-42
Dataset: cams-global-reanalysis-eac4 or cams-global-reanalysis-eac4-monthly
Processing steps:
  1. Calculate monthly means (or monthly mean anomalies for a reference period).
  2. Alternatively, extract daily data.
  3. Plot time series of the data over a selected geographical region.
  4. This should be possible for surface fields, total column fields or fields on pressure levels.
  5. Optionally, superimpose curves of several species in one plot (e.g. different aerosol species).

Plot example: cams_hovmoeller_o3
Dataset: cams-global-reanalysis-eac4 or cams-global-reanalysis-eac4-monthly
Processing steps:
  1. Vertical Hovmöller plots of values or anomalies for a selected reference period.
  2. Should work for daily data or monthly means.
  3. Download pressure level data for the selected area/country and period.
  4. Plot vertical Hovmöller plots of values or anomalies.

Plot example: cams_lat_time_o3 (latitude-time or longitude-time Hovmöller plots of values or anomalies for a selected reference period)
Dataset: cams-global-reanalysis-eac4 or cams-global-reanalysis-eac4-monthly
Processing steps:
  1. Select the type of Hovmöller plot.
  2. Download data for the selected area/country and period.
  3. This could be total column, surface or pressure level data.
  4. Should work for daily data or monthly means.
  5. Produce plots of values or anomalies.
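Three of the plot types above rely on monthly means or monthly mean anomalies for a reference period. A minimal sketch, assuming the input is a list of `(date, value)` pairs for one region and that the reference climatology of a calendar month is the mean of that month's values over the reference years (one common convention; others exist). `monthly_means` and `monthly_anomalies` are hypothetical helpers, not part of any CAMS tool.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(series: list[tuple[date, float]]) -> dict[tuple[int, int], float]:
    """Average daily (or sub-daily) values into calendar months keyed by (year, month)."""
    buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
    for day, value in series:
        buckets[(day.year, day.month)].append(value)
    return {key: mean(vals) for key, vals in sorted(buckets.items())}


def monthly_anomalies(means: dict[tuple[int, int], float],
                      ref_start: int, ref_end: int) -> dict[tuple[int, int], float]:
    """Subtract the reference-period climatology of each calendar month.

    The climatology of month m is the mean of all monthly means of month m
    whose year lies in [ref_start, ref_end]. Months that never occur in the
    reference period have no climatology and are left out of the result.
    """
    clim_buckets: dict[int, list[float]] = defaultdict(list)
    for (year, month), value in means.items():
        if ref_start <= year <= ref_end:
            clim_buckets[month].append(value)
    climatology = {month: mean(vals) for month, vals in clim_buckets.items()}
    return {
        (year, month): value - climatology[month]
        for (year, month), value in means.items()
        if month in climatology
    }
```

For gridded data, the same two steps correspond roughly to xarray's `resample(time="1MS").mean()` followed by `groupby("time.month")`, which keeps this logic out of the plotting code and lets every plot type share it.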
elisaliv commented 1 year ago

Hi! I'm drafting a proposal for this challenge. Thinking about prioritization, would you say it is more important to develop an optimal data cache (as listed in the expected outcomes) or to make the framework as generic as possible so that it can also be used with other data products (the "more ambitious target" described in the Goal introduction)? Thanks in advance.

miha-at-ecmwf commented 1 year ago

Hello @elisaliv and thank you for your interest in this challenge.

My advice would be to play to your strengths. Unless you already have some experience with caching multidimensional and heterogeneous datasets, it may be better to focus on making the framework generic.

Looking forward to your proposal.

elisaliv commented 1 year ago

Hi @miha-at-ecmwf, thank you for your reply!

@luigibrancati and I have another question: what exactly do you mean by "making the framework generic to prototype new data products"?

Here are some ideas we had: 1) high-level APIs for filtering, post-processing and aggregating generic time-series weather data, applying the most commonly used data transformations; 2) high-level APIs for generating 'standard' weather reports within given time and spatial ranges.

I assume step 1 is a prerequisite for step 2. Is that correct? And is this what you would like to achieve with a generic framework?

Thank you again.

timometz commented 1 year ago

Hi,

I have 3 questions regarding Challenge 11:

  1. Should we reuse the code of the GUI/API developed last year as much as possible? I.e. to what extent can we build on that code?
  2. In what sense should we use data caching? Should caching be used within a session to avoid re-downloading the same data multiple times, or do you plan to store the CAMS dataset at a coarser temporal or spatial resolution so that it can be downloaded more quickly?

Similar to elisaliv's question:

  3. What does “new product” refer to in the goal description? Does it mean that the code should be easily adaptable to datasets other than CAMS, or does it refer to greater flexibility in creating new plot types requested by a user?

Looking forward to your answer and thank you in advance!

Best, Timo

miha-at-ecmwf commented 1 year ago

Hi @elisaliv, @luigibrancati,

The idea is to make the building blocks (GUI, data retrieval, data homogenization, data slicing and sub-setting, aggregation, visualization of results ...) of the application as modular as possible with clean interfaces between them.

So if we need to use a new dataset in the future, we just have to write new data acquisition and (potentially) data homogenization code. If we wanted a new plot type, statistical methods and visualization code would have to be updated ...
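The separation described above can be sketched with small interfaces between stages. This sketch assumes nothing about the real application: `Request`, `DataSource`, `InMemorySource`, `run_pipeline`, `running_total` and `as_text` are all hypothetical names. `InMemorySource` stands in for ADS retrieval plus homogenization, and `as_text` stands in for a real plotting layer.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Request:
    """Hypothetical user selection passed from the GUI to the data layer."""
    dataset: str
    variable: str
    start_year: int
    end_year: int


# A series is a list of (label, value) pairs, e.g. (year, annual total).
Series = list[tuple[int, float]]


class DataSource(Protocol):
    """Retrieval + homogenization stage: one implementation per dataset."""
    def load(self, request: Request) -> Series: ...


Processor = Callable[[Series], Series]  # slicing, aggregation, statistics
Renderer = Callable[[Series], str]      # visualization (plain text here)


class InMemorySource:
    """Stand-in for a real ADS-backed source; serves placeholder values."""
    def __init__(self, data: dict[int, float]):
        self.data = data

    def load(self, request: Request) -> Series:
        return [(year, value) for year, value in sorted(self.data.items())
                if request.start_year <= year <= request.end_year]


def run_pipeline(source: DataSource, steps: list[Processor],
                 render: Renderer, request: Request) -> str:
    """Wire the stages together; each stage only sees the previous stage's output."""
    series = source.load(request)
    for step in steps:
        series = step(series)
    return render(series)


def running_total(series: Series) -> Series:
    """Example processing step: cumulative sum of the values."""
    total = 0.0
    out: Series = []
    for label, value in series:
        total += value
        out.append((label, total))
    return out


def as_text(series: Series) -> str:
    """Example renderer: one 'label: value' line per point."""
    return "\n".join(f"{label}: {value:g}" for label, value in series)
```

With this shape, adding a dataset means writing another class with a `load` method, and a new plot type means new processing steps and/or a new renderer, while `run_pipeline` stays unchanged.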

If you want to see additional examples of the types of plots we regularly create, please look at the CAMS validation reports, this is the latest one: https://atmosphere.copernicus.eu/sites/default/files/publications/32_CAMS2_82_2022SC1_D82.1.1.5-SON2022.pdf

For even more inspiration (with source code!), check the Climate Data Store applications' collection: https://cds.climate.copernicus.eu/cdsapp#!/search?type=application

Miha

miha-at-ecmwf commented 1 year ago

Dear @timometz,

Thank you for your questions.

  1. You don't have to reuse the code. However, last year's solution should give you a basic understanding of what we are hoping to achieve with this challenge.
  2. It's a bit of an open topic; we welcome your ideas on how to make the application as responsive as possible. We will make a development server available for the duration of the project, so the benefits of a global server-side cache could be explored.
  3. We will limit ourselves to CAMS data. The application should be flexible enough that adding a new dataset or a new plot type (i.e. one not listed in the table of plot examples) would not be too complicated. See also my last answer to @elisaliv.
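One possible shape for the cache discussed in point 2, assuming a retrieval always returns the same data for the same request: key each download by a hash of the dataset name plus the request parameters, so identical requests (even with dictionary keys in a different order) are served without re-downloading. `RequestCache` and its `fetch` callable are hypothetical; in practice `fetch` could wrap the real ADS download call.

```python
import hashlib
import json
from typing import Any, Callable


class RequestCache:
    """In-memory cache of retrieved data keyed by a canonical form of the request.

    A server-side variant could keep the same key -> file mapping on disk so
    that all users of a shared server benefit from earlier downloads.
    """

    def __init__(self, fetch: Callable[[str, dict[str, Any]], Any]):
        self._fetch = fetch               # function that performs the real retrieval
        self._store: dict[str, Any] = {}  # key -> retrieved data (grows unbounded)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(dataset: str, request: dict[str, Any]) -> str:
        # sort_keys=True makes dicts with the same items serialize identically,
        # including nested dicts; list values keep their order.
        canonical = json.dumps([dataset, request], sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def get(self, dataset: str, request: dict[str, Any]) -> Any:
        k = self.key(dataset, request)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = self._fetch(dataset, request)
        return self._store[k]
```

Only dictionary keys are normalized; list values such as a list of variables keep their order, so callers may want to sort those before lookup when order is irrelevant (but not for ordered values such as bounding-box coordinates). A production cache would also need an eviction policy.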

Miha

luigibrancati commented 1 year ago

Hello @miha-at-ecmwf, what's the deadline for the proposal? I see 12 April, but no time or time zone is specified.

trakasa commented 1 year ago

@luigibrancati submission deadline is 12 April 2023 (23:59 UTC). It's a bit hidden in the T&Cs Article 4 - https://codeforearth.ecmwf.int/terms-and-conditions