EsperanzaCuartero commented 3 years ago

Challenge 23 - Mapping Emissions of Air Pollutants

Stream 2 - Machine Learning for weather, climate and atmosphere applications

Goal

Derive suitable proxies for spatial and temporal mapping of emissions.

Mentors and skills

Mentors: @annaagustipanareda @gpbalsamo @joemcnorton @mparrington
Skills required:
- Experience working with big data
- Experience with machine learning techniques

Note: Challenge is funded by Copernicus. Only nationals from the European Union and ECMWF Member States are eligible to apply (see Terms and Conditions).

Challenge description

What data/system do you plan to use? We plan to use:

- Either estimates of NOx emissions (based on atmospheric inversions) or direct concentration observations from satellites.
- 3D atmospheric fields from the ECMWF model of variables relevant to transport and chemical loss (e.g. temperature and wind).
- Any proxy data that can be linked directly to emissions (e.g. nightlight data, traffic data, etc.).

What is the current problem/limitation? To model and forecast emissions of chemical tracers in the atmosphere, a suitable estimate of emissions is required. Emission estimates from inventories are often either fixed in time or vary only on long timescales (e.g. monthly/yearly). This fails to capture the true variability in emissions due to changes in activities (e.g. rush hour). Where proxy data are used, they are often either out of date or do not offer suitable variability.

What could be the solution? The underlying processes of emissions are wide-ranging, including differing fuel types, activity types, or even social changes in human behaviour. Several proxy datasets can be used to improve estimates of emissions and can also include variables which could be optimised within an emissions model. For example, a map of population density is likely to correlate well with emission sources. An aim of this project would be to identify which proxy data are suitable for estimating emissions.

Ideas for the implementation: The system could follow the design ideas of existing fossil fuel data assimilation systems (FFDAS), but should explore novel avenues for estimating emissions by identifying datasets which correlate well with emissions. Observation data could include NO2 observations from Sentinel-5P or inversion estimates; alternatively, existing inventories could be used with spatial and interannual variability. The input proxies are open to many possibilities, but example starting points might include population density maps, nightlight data and TomTom traffic data.
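As a rough illustration of "identifying datasets which correlate well with emissions", the sketch below ranks candidate proxies by their Pearson correlation with an emission-related target. It is only a minimal example under stated assumptions: the arrays are synthetic stand-ins, and the names `rank_proxies`, `population`, `nightlights` and `unrelated` are hypothetical and not part of the challenge material.

```python
import numpy as np

def rank_proxies(proxies, target):
    """Rank proxies by the absolute Pearson correlation with the target.

    proxies: dict mapping a proxy name to an array of per-grid-cell values.
    target:  array of the same size (e.g. an NO2 field or a flux estimate).
    Returns a list of (name, r) pairs, strongest correlation first.
    """
    scores = {}
    for name, values in proxies.items():
        r = np.corrcoef(np.ravel(values), np.ravel(target))[0, 1]
        scores[name] = float(r)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Synthetic stand-in data (assumption): 500 grid cells, no real observations.
rng = np.random.default_rng(0)
n_cells = 500
population = rng.lognormal(mean=0.0, sigma=1.0, size=n_cells)
nightlights = 0.8 * population + rng.normal(0.0, 0.5, n_cells)
unrelated = rng.normal(0.0, 1.0, n_cells)
# The target is constructed to depend on population only.
no2 = 2.0 * population + rng.normal(0.0, 0.5, n_cells)

ranking = rank_proxies(
    {"population": population, "nightlights": nightlights, "unrelated": unrelated},
    no2,
)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

A linear correlation is only a first screening step; a proxy with a weak linear correlation may still be useful to a non-linear model.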


avishreekh commented 3 years ago

Hi, I am Avishree Khare, a final-year Computer Science undergraduate. I am interested in interdisciplinary applications of deep learning and have worked on a wide range of projects. I believe that assimilating data from a wide range of sources would be a challenging yet fun task, and building a robust model to estimate emissions from this data would also be interesting. I would love to contribute to this project.

I would be glad if you could shed some light on the following questions:

1. Does the focus of this project lie only on the creation of a diverse and robust dataset for the task at hand, or does it also extend to the development of concrete Deep Learning (or Machine Learning) models for estimating the emissions?
2. Could you please elaborate on the quantities being estimated? (I believe that these are PM estimates, but please correct me if I am wrong or if any other quantities are being estimated.)

Thank you!

joemcnorton commented 3 years ago

Hi Avishree,

Thank you for your interest in the project. It sounds like you have a good background in the skills required for this project. Regarding your questions:

1. The task would include producing a machine learning model for estimating the emissions. The project could be split in two. First, the collection of necessary proxy data to provide as potential input to the model (some of this data might be found to have little weight in the model and can be discarded). Second, the development of the model itself using optimised or observed fluxes as training data.
2. The output of the model should be surface flux estimates of NOx emissions (and possibly other chemical species) at a suitable spatial and temporal resolution.
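The two stages described above can be sketched as follows. This is a minimal illustration, not the intended implementation: it assumes a simple linear least-squares model, uses synthetic data, and the names `fit_proxy_model`, `names` and the 10% weight threshold are hypothetical choices made only for this example.

```python
import numpy as np

def fit_proxy_model(X, y):
    """Least-squares fit of y on standardised proxy columns.

    Standardising puts the proxies on a common scale so that the fitted
    weights can be compared with each other.
    Returns (weights, intercept).
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    A = np.column_stack([Xs, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

# Stage 1 (assumption): candidate proxies collected per grid cell and time step.
rng = np.random.default_rng(1)
n = 400
names = ["population", "traffic", "unrelated"]
X = rng.normal(size=(n, 3))

# Training target: a synthetic stand-in for optimised or observed surface fluxes.
flux = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0.0, 0.3, n)

# Stage 2: fit the model, then discard proxies that carry little weight.
weights, intercept = fit_proxy_model(X, flux)
threshold = 0.1 * np.max(np.abs(weights))  # hypothetical cut-off
kept = [name for name, w in zip(names, weights) if abs(w) >= threshold]
print("kept proxies:", kept)
```

In practice a non-linear model with its own importance measure may be a better fit for this task; the linear fit is used here only because its weights are easy to read.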

Thanks again,

We are looking forward to your proposal or any further questions you may have.

Joey McNorton

jwagemann commented 3 years ago

Hi, join us for the ECMWF Summer of Weather Code Ask Me Anything sessions and learn all things ESoWC.

When:

17 March 2021 at 4 pm GMT and
24 March 2021 at 4 pm GMT

What:

learn everything about ESoWC - how it works, the challenges this year, some tips for your proposal and listen to ESoWC experiences from previous participants

How: register here.

avishreekh commented 3 years ago

Thank you for the clarification!

Together with a teammate, I had started working on the proposal by reading relevant papers and looking up public datasets that might be helpful. At the time, I do not remember seeing a note about the project being funded by Copernicus. As a non-EU resident, this makes me ineligible to work on this project. We understand that these terms and conditions must be followed. We would, however, like to know whether our proposal could be accommodated if the mentors find it interesting, or whether we would have to give it up.

Thank you, Avishree

jwagemann commented 3 years ago

Hi @avishreekh, thanks for your interest in this challenge. Any team proposal will be valid and eligible as long as one person on the team holds an EU nationality. Hope this helps.

jwagemann commented 3 years ago

Hi, join us for the ECMWF Summer of Weather Code Ask Me Anything session and learn all things ESoWC.

When: Wednesday, 24 March 2021 at 4 pm GMT

What: learn everything about ESoWC - how it works, the challenges this year, some tips for your proposal and listen to ESoWC experiences from previous participants

How: register here.

alessandr2448 commented 3 years ago

Hi all, I'm Alessandra, a European scientist based in the UK. While my background is in experimental Biology, in the last year I have also become curious about Machine Learning, Big Data Analytics and Epidemiology. In fact, six months ago I received a post-master's specialisation degree in Machine Learning in Biomedical settings. As I am interested in how pollution can affect climate change and I have fallen in love with this challenge, I would like to submit a proposal. I heard about it last week and am currently drafting the project and reviewing the literature. I would like to ask whether you would consider R as a programming language (instead of Python) and also which channel you prefer for questions: here or by email? Thank you, Alessandra

EsperanzaCuartero commented 3 years ago

Hi Alessandra, many thanks for your interest. This GitHub space is the channel for posting any specific questions related to the challenge. The mentors will respond to you as soon as possible. Best, Esperanza

crampaldo commented 3 years ago

Good morning, I am Luca Rampini, a Ph.D. student investigating Machine Learning applications to manage our Built Environment. I am interested in understanding which factors of the built environment contribute to air pollution. I am therefore strongly interested in this challenge, and I have a few questions about it:
1. Is the selection of proxy data at our discretion? For instance, could I focus the data on characteristics of the built environment?
2. Does the data have to come from open, public databases, or is it possible to collect data with personal devices (e.g., CO2 data taken via an Arduino device)?

Finally, during the webinar, the possibility of providing a template for drafting the project was mentioned. Is this template available? Thank you for organizing these interesting challenges and for all the work behind them. Thanks!

joemcnorton commented 3 years ago

Hi @alessandr2448,

Thank you for your interest in this project. We would recommend using Python; however, R would also be okay.

joemcnorton commented 3 years ago

Hi @crampaldo,

Thank you for your interest in this project. The selection of the proxy data would be at your discretion, as you say. We can recommend datasets but would also encourage you to explore the options. Built-environment characteristics would be a very suitable proxy for this challenge.

The data would be expected to come from public datasets. We also have some data which can be made available to you. The real-world collection of data is not expected to form part of this project.

For the template, I will try to get back to you on this.

Thank you for your interest and we are looking forward to your proposal or any further questions you may have.

Joey McNorton

FedericaCas commented 3 years ago

Hi, I am Federica Casamento, a master's student in Environmental Engineering. I have a keen interest in climate change studies and I am studying Machine Learning and Python. I have academic experience in time series analysis, extreme value analysis of rainfall and temperature, bias correction and hydrological modelling using HEC-HMS, the ERA5-Land dataset and observed data. This project is very appealing to me and I would like to test my skills on it. I am now consulting the references to prepare a proposal.

I have a question: Is the choice of the study area up to us, meaning we can pick a city, a region, a nation or a continent? I ask because the choice of datasets, and hence the data, depends on the location of the study.

Thanks so much!

joemcnorton commented 3 years ago

Hi @FedericaCas,

Thank you for your interest in the project. In general, the choice of area is indeed up to you. We would like such a system to be transferable to a global scale; as a result, the proxy data selected should not be so niche that it is globally unobtainable now or in the near future. If, however, you believe that certain proxies which can only be obtained for a given region are of significant importance, then we would encourage exploration into this.

Thank you for your interest and we are looking forward to your proposal or any further questions you may have.

Joey McNorton

crampaldo commented 3 years ago

Hi @FedericaCas, if you are on your own and interested in joining forces with others, feel free to contact me at my email: luca.rampini@polimi.it. Have a nice day!

vidurmithal commented 3 years ago

Hi, my name is Vidur. I am interested in this challenge, and have prior experience in working with emissions and meteorological data in Python, and in developing land-use regression models to model PM2.5 variation in urban areas, and also in machine learning. I had a couple of questions regarding this challenge that I was hoping you could answer.

Is the aim of this challenge only to model anthropogenic emissions of NOx, or also natural sources?
From my understanding of atmospheric inversion estimates of emissions, they require some a priori emissions estimates that are often emission inventories. Isn't the emission inventory essentially what we are trying to obtain using proxy data sources? If that is the case is it appropriate to use atmospheric inversion estimates as the observation data in the model?
I envision that it will not be possible to have one system / model to obtain uniform results globally, due to the difference in data availability and quality in different regions. Would it be acceptable to 'segment' the problem and work on obtaining results for a particular region, while building a framework for how it can be expanded to other regions in the future?

I hope I have been able to communicate my doubts clearly.

Thank you. ~ Vidur

joemcnorton commented 3 years ago

Hi @vidurmithal,

Thank you for your interest in this project.

Is the aim of this challenge only to model anthropogenic emissions of NOx, or also natural sources?

NOx is listed as the species in the advert; however, other trace gases, such as CO2, are also very welcome. The project should focus on anthropogenic emissions.

From my understanding of atmospheric inversion estimates of emissions, they require some a priori emissions estimates that are often emission inventories. Isn't the emission inventory essentially what we are trying to obtain using proxy data sources? If that is the case is it appropriate to use atmospheric inversion estimates as the observation data in the model?

This would not be an atmospheric inversion project.
"Isn't the emission inventory essentially what we are trying to obtain using proxy data sources?" The project should attempt to quantify the spatial and temporal variability in emissions and produce a suitable emission estimate using proxies. In essence this is the compilation of an inventory, yes. "If that is the case is it appropriate to use atmospheric inversion estimates as the observation data in the model?" Yes, it is appropriate: atmospheric inversion estimates contain this variability and as such can be used to train a proxy-based model. This variability is often lacking in the prior. There are few global atmospheric source inversions for NOx, so in practice for this project it is likely that we would provide the atmospheric concentrations from the analysis fields of the Copernicus Atmosphere Monitoring Service, or satellite observations, and assume a direct correlation in space and time between the atmospheric concentration and the emissions, without using a source inversion.

I envision that it will not be possible to have one system / model to obtain uniform results globally, due to the difference in data availability and quality in different regions. Would it be acceptable to 'segment' the problem and work on obtaining results for a particular region, while building a framework for how it can be expanded to other regions in the future?

This would be a very suitable option, although it should be noted that eventually such a system should be transferable to a global scale. This means regional systems are fine, but the proxy data should not be so obscure that the system cannot be increased in scope at a later stage.
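One simple way to check this kind of transferability is to train on one region and evaluate on another. The sketch below is only an illustration under stated assumptions: both regions are synthetic, they share the same proxy-to-emission relationship by construction, and `make_region`, `fit`, `predict` and `r_squared` are hypothetical helper names.

```python
import numpy as np

def fit(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients (last entry is the intercept) to new proxies."""
    return X @ coef[:-1] + coef[-1]

def r_squared(y_true, y_pred):
    """Coefficient of determination of the predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(2)

def make_region(n, shift):
    """Synthetic region (assumption): two proxies with a region-specific mean."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.5, n)
    return X, y

X_a, y_a = make_region(300, shift=0.0)  # region used for training
X_b, y_b = make_region(300, shift=1.0)  # held-out region

coef = fit(X_a, y_a)
r2_a = r_squared(y_a, predict(coef, X_a))
r2_b = r_squared(y_b, predict(coef, X_b))
print(f"R^2 in training region: {r2_a:.2f}, in held-out region: {r2_b:.2f}")
```

With real data the relationship between proxies and emissions may differ between regions, so a drop in the held-out score would indicate proxies that do not transfer well.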

I hope this is all clear but please let me know if you have any further questions.

Thanks

Joey McNorton

FedericaCas commented 3 years ago

Hi! Just a question about the organization of the team: Would a group of five members be OK?

Thanks, Federica Casamento

EsperanzaCuartero commented 3 years ago

Hi Federica, Yes, a group of five members is fine, as long as you organize your different tasks as a team. Best Esperanza

vidurmithal commented 3 years ago

Thanks for your response @joemcnorton.

Another question I had was whether it would be acceptable to use measurements of other pollutants / species as proxies for the pollutant we choose to monitor. For example, if we decide to estimate NO2 emissions, can we use observed data on, say, CO2 as a proxy in that model?

Thank you. ~ Vidur

joemcnorton commented 3 years ago

Hi @vidurmithal,

Observations of concentrations/fluxes should not be used as input (or a proxy) to the tool; they should be used as the training output for it. This applies to both NOx and CO2, as the eventual system will include both species, although this additional work is most likely beyond the scope of this project.

I hope this helps, thanks.

Joey
