US-GHG-Center / veda-config-ghg

Veda config for GHG
https://ghg-demo.netlify.app

CO2 Emissions by Country Experiment #389

Open aboydnw opened 1 month ago

aboydnw commented 1 month ago

Context

We would like to give users access to state-level or other regionally aggregated emissions based on the OCO2 dataset.

Users currently have the option to compute zonal statistics on the fly from total emissions datasets. However, the officially approved methodology requires a different formula involving several partial products, which we do not currently host. In short, users can currently get the zonal median of the sum of constituent products ($\overline{\sum_{p} x_p}$), while the official methodology is to calculate the sum of the zonal medians of the constituent products ($\sum_{p} \overline{x_p}$).
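To make the difference concrete, here is a minimal sketch with made-up numbers. It assumes the median is taken over the grid cells of a single zone, and the two "products" are hypothetical placeholders rather than actual OCO2 constituents.

```python
import numpy as np

# Made-up values: two constituent products (rows) over three grid cells (columns).
products = np.array([
    [1.0, 2.0, 9.0],  # hypothetical product A
    [5.0, 1.0, 2.0],  # hypothetical product B
])

# What users can compute today: sum the products per cell, then take the zonal median.
median_of_sum = float(np.median(products.sum(axis=0)))

# What the official methodology asks for: zonal median per product, then sum.
sum_of_medians = float(np.median(products, axis=1).sum())

print(median_of_sum, sum_of_medians)
```

With these values the two orderings give 6.0 and 4.0 respectively, so the order of the operations matters even for a tiny example.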

One option would be to host all the constituent datasets and implement a custom formula. Definitely possible, but a lot of work.

A quick experiment could be to pre-compute the estimates and display them in the GHG Center and gather responses from users who want to compute the same metric for a custom area.

Experiment

Brendan Byrne has a csv of accurate country totals already available in his Jupyter Notebook. We could provide access to this data in the GHGC Portal and determine how many people want something more than what is available in this csv file.

Potential gaps users might see with this csv file:

  1. Not able to combine countries or create irregular shapes (like regions of the world, alliances, etc.)
  2. Not able to get totals for units smaller than countries, like states
  3. Not able to explore different temporal ranges or resolutions
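As a rough sketch of what serving pre-computed totals could look like, the snippet below loads a small table and answers lookups. The column names (`region`, `year`, `total`) and all values are hypothetical placeholders, not the layout or contents of the actual csv.

```python
import csv
import io

# Placeholder rows, NOT real emissions values; column names are hypothetical.
SAMPLE_CSV = """region,year,total
AAA,2020,1.5
AAA,2021,1.7
BBB,2020,0.4
"""

def load_totals(text):
    """Parse the CSV text into a {(region, year): total} dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["region"], int(row["year"])): float(row["total"]) for row in reader}

def lookup_total(totals, region, year):
    """Return the pre-computed total, or None if the table cannot answer."""
    return totals.get((region, year))

totals = load_totals(SAMPLE_CSV)
print(lookup_total(totals, "AAA", 2021))  # present in the table
print(lookup_total(totals, "CCC", 2020))  # a region the table does not cover
```

A lookup that returns `None` corresponds to one of the gaps listed above, which is the kind of request the experiment is meant to surface.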

Acceptance Criteria

Jeanne-le-Roux commented 1 month ago

This is the specific dataset referenced: https://earth.gov/ghgcenter/data-catalog/oco2-mip-co2budget-yeargrid-v1

If you were to calculate the total CO2 emissions for a given year over a given area of interest (e.g. a country) using the 1x1 degree data currently provided on the GHG Center portal, you would get a result inconsistent with the country-level totals provided in the related publication, because the publication uses a different underlying methodology. Section 3 of the publication provides some insight into how the country-level totals were derived differently from the 1x1 degree product.

It was proposed that the user have the ability to select a country mask (or any AOI) and request CO2 emissions totals with the correct underlying methodology applied in the backend. We first want to investigate whether this would bring significant user value (as outlined above) before moving forward with trying to implement something like this.

aboydnw commented 2 weeks ago

@Jeanne-le-Roux @siddharth0248 @deborahUAH What if we made this the use case for the Microsoft Geo Copilot proof of concept? It might be a little more complicated than we had hoped for, but it would be very tangible.

If I remember correctly, there might be a few steps to take in order to make this a reality:

  1. Ingest the underlying datasets that are used in this calculation (is that accurate?)
  2. Codify the model used in the linked publication
  3. Allow users to query these model results (how would we then visualize it?)

Might be a bit complicated to start with, but could be a good guide post

cc @xhagrg @slesaad @j08lue @freitagb

j08lue commented 2 weeks ago

I updated the formulations in the issue description above ☝️ to clarify that the information in the GHG Center is accurate, just not sufficient for calculating accurate regional totals (which require the sum of medians, not the median of the sum).

j08lue commented 2 weeks ago

> If I remember correctly, there might be a few steps to take in order to make this a reality

@aboydnw, the steps you listed are correct. But this is not a case for machine learning / LLM, etc.

Data-analysis-wise, the problem is super simple - just some arithmetic.

It would also be super easy to build a custom service that makes this calculation.

The question is whether it makes sense, strategically, to maintain a custom analysis service for one particular dataset or type of calculation (maybe it can be somewhat generalized).

j08lue commented 2 weeks ago

If stakeholders agree, I think it would be great to start with the experiment you outlined - to use pre-computed totals and find out how to display them in the UI, then see whether users seek to calculate totals for other areas. This could even be integrated into E&A, if need be (via a dedicated area presets overlay, similar to Global Forest Watch; cc @faustoperez).

On the other hand, calculating accurate totals of model data products is currently not supported well, since we do not support scaling by grid cell / pixel area (to get from flux/area to total flux). This is a gap in the statistics services we have, which are mainly built for Earth observation products.
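For illustration, area scaling on a regular lat-lon grid could be sketched as below. This assumes a spherical Earth with a mean radius of 6,371 km and a flux expressed per square metre; the function names are hypothetical and not part of any existing GHG Center service.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth, mean radius

def cell_areas_m2(lat_edges_deg, dlon_deg, radius_m=EARTH_RADIUS_M):
    """Area of one grid cell per latitude band: R^2 * dlon * (sin(lat2) - sin(lat1))."""
    lat_edges = np.deg2rad(np.asarray(lat_edges_deg, dtype=float))
    band = np.sin(lat_edges[1:]) - np.sin(lat_edges[:-1])
    return radius_m**2 * np.deg2rad(dlon_deg) * band

def regional_total(flux_per_m2, lat_edges_deg, dlon_deg, mask=None):
    """Sum flux * cell area over the grid, optionally restricted to a boolean mask."""
    flux = np.asarray(flux_per_m2, dtype=float)  # shape (nlat, nlon)
    areas = cell_areas_m2(lat_edges_deg, dlon_deg)[:, np.newaxis]
    weighted = flux * areas
    if mask is not None:
        weighted = np.where(mask, weighted, 0.0)
    return float(weighted.sum())

# Sanity check on a 1x1 degree global grid: a uniform flux of 1 per m^2
# should integrate to the surface area of the sphere.
lat_edges = np.arange(-90.0, 91.0, 1.0)
uniform_flux = np.ones((180, 360))
global_total = regional_total(uniform_flux, lat_edges, 1.0)

# Restricting to the northern hemisphere should give half of that.
north = np.zeros((180, 360), dtype=bool)
north[90:, :] = True
north_total = regional_total(uniform_flux, lat_edges, 1.0, mask=north)
```

Without the area weighting, cells near the poles would count as much as cells at the equator, even though they cover far less surface.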

A new service that has some special functionality for global, gridded, lat-lon projected products, including accurate totals of composite products, might add a lot of value to the GHG Center analysis functionality.

Perhaps start with the first option, while we work on the second? Happy to discuss this further.

aboydnw commented 2 weeks ago

I'm game for starting with the first experiment. In that case, we would need the csv or another way to display the summarized data

Jeanne-le-Roux commented 2 weeks ago

Country totals can be accessed here (CSV, XLS or NetCDF format): https://ceos.org/gst/carbon-dioxide.html

In the exploration environment we're currently only displaying the following variables (there are many more in the CSV):

If we wanted to further down-select the variables to display for the country totals, I would defer to the data providers and ask for their preference.

j08lue commented 1 week ago

From data to algorithms to applications - the increase in (user) value we are targeting here is quite in line with the principle that Joe Morrison nicely expressed a few years back.