Open aboydnw opened 1 month ago
This is the specific dataset referenced: https://earth.gov/ghgcenter/data-catalog/oco2-mip-co2budget-yeargrid-v1
If you were to calculate the total CO2 emissions for a given year over a given area of interest (e.g. a country) using the 1x1 degree data currently provided on the GHG Center portal, you would get a result inconsistent with the country-level totals provided in the related publication, because the publication uses a different underlying methodology. Section 3 of the publication provides some insight into how the country-level totals were derived differently from the 1x1 degree product.
It was proposed that the user have the ability to select a country mask (or any AOI) and request CO2 emissions totals with the correct underlying methodology applied in the backend. We first want to investigate whether this would bring significant user value (as outlined above) before moving forward with trying to implement something like this.
@Jeanne-le-Roux @siddharth0248 @deborahUAH What if we made this the use case for the Microsoft Geo Copilot proof of concept? It might be a little more complicated than we had hoped for, but it would be very tangible.
If I remember correctly, there might be a few steps to take in order to make this a reality:
Might be a bit complicated to start with, but could be a good guide post
cc @xhagrg @slesaad @j08lue @freitagb
I updated the wording in the issue description above ☝️ to clarify that the information in the GHG Center is accurate, just not sufficient for calculating accurate regional totals (which require the sum of medians, not the median of the sum).
If I remember correctly, there might be a few steps to take in order to make this a reality
@aboydnw, the steps you listed are correct. But this is not a case for machine learning / LLM, etc.
Data-analysis-wise, the problem is super simple - just some arithmetic.
It would also be super easy to build a custom service that makes this calculation.
The question is whether it makes sense, strategically, to maintain a custom analysis service for one particular dataset or type of calculation (maybe it can be somewhat generalized).
If stakeholders agree, I think it would be great to start with the experiment you outlined - to use pre-computed totals and find out how to display them in the UI, then see whether users seek to calculate totals for other areas. This could even be integrated into E&A, if need be (via a dedicated area presets overlay, similar to Global Forest Watch, cc @faustoperez).
On the other hand, calculating accurate totals of model data products is currently not supported well, since we do not support scaling by grid cell / pixel area (to get from flux/area to total flux). This is a gap in the statistics services we have, which are mainly built for Earth observation products.
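The area scaling described above could look roughly like the following. This is a minimal sketch, not an existing GHG Center service: it assumes a regular lat-lon grid, a spherical Earth, and flux given per square metre. The function names `cell_areas_m2` and `regional_total` and the all-ones demo arrays are hypothetical and only serve as illustration.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def cell_areas_m2(lat_edges_deg, dlon_deg):
    """Area of one grid cell in each latitude band of a regular lat-lon grid."""
    lat_edges = np.deg2rad(np.asarray(lat_edges_deg, dtype=float))
    dlon = np.deg2rad(dlon_deg)
    # Area between two latitudes on a sphere: R^2 * dlon * |sin(lat2) - sin(lat1)|
    return EARTH_RADIUS_M**2 * dlon * np.abs(np.diff(np.sin(lat_edges)))


def regional_total(flux_per_m2, mask, lat_edges_deg, dlon_deg):
    """Sum flux-per-area times cell area over the masked cells."""
    areas = cell_areas_m2(lat_edges_deg, dlon_deg)[:, np.newaxis]  # shape (nlat, 1)
    return float(np.sum(flux_per_m2 * areas * mask))


# Hypothetical 1x1 degree global grid with a uniform flux of 1 unit per m^2
lat_edges = np.arange(-90.0, 91.0, 1.0)   # 181 edges -> 180 latitude bands
flux = np.ones((180, 360))
mask = np.ones((180, 360), dtype=bool)    # stand-in for a country / AOI mask

total = regional_total(flux, mask, lat_edges, 1.0)
band_areas = cell_areas_m2(lat_edges, 1.0)
```

With a uniform flux of 1 and a global mask, `total` equals the surface area of the sphere, which is a quick sanity check. The cells near the equator are far larger than those near the poles, which is why an unweighted sum over pixels does not give a correct total.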
A new service that has some special functionality for global, gridded, lat-lon projected products including accurate totals of composite products might add a lot of value to the GHG Center analysis functionality.
Perhaps start with the first option, while we work on the second? Happy to discuss this further.
I'm game for starting with the first experiment. In that case, we would need the CSV or another way to display the summarized data.
Country totals can be accessed here (CSV, XLS or NetCDF format): https://ceos.org/gst/carbon-dioxide.html
In the exploration environment we're currently only displaying the following variables (there are many more in the CSV):
If we wanted to further down-select the variables displayed for the country totals, I would defer to the data providers' preference.
From data to algorithms to applications - the increase in (user) value we are targeting here is quite in line with the principle that Joe Morrison nicely expressed a few years back.
Context
We would like to give users access to state-level or other regionally aggregated emissions based on the OCO2 dataset.
Users currently have the option to compute zonal statistics on the fly from total emissions datasets. However, the officially approved methodology requires a different formula, involving several partial products, which we currently do not host. In short, users can currently get the zonal median of the sum of constituent products ($\overline{\sum_{p} x_p}$), while the official methodology is to calculate the sum of the zonal medians of the constituent products ($\sum_{p} \bar{x}_p$).
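A small numeric illustration of why the two formulas differ: the median is not a linear operation, so the median of a sum generally does not equal the sum of the medians. The two arrays below are made-up values standing in for two constituent products sampled over the same five grid cells; they are not taken from the OCO-2 MIP dataset.

```python
import numpy as np

# Hypothetical values of two constituent products over the same five grid cells
product_a = np.array([1.0, 2.0, 10.0, 3.0, 4.0])
product_b = np.array([5.0, 1.0, 0.0, 8.0, 2.0])

# What users can compute today: median of the summed product
median_of_sum = np.median(product_a + product_b)

# What the official methodology calls for: sum of the per-product medians
sum_of_medians = np.median(product_a) + np.median(product_b)

print(median_of_sum, sum_of_medians)  # 6.0 5.0
```

Even in this tiny example the two results disagree, which is why the totals cannot be recovered from the summed product alone.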
One option would be to host all the constituent datasets and implement a custom formula. Definitely possible, but a lot of work.
A quick experiment could be to pre-compute the estimates and display them in the GHG Center and gather responses from users who want to compute the same metric for a custom area.
Experiment
Brendan Byrne has a CSV of accurate country totals already available in his Jupyter Notebook. We could provide access to this data in the GHGC Portal and determine how many people want something more than what is available in this CSV file.
Potential gaps users might see with this csv file:
Acceptance Criteria