Closed dariak-bsc closed 5 months ago
Hi @dariak-bsc, hi @etiennesky,
thanks for submitting these results of your checker. Looks like we fixed most of the problems of round 1 then 🥳 🥳 (that is good news).
The remaining ones should be straightforward to address as well. There is a bit that needs discussion.
According to the CF Standard, missing_value was already deprecated before the first version of the CF Conventions (1.0), i.e. it was deprecated before 2003. I think more than 20 years after its deprecation, it is fine not to have that attribute.
Oh, sure, sector:id or ids is a mistake that i will fix for the next round. ~[...] id and ids and whether i can find something in the standard about those attributes and report back.~ They were consistently named ids in the input4mips files. The standard does not know any such case, except section 6.1 Labels, where they suggest using a string-valued coordinate (see comment below).

But thanks again for these submissions and the wonderful news!
I am also now watching the full repository, so you can expect that i'll respond quicker to your follow-up comments! (except that i am on vacation the coming two weeks 🤦 )
Some more feedback for the sector coordinate of the CO2_em_anthro variable:
- Energy sector: that is clearly a mistake that i'll fix
- Agriculture CO2 emissions were 0 in input4mips as well, i'll add them as such
- The Negative emissions sector for CO2 is superseded by the new CO2_em_removal variable, where those are split into the sectors CDR DACCS, CDR OAE and CDR Industry

This means that all *_em_anthro files will consistently have the length 8 and the sectors: Agriculture, Energy, Industrial, Transportation, "Residential, Commercial, Other", Solvents production and application, Waste, International Shipping.
sector:ids turns out to be consistently named in the input4mips files, but the CF conventions standard does not contain any such coordinates; instead, section 6.1 Labels proposes the use of string-valued coordinates (which Matt and I would also strongly prefer).
This would mean that the sector coordinate, which now has the integer values 0 to 7, would be replaced everywhere by a coordinate with the labels
sector = "Agriculture", "Energy", "Industrial", "Transportation", "Residential, Commercial, Other", "Solvents production and application", "Waste", "International Shipping".
The effect from within xarray, for example, would be that you can plot transportation emissions in "July 2050" by: ds.sel(sector="Transportation", time="2050-07").plot()
@etiennesky @dariak-bsc What do you think?
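A minimal sketch, in plain Python without xarray, of what label-based selection amounts to: if the string labels are stored in the old integer order, a label resolves to the same position that a hard-coded id would. The names SECTOR_LABELS, sector_position and the toy values are hypothetical and only serve the illustration.

```python
# Sector labels in the old integer order (0 to 7), as listed in the comment above.
SECTOR_LABELS = [
    "Agriculture",
    "Energy",
    "Industrial",
    "Transportation",
    "Residential, Commercial, Other",
    "Solvents production and application",
    "Waste",
    "International Shipping",
]


def sector_position(label):
    """Return the integer position of a sector label (hypothetical helper)."""
    try:
        return SECTOR_LABELS.index(label)
    except ValueError:
        raise KeyError(f"unknown sector label: {label!r}") from None


# Toy data: one made-up value per sector, only to show the lookup.
toy_emissions = [float(i) for i in range(len(SECTOR_LABELS))]

position = sector_position("Transportation")
print(position, toy_emissions[position])
```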
Re 1.
According to the CF Standard, missing_value was already deprecated before the first version of the CF Conventions (1.0), i.e. it was deprecated before 2003. I think more than 20 years after its deprecation, it is fine not to have that attribute.
Yes, you are right: _FillValue is the only required metadata and missing_value is deprecated. The CEDS data contain both, but it's fine to only have _FillValue.
IMHO, since the ESMs are written in Fortran and have been programmed to use the hard-coded ids as integers, it would be best to keep things compatible.
Some more feedback for the sector coordinate of the CO2_em_anthro variable:
- The Negative emissions sector for CO2 is superseded by the new CO2_em_removal variable, where those are split into the sectors CDR DACCS, CDR OAE and CDR Industry
I think it was our intention since the beginning to provide both the total negative emissions as a single sector in the main CO2-em-anthro files. And then this total value would be split up among several new sectors in the new CO2_em_removal.
In summary, I would like to have both so users can choose.
IMHO the ESMs are written in Fortran and have been programmed to use the hard-coded ids as integers, it would be best to keep things compatible.
For some context, while it is easy in Python to access a string-indexed dictionary, it is harder to do in Fortran.
Ok, i reviewed the netcdf C library, which i hope is very close in use to the fortran libs. Basically, coordinates and variables have an integer number that one inquires with nc_inq_varid, and then you read into them with nc_get_vara_double/float by specifying a start and count integer array where you want to start and how much you want to read along any dimension.
If that is all they use, this would mean:
- [...] sector, time, lat, lon for CO2 and time, sector, lat, lon everywhere else. I did go for the latter everywhere (consistency).
- [...] floats for the main variable (we currently have double)
- [...] sector variable associated with the sector dimension and assume a hard-coded order instead, we can actually use string coordinates but have to make sure that they match the old order exactly). This scenario is quite likely since the datatype of the sector variable for CO2 was double and for the other gases int. BUT, note that i am also fine with sticking to the old solution.

Sure, i understand that. So as a summary, we need the anthro files to always have the full "sector" dimension with:
0: "Agriculture"
1: "Energy"
2: "Industrial"
3: "Transportation"
4: "Residential, Commercial, Other"
5: "Solvents production and application"
6: "Waste"
7: "International Shipping"
8: "Negative emissions (for CO2 only)"
We need to change the datatype to float.
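To illustrate why only the position along the sector dimension matters for a reader that works with start and count arrays, here is a rough Python sketch of the index arithmetic for a row-major (time, sector, lat, lon) array. It mimics the idea behind the nc_get_vara_* calls mentioned above, not the library itself; flat_offsets and the toy shape are hypothetical.

```python
from itertools import product


def flat_offsets(shape, start, count):
    """Flat row-major offsets of the hyperslab given by start and count."""
    if not (len(shape) == len(start) == len(count)):
        raise ValueError("shape, start and count must have the same rank")
    for size, s, c in zip(shape, start, count):
        if s < 0 or c < 0 or s + c > size:
            raise IndexError("hyperslab exceeds the array bounds")

    # Row-major strides: the last dimension varies fastest.
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]

    ranges = [range(s, s + c) for s, c in zip(start, count)]
    return [
        sum(i * stride for i, stride in zip(index, strides))
        for index in product(*ranges)
    ]


# Toy shape (time, sector, lat, lon) = (2, 9, 2, 2); the sizes are made up.
shape = (2, 9, 2, 2)
# Read the whole grid of sector position 3 at the first time step.
offsets = flat_offsets(shape, start=(0, 3, 0, 0), count=(1, 1, 2, 2))
print(offsets)
```

The values of the sector coordinate variable never enter this calculation, so a reader of this kind depends only on the order of the sectors and on the dimension order.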
Ok, i was not aware that we wanted to have the negative emissions two times, but i'll make sure that this works.
Thanks, this way our scenarios are "compatible" with the CMIP6 ones.
Hi @etiennesky, here I think I disagree and it would be useful to discuss more.
I had understood that we wanted to treat negative emissions explicitly differently this time. I would rather provide a single file (anthro incl CDR) than two files that both include negative emissions.
My primary concern is the risk that data users could accidentally double count the negative emissions. One of the nice safeguards of the CMIP6 data is that any given emission flux was only provided once. You can stack them together, sum over different dimensions, and all data is consistent. I would strongly advise against breaking that pattern.
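A small sketch of the double-counting risk, using made-up numbers in arbitrary units: if the anthro file carries the total removals as a single sector and the removal file carries the same flux split by CDR sector, stacking both and summing counts the removals twice. The dictionaries below are hypothetical and are not taken from any file.

```python
# Made-up numbers (arbitrary units), only to illustrate the bookkeeping.
anthro_with_total = {
    "Energy": 10.0,
    "Industrial": 5.0,
    "Negative emissions": -3.0,  # total removals as a single sector
}
removal_by_sector = {
    "CDR DACCS": -1.0,
    "CDR OAE": -0.5,
    "CDR Industry": -1.5,
}

# Stacking both files and summing counts the removals twice.
naive_total = sum(anthro_with_total.values()) + sum(removal_by_sector.values())

# Providing each flux only once: drop the aggregate sector before summing.
anthro_without_total = {
    name: value
    for name, value in anthro_with_total.items()
    if name != "Negative emissions"
}
consistent_total = sum(anthro_without_total.values()) + sum(removal_by_sector.values())

print(naive_total, consistent_total)
```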
Hi @gidden I think your reasoning is fine, it will be a slight burden for any modelling group to implement this, but it should be ok as long as it adheres to the same structure as the existing CO2-em-anthro files (with a sector dimension properly documented).
Hi, some feedback on the CO2-em-removal files:
In the CO2-em-removal files, the sector ids are NaN for the OAE sector:

    double sector(sector) ;
        sector:_FillValue = NaN ;
        sector:long_name = "sector" ;
        sector:id = "1.0: CDR Industry; nan: CDR OAE; 0.0: CDR DACCS" ;

Also, the sector data type is double in these files, compared to int64 in the corresponding CO2-em-anthro files.
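A check along these lines could flag the NaN id automatically. This is only a sketch that parses the attribute string shown above; parse_sector_ids and sectors_with_nan_id are hypothetical helpers, not part of the file checker.

```python
import math


def parse_sector_ids(attr):
    """Parse an 'id: name; id: name' attribute string into (id, name) pairs."""
    pairs = []
    for entry in attr.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        raw_id, name = entry.split(":", 1)
        pairs.append((float(raw_id), name.strip()))
    return pairs


def sectors_with_nan_id(attr):
    """Return the names of the sectors whose id is NaN."""
    return [
        name
        for sector_id, name in parse_sector_ids(attr)
        if math.isnan(sector_id)
    ]


attr = "1.0: CDR Industry; nan: CDR OAE; 0.0: CDR DACCS"
print(sectors_with_nan_id(attr))
```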
EDIT: Moved into new issue #34 by @coroa .
Hi all,
Errors & warnings found with the file checker - preliminary results:
1. missing_value is missing: in the reference files we have both _FillValue and missing_value, but only _FillValue is present in the checked files.
2. sector: in the reference files the length of this dimension is either 9 (for CO2) or 8 (for other gas species), but in the checked files we have a different length of the dimension sector: 6 for CO2 and 5 for other species, and the order of ids is not consistent:

BC-em-anthro_input4MIPs_emissions_RESCUE_IIASA-PIK-REMIND-MAgPIE-3.2.0-4.7.0-RESCUE-Tier1-Direct-2023-12-13-EocBudg1150-OAE-off-2023-12-08_gn_202001-210012.nc:

    dimensions:
        sector = 5 ;
    sector:id = "2: Industrial Sector; 4: Residential Commercial Other; 3: Transportation Sector; 6: Waste; 7: International Shipping" ;
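As a sketch of the kind of check described here (not the actual file checker), the sector:id string from the BC file can be compared against the expected length of 8 to list the missing ids and to test whether the ids appear in ascending order. check_sector_ids is a hypothetical helper.

```python
def check_sector_ids(attr, expected_length):
    """Report missing ids and whether the ids are listed in ascending order."""
    ids = [
        int(entry.split(":", 1)[0])
        for entry in attr.split(";")
        if entry.strip()
    ]
    missing = sorted(set(range(expected_length)) - set(ids))
    in_order = ids == sorted(ids)
    return missing, in_order


attr = (
    "2: Industrial Sector; 4: Residential Commercial Other; "
    "3: Transportation Sector; 6: Waste; 7: International Shipping"
)
missing, in_order = check_sector_ids(attr, expected_length=8)
print(missing, in_order)
```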