IAMconsortium / concordia

I/O issues processing latest CO2 emissions #60

Closed (etiennesky closed this 2 months ago)

etiennesky commented 5 months ago

When processing the latest CO2 emissions I ran into a few issues:

  1. The CO2_em_AIR_anthro variable is chunked in a way that does not allow the data to be read efficiently, so reading takes much longer:
cdo -v sinfon CO2-em-AIR-anthro_input4MIPs_emissions_RESCUE_gn_201501-210012.nc
  OpenMP:  num_procs=256  max_threads=1
   File format : NetCDF4 zip
    -1 : Institut Source   T Steptype Levels Num    Points Num Dtype : Parameter name : Extra
     1 : unknown  Scenarios v instant      25   1    259200   1  F32z : CO2_em_AIR_anthro : chunks=180x90x5x30 
     2 : unknown  Scenarios c instant      25   1         2   2  F64  : level_bnds     : 

Compare this to one of the input4MIPs files:

c3et@ac6-200: /ec/res4/hpcperm/c3et/data-coupler/data/co2/CEDS > cdo -v  sinfon CO2-em-AIR-anthro_input4MIPs_emissions_ScenarioMIP_IAMC-IMAGE-ssp119-1-1_gn_201501-210012.nc 
 OpenMP:  num_procs=256  max_threads=1
   File format : NetCDF4 zip
    -1 : Institut Source   T Steptype Levels Num    Points Num Dtype : Parameter name : Extra
     1 : unknown  IAMC     v instant      25   1    259200   1  F32z : CO2_em_AIR_anthro : chunks=360x180x13x1 

After rechunking the file with the following command, I was able to compute a global integral and vertical sum in much less time:

ncks -4 -h --cnk_dmn lat,180 --cnk_dmn lon,360 --cnk_dmn level,13 --cnk_dmn time,1 CO2-em-AIR-anthro_input4MIPs_emissions_RESCUE_gn_201501-210012.nc.bak CO2-em-AIR-anthro_input4MIPs_emissions_RESCUE_gn_201501-210012.nc
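
To illustrate why the chunk layout matters, here is a rough back-of-the-envelope sketch in Python. It assumes the 259200 grid points are a regular 720 x 360 grid (an assumption, not stated in the output), that cdo prints chunk sizes in lon x lat x level x time order, and that each chunk overlapping a timestep is decompressed in full with no reuse from a chunk cache. The function name `read_amplification` is made up for this example.

```python
from math import ceil, prod

# 25 levels and 120 timesteps are taken from the cdo logs in this thread.
# 720 x 360 is an ASSUMED split of the 259200 grid points (regular 0.5 degree grid).
SHAPE = (720, 360, 25, 120)  # lon, lat, level, time (cdo chunk order, assumed)


def read_amplification(chunks, shape=SHAPE):
    """Return (chunks touched, values decompressed per useful value)
    when reading one full timestep of the variable.

    Assumes every overlapping chunk is decompressed in full and nothing
    is reused from a chunk cache between timesteps.
    """
    lon, lat, lev, _ = shape
    c_lon, c_lat, c_lev, _ = chunks
    n_chunks = ceil(lon / c_lon) * ceil(lat / c_lat) * ceil(lev / c_lev)
    decompressed = n_chunks * prod(chunks)
    useful = lon * lat * lev
    return n_chunks, decompressed / useful


for label, chunks in [
    ("RESCUE file", (180, 90, 5, 30)),
    ("input4MIPs file", (360, 180, 13, 1)),
]:
    n, amp = read_amplification(chunks)
    print(f"{label}: {n} chunks per timestep, amplification {amp:.2f}x")
```

Under these assumptions the time chunk size of 30 dominates: each timestep read pulls in about 30 times more data than it needs, against roughly 1.04 times for the input4MIPs layout.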

  2. When using cdo to compute global integrals, cdo gets confused because the file contains 2 grids; I think the extra one is the new "":
cdo -f nc4c -z zip_2 vertsum /hpcperm/c3et/data-coupler/data/co2/CEDS/CO2-em-AIR-anthro_input4MIPs_emissions_RESCUE_gn_201501-210012.nc tmp_em_AIR_1.nc
cdo    vertsum: Processed 777600050 values from 2 variables over 120 timesteps [220.00s 276MB]
+ cdo fldsum -mul tmp_em_1.nc -gridarea tmp_em_1.nc tmp_em_2.nc
cdo(1) mul: Process started
cdo(2) gridarea: Process started
cdi  warning (cdf_set_dimtype): Could not assign all character coordinates to data variable!
cdi  warning (cdf_set_dimtype): Could not assign all character coordinates to data variable!
cdo(2) gridarea: Using default planet radius: 6371000m
cdo(1) mul: Filling up stream2 >(pipe2.4)< by copying the first variable.
cdo    fldsum:  46%
cdo    fldsum: Processed 435456000 values from 1 variable over 120 timesteps [122.33s 184MB]
+ cdo fldsum -mul tmp_em_AIR_1.nc -gridarea tmp_em_AIR_1.nc tmp_em_AIR_2.nc
cdo(1) mul: Process started
cdo(2) gridarea: Process started
cdo(2) gridarea (Warning): Found more than 1 grid, using the first one!
cdo(2) gridarea: Using default planet radius: 6371000m

cdo(1) mul (Abort): Grid size of the input fields do not match!

If I add a cdo -delvar,level_bnds step before the processing, it works without problems.
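
For reference, the `fldsum -mul X -gridarea X` pipeline above amounts to an area-weighted sum over the grid. A minimal pure-Python sketch of that calculation follows, assuming a regular lat-lon grid of 720 x 360 cells (an assumed split of the 259200 points) and the default planet radius from the log. The helper names `cell_areas` and `global_integral` are hypothetical, and this is not how cdo is implemented internally.

```python
from math import pi, radians, sin

R = 6371000.0  # planet radius in m, the cdo default reported in the log


def cell_areas(nlat=360, nlon=720, radius=R):
    """Area (m^2) of a single cell in each latitude band of a regular grid.

    The 720 x 360 default is an ASSUMPTION about the grid layout.
    """
    dlon = 2 * pi / nlon
    areas = []
    for j in range(nlat):
        south = radians(-90 + j * 180 / nlat)
        north = radians(-90 + (j + 1) * 180 / nlat)
        areas.append(radius**2 * dlon * (sin(north) - sin(south)))
    return areas


def global_integral(field, radius=R):
    """Area-weighted sum of a per-area flux given as field[lat][lon]."""
    areas = cell_areas(len(field), len(field[0]), radius)
    return sum(a * sum(row) for a, row in zip(areas, field))


# Sanity check: a uniform flux of 1 per m^2 integrates to the sphere's surface area.
uniform = [[1.0] * 720 for _ in range(360)]
total = global_integral(uniform)
print(total, 4 * pi * R**2)
```

The multiplication only makes sense when the field and the area weights live on the same grid, which is why the stray second grid makes `mul` abort.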

  3. The time datatype is int64, whereas in the input4MIPs files it is double, which is also the common way of representing time, e.g. in the CF standards:
double time(time) ;
  time:long_name = "time" ;
  time:units = "days since 1990-1-1 0:0:0" ;

Also, time units are missing the 0:0:0 to be fully standard.

coroa commented 5 months ago

Thanks for the reports, Etienne.

Almost everything should be addressed in #61.

I am uploading a CO2_em_anthro and a CO2_em_AIR_anthro test file into /forcings/emissions/2024-07-01 to test whether that works for you (internet is not the best, could take a while).

1. Chunksizes

Does not matter to us. I have now set them consistently to yearly full chunks, i.e. 360x180x25x1 in cdo-speak, since that was a simple rule. Please test whether that is good enough or whether we need to set additional level and sector chunk sizes manually.

2. Two grids

The level coordinate variable did not have the bounds: "level_bnds" attribute, which confused cdo. Fixed.
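
A minimal sketch of that kind of fix, not the actual change made in #61: the coordinate variable gets a CF `bounds` attribute naming its bounds variable, so that readers treat `level_bnds` as cell bounds rather than as a data variable on its own grid. The helper `link_bounds` is hypothetical, and the dataset is a plain-dict stand-in so the example runs without any NetCDF library; an xarray Dataset exposes the same `ds[name].attrs` interface and should accept the same call.

```python
from types import SimpleNamespace


def link_bounds(variables, coord="level", bounds="level_bnds"):
    """Point the coordinate's CF `bounds` attribute at its bounds variable."""
    if bounds not in variables:
        raise KeyError(f"{bounds!r} is not in the dataset")
    variables[coord].attrs["bounds"] = bounds
    return variables


# Stand-in for a dataset: a mapping from variable names to objects with `.attrs`.
dataset = {
    "level": SimpleNamespace(attrs={}),
    "level_bnds": SimpleNamespace(attrs={}),
}
link_bounds(dataset)
print(dataset["level"].attrs)
```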

3. Time coordinate variable

xarray chose int for us automatically, since I guess int was good enough for the times we need to encode, but I forced it to double (even though I did not find anything in CF saying it should be double).

I don't understand what you mean by:

Also, time units are missing the 0:0:0 to be fully standard.

Currently our time:units attribute is set to days since 2015-1-1 0:0:0. Anything there to dislike?
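
As an illustration of forcing the time dtype, here is a sketch of an encoding dict in the shape that xarray's `Dataset.to_netcdf(encoding=...)` accepts, together with a plain-Python version of the "days since" calculation. The helper names `encode_days` and `time_encoding` are made up for this example, the reference date follows the units string quoted above (written with zero padding), and the actual code used for the forcing files may look different.

```python
from datetime import datetime

REFERENCE = datetime(2015, 1, 1)  # matches "days since 2015-1-1 0:0:0"


def encode_days(when, reference=REFERENCE):
    """Encode a timestamp as fractional days since the reference (a float)."""
    return (when - reference).total_seconds() / 86400.0


def time_encoding(reference=REFERENCE):
    """Per-variable encoding that requests a double time axis with full units."""
    return {
        "time": {
            "dtype": "float64",
            "units": f"days since {reference:%Y-%m-%d %H:%M:%S}",
        }
    }


print(time_encoding()["time"]["units"])         # days since 2015-01-01 00:00:00
print(encode_days(datetime(2015, 1, 16, 12)))   # 15.5
```

With xarray this would be passed as `ds.to_netcdf(path, encoding=time_encoding())`. A double time axis can also represent mid-month timestamps such as 15.5, which an integer "days since" axis cannot.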

coroa commented 5 months ago

Etienne, please test whether the new files work. Then we'll close the ticket. They finished uploading.

etiennesky commented 5 months ago

Thanks @coroa, I will test as soon as I get better (running in survival mode right now).

coroa commented 4 months ago

Sorry to hear. Thanks

coroa commented 2 months ago

Fixed by newest version of emissions forcing dataset