E3SM-Project / zppy

E3SM post-processing toolchain
BSD 3-Clause "New" or "Revised" License

[Bug]: `ValueError: x and y must have same first dimension` edge case in `global_time_series` #571

Open forsyth2 opened 7 months ago

forsyth2 commented 7 months ago

What happened?

See https://github.com/E3SM-Project/zppy/pull/400#issuecomment-2048589364 -- there appears to be an edge case where dimensions differ, causing global_time_series plots to not display.

What machine were you running on?

Chrysalis

Environment

zppy dev environment, while working on #400

What command did you run?

zppy -c tests/integration/post.v3.LR.piControl.cfg

Copy your cfg file

[default]
input = /lcrc/group/e3sm2/ac.golaz/E3SMv3/v3.LR.piControl
output = /lcrc/group/e3sm/ac.forsyth2/zppy_test_v3_output/v3/v3.LR.piControl
case = v3.LR.piControl
www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_test_v3_www/v3
partition = compute
environment_commands = "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh"
campaign = "water_cycle"

[ts]
active = True
environment_commands = "source /home/ac.forsyth2/miniconda3/etc/profile.d/conda.sh; conda activate zppy_dev_with_nco_20240405"
years = "0001:0050:10",
walltime = "00:50:00"

  [[ atm_monthly_glb ]]
  input_subdir = "archive/atm/hist"
  input_files = "eam.h0"
  frequency = "monthly"
  mapping_file = "glb"

  [[ lnd_monthly_glb ]]
  input_subdir = "archive/lnd/hist"
  input_files = "elm.h0"
  frequency = "monthly"
  mapping_file = "glb"
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"

[global_time_series]
active = True
experiment_name = "v3.LR.piControl"
figstr = "v3.LR.piControl"
atmosphere_only = True
#plots_original = "net_toa_flux_restom,global_surface_air_temperature,toa_radiation,net_atm_energy_imbalance,change_ohc,max_moc,change_sea_level,net_atm_water_imbalance"
plots_lnd = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
ts_num_years = 10
walltime = "00:30:00"
years = "1-50",
climo_years ="1-50",
ts_years ="1-50",

What jobs are failing?

No response

What stack trace are you encountering?

ValueError: x and y must have same first dimension, but have shapes (50,) and (51,)

czender commented 7 months ago

Hi @chengzhuzhang, I'm on CEST, so I only just read that there's a time axis issue. It would be helpful if you pointed me to the location of the ELM raw input file(s) that have time_bounds = [-0.0208333333333333, 31] and of the regional average timeseries files that NCO generates from those raw input files. I'll verify whether NCO just copies the time bounds and passes them through, or whether NCO creates the problem. Are you also saying that NCO creates a bad global average from that file? Or is the problem that something downstream does not know how to interpret the negative time bounds? Or...?

czender commented 7 months ago

OK, I found the first ELM file in the PI control simulation. Its first time_bounds value is negative. That is legal. Below I show the bounds and the dates to which they correspond:

(e3sm_unified_1.9.3_login) ac.zender@chrlogin1:/lcrc/group/e3sm2/ac.golaz/E3SMv3/v3.LR.piControl/archive/lnd/hist$ ncks -v time v3.LR.piControl.elm.h0.0001-01.nc | m
netcdf  v3.LR.piControl.elm.h0.0001-01 {
 dimensions:
  hist_interval = 2 ;
  time = UNLIMITED ; // (1 currently)

 variables:
  float time(time) ;
   time:long_name = "time" ;
   time:units = "days since 0001-01-01 00:00:00" ;
   time:calendar = "noleap" ;
   time:bounds = "time_bounds" ;

  double time_bounds(time,hist_interval) ;
   time_bounds:long_name = "history time interval endpoints" ;

 data:
  time = 31 ;

  time_bounds = 
  -0.0208333333333333, 31 ;

} // group /
(e3sm_unified_1.9.3_login) ac.zender@chrlogin1:/lcrc/group/e3sm2/ac.golaz/E3SMv3/v3.LR.piControl/archive/lnd/hist$ ncks -v time --cal v3.LR.piControl.elm.h0.0001-01.nc 
netcdf v3.LR.piControl.elm.h0.0001-01 {
  dimensions:
    hist_interval = 2 ;
    time = UNLIMITED ; // (1 currently)

  variables:
    float time(time) ;
      time:long_name = "time" ;
      time:units = "days since 0001-01-01 00:00:00" ;
      time:calendar = "noleap" ;
      time:bounds = "time_bounds" ;

    double time_bounds(time,hist_interval) ;
      time_bounds:long_name = "history time interval endpoints" ;

  data:
    time = "0001-01-31 23:32:16" ;

    time_bounds = 
    "0000-12-31 23:30:00", "0001-02-01" ;

} // group /

So this appears to be a corner case that is due to the model. There is no reason I can think of why a monthly mean file should be written with an initial time corresponding to 11:30 PM of the last day of the previous month. Is there something that you want ncclimo to do when it creates timeseries of files that include such behavior? LMK. One option is to continue to treat this file as a valid January monthly mean and ignore the 30 minutes that apparently belong in the previous month. Before treating it some other way, land model people should first confirm that the 30 minutes are "real", and not some typo/bug in the model that wrote the file. Pinging @thorntonpe for comment.
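As a cross-check of the two ncks dumps above, here is a minimal sketch that converts "days since 0001-01-01 00:00:00" offsets in the noleap calendar into dates using plain arithmetic. The helper name `noleap_date` is hypothetical and is not part of NCO or zppy; it only illustrates why a lower bound of -0.0208333333333333 days lands 30 minutes before the start of year 0001.

```python
# Days per month in the CF "noleap" calendar (every year has 365 days).
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def noleap_date(days, base_year=1):
    """Convert a day offset from <base_year>-01-01 00:00 to a noleap date.

    Hypothetical helper for illustration only.
    Returns (year, month, day, hour, minute), rounded to the nearest minute.
    """
    total_minutes = round(days * 24 * 60)
    # divmod floors toward negative infinity, so a negative offset rolls
    # back into the previous day and, if needed, the previous year.
    day_index, minute_of_day = divmod(total_minutes, 24 * 60)
    year_offset, day_of_year = divmod(day_index, 365)
    month = 1
    for n_days in DAYS_PER_MONTH:
        if day_of_year < n_days:
            break
        day_of_year -= n_days
        month += 1
    hour, minute = divmod(minute_of_day, 60)
    return (base_year + year_offset, month, day_of_year + 1, hour, minute)


lower = noleap_date(-0.0208333333333333)  # first time_bounds value
upper = noleap_date(31)                   # second time_bounds value
print(lower, upper)
```

The lower bound resolves to 23:30 on December 31 of year 0000 and the upper bound to 00:00 on February 1 of year 0001, which agrees with the `--cal` output.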

BunnyVon commented 6 months ago

I ran into the same issue when testing zppy on a BGCv2 coupled simulation. @thorntonpe, could you comment on this issue? The script I'm testing will be passed along to Xiaoying for V3 evaluations.

BunnyVon commented 6 months ago

@czender @forsyth2 @chengzhuzhang, I'm a little bit puzzled by this error. In my case, global_time_series was able to PLOT atm variables but NOT land variables. From what I can tell, the atm and lnd ts files have the same length along the time dimension and differ only in minor ways, such as the attribute names and the name of the time bounds variable (time_bounds vs. time_bnds). For example,

netcdf TSA_185001_201412 {
dimensions:
        rgn = 3 ;
        time = UNLIMITED ; // (1980 currently)
        lat = 360 ;
        lon = 720 ;
        rgn_len = 19 ;
        hist_interval = 2 ;
variables:
        char region_name(rgn, rgn_len) ;
                region_name:long_name = "TSA timeseries array contains area-weighted averages over these regions" ;
        float TSA(time, rgn) ;
                TSA:cell_methods = "time: mean area: mean" ;
                TSA:coordinates = "region_name" ;
                TSA:long_name = "2m air temperature" ;
                TSA:missing_value = 1.e+36f ;
                TSA:units = "K" ;

netcdf FLNS_185001_201412 {
dimensions:
        rgn = 3 ;
        time = UNLIMITED ; // (1980 currently)
        ncol = 21600 ;
        rgn_len = 19 ;
        nbnd = 2 ;
variables:
        char region_name(rgn, rgn_len) ;
                region_name:long_name = "FLNS timeseries array contains area-weighted averages over these regions" ;
        float FLNS(time, rgn) ;
                FLNS:Sampling_Sequence = "rad_lwsw" ;
                FLNS:cell_methods = "time: mean area: mean" ;
                FLNS:coordinates = "region_name" ;
                FLNS:long_name = "Net longwave flux at surface" ;
                FLNS:missing_value = 1.e+20f ;
                FLNS:units = "W/m2" ;

Since global_time_series produced atm plots, my initial thought was to change those minor differences in the land variables to match the atm variables. However, the same x and y dimension error still appears.

The lnd ts files were generated and placed in /lcrc/group/e3sm2/ac.sfeng1/E3SM_Simulations/20240315.v2.LR.BGC-LNDATM.CONTRL.ne30pg2_r05_EC30to60E2r2.chrysalis/zppy/post/lnd/glb/ts/monthly/165yr

The atm ts files are in /lcrc/group/e3sm2/ac.sfeng1/E3SM_Simulations/20240315.v2.LR.BGC-LNDATM.CONTRL.ne30pg2_r05_EC30to60E2r2.chrysalis/zppy/post/atm/glb/ts/monthly/165yr

The test script is /lcrc/group/e3sm2/ac.sfeng1/E3SM_Simulations/20240315.v2.LR.BGC-LNDATM.CONTRL.ne30pg2_r05_EC30to60E2r2.chrysalis/zppy/post/scripts/global_time_series_1850-2014.bash

czender commented 6 months ago

The attribute names do not make a difference. My impression is that the plotting routine either does not like the negative value of the first timestep in the ELM time_bounds variable, or it does not like the fact that it translates to 11:30 PM the evening before the first day of the month. Not sure which.
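To make the second possibility concrete, here is a hedged sketch of one way a stray 30 minutes could add an extra year to a series. It assumes, purely for illustration, that annual bins are labeled by the calendar year of each month's lower time bound. That assumption has not been checked against the zppy plotting code, and the helper names `month_start_bounds` and `year_labels` are hypothetical.

```python
import math

# Days per month in the "noleap" calendar.
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def month_start_bounds(n_years, first_start=0.0):
    """Lower time bound (days since 0001-01-01) of each monthly mean.

    first_start overrides the very first bound, mimicking the ELM file.
    """
    starts, day = [], 0
    for _ in range(n_years):
        for n_days in DAYS_PER_MONTH:
            starts.append(float(day))
            day += n_days
    starts[0] = first_start
    return starts


def year_labels(starts, base_year=1):
    """Distinct calendar years touched by the given lower bounds."""
    return sorted({base_year + math.floor(d / 365) for d in starts})


expected_years = list(range(1, 51))  # x axis: the 50 years requested
clean = year_labels(month_start_bounds(50))
shifted = year_labels(month_start_bounds(50, first_start=-0.0208333333333333))

print(len(expected_years), len(clean), len(shifted))
```

Under this assumption the clean bounds give 50 yearly bins, while the shifted first bound creates a bin for year 0 and gives 51, the same pair of lengths as in the reported shapes (50,) and (51,). This only demonstrates that such a mechanism is possible; it does not identify the actual cause.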