NorESMhub / CAM

Community Atmosphere Model including CAM6-Nor branches

CAM-diagnostics interprets regridded SE output files as original SE grid. #133

Open oyvindseland opened 4 months ago

oyvindseland commented 4 months ago

Issue Type

Other (please describe below)

Issue Description

I tried to run cam-diagnostics on the simulation on Betzy at /cluster/work/users/mvertens/archive/NB1850proto01. The simulation uses the SE dycore, with output regridded to the FV 0.9x1.25 degree grid.

The diagnostics run fails with the error message:

(0) unstructured_to_ESMF: latitude and longitude must have the same number of elements

Log: /cluster/work/users/oyvinds/diagnostics/out/CAM_DIAG/config/NB1850proto01/logs/out_240208_153621.log

Prior to the failure, the averaged files are given an SE name, e.g. /cluster/work/users/oyvinds/diagnostics/out/CAM_DIAG/climo/NB1850proto01/sav_se/NB1850proto01_01_000201_001101_climo_SE.nc

Another possible point of failure is that the variable name used for the latitude weights is w, not gw, which used to be the standard in FV.
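One quick way to tell which of the two cases a history file falls into is to look at its dimension and variable names. The sketch below is not part of the diagnostics package. It assumes that native SE output carries an `ncol` dimension while regridded output carries `lat` and `lon`, and that the latitude weights are stored as either `gw` or `w` as described above. The function names `classify_grid` and `find_weight_name` are hypothetical.

```python
def classify_grid(dim_names):
    """Guess the horizontal grid type from a file's dimension names."""
    dims = set(dim_names)
    if "ncol" in dims:
        # Assumption: native SE output uses a single column dimension.
        return "unstructured"
    if {"lat", "lon"} <= dims:
        return "latlon"
    return "unknown"


def find_weight_name(var_names, candidates=("gw", "w")):
    """Return the first candidate weight variable that is present, or None."""
    present = set(var_names)
    for name in candidates:
        if name in present:
            return name
    return None


if __name__ == "__main__":
    # Illustrative names only; they are not read from an actual file.
    print(classify_grid(["time", "lev", "lat", "lon"]))
    print(find_weight_name(["T", "PS", "w"]))
```

With the netCDF4 Python package, the keys of `Dataset(path).dimensions` and `Dataset(path).variables` could be passed to these two functions.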

Possible test: An SE simulation of 14 months or more, to see whether the diagnostics tool can handle SE grid output.

Will this change answers?

No

Will you be implementing this yourself?

No

mvertens commented 4 months ago

@oyvindseland - can you tell me the exact version of the diagnostic package you were using? I plan to contact the NCAR folks to see how they handle this.

oyvindseland commented 4 months ago

I did not create the set-up, but as far as I can see it is Script Version: 140804. I checked the svn site and it looks like the most recent svn release: https://svn-ccsm-release.cgd.ucar.edu/model_diagnostics/atm/cam/ revision 231

oyvindseland commented 4 months ago

Did you run a simulation with original SE output as well? @mvertens

oyvindseland commented 4 months ago

Checked the version on Nird and it is the same as on Betzy.

mvertens commented 4 months ago

@oyvindseland - I have not run a simulation with just SE output yet. We are still moving and everything is totally chaotic today. I'll start one tomorrow.

oyvindseland commented 4 months ago

No worries, I am not sitting around waiting for it.

mvertens commented 4 months ago

@gold2718 - could you please help with this as well?

oyvindseland commented 4 months ago

Information about diagnostics can be found at https://noresm-docs.readthedocs.io/en/noresm2/diagnostics/diagnostics.html

On betzy the command is /cluster/shared/noresm/diagnostics/noresm/bin/diag_srun

oyvindseland commented 4 months ago

The default amwg script is at /cluster/shared/noresm/diagnostics/noresm/packages/CAM_DIAG. Before the scripts are actually run, they are by default copied to /cluster/work/users/$user/diagnostics/out/CAM_DIAG/config/$CASENAME/run_scripts. The script can change this path, and it can also create the scripts without running them.

mvertens commented 4 months ago

@oyvindseland - @gold2718 has forked the repository and I have downloaded it to /cluster/shared/noresm/diagnostics/noresm_dev on betzy. I would like first to reproduce your error. What was your command to diag_srun that resulted in this failure?

oyvindseland commented 4 months ago

Command that failed

/cluster/shared/noresm/diagnostics/noresm/bin/diag_srun -m cam -i /cluster/work/users/mvertens/archive -c NB1850proto01 -s 2 -e 11

mvertens commented 4 months ago

So I changed the variable name from w to gw in all the cam history files. Now it is dying with the following error:

nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco_get_var1()
nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco_get_var1()
nco_err_exit(): ERROR Error code is 12.
nco_err_exit(): ERROR Error code is 12.
Translation into English with nc_strerror(12) is "Cannot allocate memory"
Translation into English with nc_strerror(12) is "Cannot allocate memory"
ERROR: nco_get_var1() failed to nc_get_var1() variable "time_bnds"
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)
ERROR: nco_get_var1() failed to nc_get_var1() variable "time_bnds"
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)

I believe that the version of the CAM diagnostic package we are using is no longer compatible with the CAM history output from the development code.
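The comment above does not say which tool was used for the rename. One possible way to do it is NCO's `ncrename`, sketched here as a small Python helper that only builds the command line. The helper name `ncrename_command` and the file names are hypothetical. The leading dot in `.w,gw` is NCO's syntax for an optional rename, so files without `w` are left alone.

```python
import shlex


def ncrename_command(path, old="w", new="gw"):
    """Build an ncrename argument list that renames variable `old` to `new`.

    The leading dot marks the rename as optional, so a file that does not
    contain `old` does not trigger an error.
    """
    return ["ncrename", "-v", f".{old},{new}", path]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    for path in ["case.cam.h0.0002-01.nc", "case.cam.h0.0002-02.nc"]:
        print(shlex.join(ncrename_command(path)))
```

The returned list could be passed to `subprocess.run(cmd, check=True)`. Note that `ncrename` without an output file argument modifies the input file in place, so working on copies is the safer choice.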

mvertens commented 4 months ago

So I scrubbed everything and tried again, and got totally different errors. See /cluster/work/users/mvertens/diagnostics/logs/-diagsrun-240213-194000.log. @oyvindseland - can you try running the script again and see if you get anything different?

oyvindseland commented 4 months ago

I reran the script and also got an OOM error. I do not think I have seen an out-of-memory issue in the diagnostics before, so I do not understand why this happens. Do we just need to ask for more memory in the script? I should add, though, that I rarely run the script on Betzy; I mostly use Nird.

I copied years 2 and 3 of your output files, renamed w to gw, and ran the amwg script without the wrapper.

In this case the script runs but produces only relatively limited output. The output claims that the variable hyam is missing:

fatal:["Execute.c":6394]:variable (hyam) is not in file (inptr)

I also did the same for your set-up and got the same result: some plots and the hyam error.

Plots: https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/NB1850proto01/
For comparison, 20 years of CMIP6 piControl: https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/N1850frc2_f09_tn14_20191001/CAM_DIAG/

mvertens commented 4 months ago

@oyvindseland - I think the problem is that on Betzy the wrapper is submitted to the preproc queue, which is a shared-memory batch node, so the memory available depends on who else is using it. I think this explains why the OOM appeared in different places each time the wrapper was submitted on Betzy. When you run the script itself interactively, you are using the shared memory of the login node. I think running on Nird is probably better.

BTW - I changed the variable from w to gw in all of the files. The fact that a variable which is not in the input file is reported as missing is problematic.

@gold2718 - where are the latest version(s) of the CAM diagnostic packages? Is anything available on GitHub at this point?

oyvindseland commented 4 months ago

On Nird the script runs without OOM, but the hyam problem is still the same.

oyvindseland commented 4 months ago

A test with native grid output created the same plots as the coupled simulation. The definitions of the vertical levels, hyam and hybm, are still missing from the averaged files. https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/NF2000proto01/
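A check like the following could flag such gaps in an averaged file before the plotting stage is reached. It is only a sketch: the list of required names is an assumption based on the variables mentioned in this thread (`hyam`, `hybm`, `gw`), and the helper name `missing_variables` is hypothetical.

```python
def missing_variables(var_names, required=("hyam", "hybm", "gw")):
    """Return the required variables absent from `var_names`, in order."""
    present = set(var_names)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Illustrative variable list; it is not read from an actual climo file.
    climo_vars = ["T", "PS", "lat", "lon", "gw"]
    gaps = missing_variables(climo_vars)
    if gaps:
        print("missing from averaged file:", ", ".join(gaps))
```

Running the same check on an input history file and on the corresponding averaged file would show whether the variables are lost during averaging or were never written.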

The interpolation of SE onto a lat-lon grid in the diagnostics fails, see e.g. https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/NB1850proto01/yrs2to3-obs/set5_6/set5_ANN_LWCF_ERBE_obsc.png vs https://ns2345k.web.sigma2.no/diagnostics/noresm/oyvinds/NF2000proto01/yrs1to1-obs/set5_6/set5_ANN_LWCF_ERBE_obsc.png

oyvindseland commented 4 months ago

I looked around the AMWG website and found some diagnostics plots with SE and 48 levels, so it should be possible if we need to use the NCL diagnostics. The simulations are relatively old (2021): https://webext.cgd.ucar.edu/FWscHIST/f.e21.FWscHIST_BGC.ne30_ne30_mg17_L48_revert-J.001/atm/

The table that linked to the simulations did not say who created the plots or ran the simulations: https://docs.google.com/spreadsheets/d/1nSTQ9tscsqeLhy3fhytW_ko1wLjydYqa5ZRGThLP2K8/edit#gid=1338712341