olyson opened 7 months ago
Hi Keith -- thanks for adding this, I still owe you an email about these, too.
Two quick thoughts. First, for the crash, can you try setting PIO_NETCDF_FORMAT=64bit_data? That's needed for some of these large atmosphere input files, since it allows any single variable to be larger than 2 GB.
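For a sense of scale, here is a back-of-the-envelope size estimate. It assumes each field is stored as 8-byte doubles with dimensions (time, gridcell), 258 time slices and 41943042 cells, which are the sizes reported elsewhere in this thread; the exact limits depend on which netCDF format variant is used, so treat this only as a rough comparison against the 2 GB figure above.

```python
# Back-of-the-envelope size of one urbantv field on the mpasa3p75 mesh.
# Assumptions: 8-byte doubles, dimensions (time, gridcell) as reported in this thread.
N_GRIDCELL = 41_943_042
N_TIME = 258
BYTES_PER_DOUBLE = 8
TWO_GIB = 2**31  # the ~2 GB per-variable figure quoted above

bytes_per_slice = N_GRIDCELL * BYTES_PER_DOUBLE  # one time slice of one field
bytes_total = bytes_per_slice * N_TIME           # the whole variable

print(f"one time slice: {bytes_per_slice / 2**20:,.1f} MiB")
print(f"all {N_TIME} slices: {bytes_total / 2**30:,.1f} GiB")
print(f"whole variable larger than 2 GiB: {bytes_total > TWO_GIB}")
```

Each tbuildmax field on this mesh is therefore on the order of tens of GiB in total, far beyond the 2 GB figure.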
Second, if we really want to find out whether it requires a lat/lon grid, another option is to generate an mpasa120 file for the 1-degree case, which shouldn't need any changes to the PIO options.
FYI, I just gave this a shot myself and hit problems too. I'll take a deeper look over the next day or two. We've had a few small issues with PIO on ultra-high-resolution files, so that could play a part, and since I'm now getting a crash there too, I want to experiment with it a bit. Unfortunately, Jim is the expert and is out for a bit!
I'll keep you updated if I learn anything though. I'll also check if we have other streams on the MPAS grid.
@olyson @briandobbins - looking at the data file /glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_simyr1849-2106_c240430.nc
I see that lat and lon have the same size, which doesn't make sense. If you have a non-lat/lon grid, you would want nlon = 41943042 and nlat = 1.
dimensions:
elementCount = 41943042 ;
lat = 41943042 ;
lon = 41943042 ;
nv = 2 ;
time = UNLIMITED ; // (258 currently)
variables:
double lat(elementCount) ;
lat:units = "degrees" ;
double lon(elementCount) ;
lon:units = "degrees" ;
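To make the mismatch concrete, a minimal sketch of plain arithmetic (not CDEPS code): if a reader treated the lat and lon dimensions in this header as a regular lat/lon grid, it would infer nlat * nlon points, which is wildly different from the real cell count, whereas the nlat = 1, nlon = 41943042 layout suggested above recovers exactly one point per cell.

```python
# Points a reader would infer if it treats two dimensions as a regular lat/lon grid.
# Illustrative arithmetic only; it does not reproduce any CDEPS logic.
N_CELLS = 41_943_042  # mpasa3p75 grid cells (elementCount in the header)


def implied_points(nlat: int, nlon: int) -> int:
    """A structured lat/lon grid has nlat * nlon points."""
    return nlat * nlon


as_written = implied_points(nlat=N_CELLS, nlon=N_CELLS)  # lat = lon = elementCount
suggested = implied_points(nlat=1, nlon=N_CELLS)         # degenerate 2-D layout

print(f"lat = lon = {N_CELLS:,}: {as_written:,} points")
print(f"nlat = 1, nlon = {N_CELLS:,}: {suggested:,} points")
```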
However, a better way to do this is to have only one gridcell dimension in the file, as in the surface datasets on unstructured meshes:
$ ncdump -h surfdata_ne30np4.pg3_hist_2000_78pfts_c240216.nc
netcdf surfdata_ne30np4.pg3_hist_2000_78pfts_c240216 {
dimensions:
gridcell = 48600 ;
nlevsoi = 10 ;
nlevurb = 10 ;
numurbl = 3 ;
numrad = 2 ;
nglcec = 10 ;
nglcecp1 = 11 ;
time = UNLIMITED ; // (12 currently)
nchar = 256 ;
lsmpft = 79 ;
natpft = 15 ;
cft = 64 ;
variables:
double LONGXY(gridcell) ;
LONGXY:long_name = "longitude" ;
LONGXY:units = "degrees east" ;
double LATIXY(gridcell) ;
LATIXY:long_name = "latitude" ;
LATIXY:units = "degrees north" ;
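One way to express that convention as a check, sketched in Python over a hand-written description of the header. The helper name and the set of non-spatial dimensions are assumptions for illustration, not part of CTSM or CDEPS: every variable should have at most one spatial dimension, and that dimension should be gridcell.

```python
# Hypothetical check that a file follows the "single gridcell dimension" layout
# used by the unstructured surface datasets. The non-spatial dimension names are
# an assumption taken from the surfdata header; adjust them for the file at hand.
NON_SPATIAL = {"time", "nv", "nchar", "nlevsoi", "nlevurb", "numurbl",
               "numrad", "nglcec", "nglcecp1", "lsmpft", "natpft", "cft"}


def layout_problems(var_dims: dict[str, tuple[str, ...]]) -> list[str]:
    """Describe every variable that breaks the one-gridcell-dimension rule."""
    problems = []
    for name, dims in var_dims.items():
        spatial = [d for d in dims if d not in NON_SPATIAL]
        if len(spatial) > 1:
            problems.append(f"{name}: more than one spatial dimension {spatial}")
        elif spatial and spatial[0] != "gridcell":
            problems.append(f"{name}: spatial dimension is {spatial[0]!r}, expected 'gridcell'")
    return problems


surfdata = {"LONGXY": ("gridcell",), "LATIXY": ("gridcell",)}
original_tbuild = {"lat": ("elementCount",), "lon": ("elementCount",)}

print(layout_problems(surfdata))         # no problems expected
print(layout_problems(original_tbuild))  # both variables flagged
```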
@olyson - I'm happy to chat about this if it would help.
@mvertens Great catch, thanks for that -- that makes a ton of sense.
I'm going to work on this shortly with the ESMF_Regrid tool, partly to give Bob an example of how it fails at larger core counts. I suspect that will also let me make a regridded file (from the 1-degree source file) on a small core count, which may enable testing on the mpasa3p75 grid too.
@olyson I'll keep you in the loop on my approach; I imagine yours may be more accurate if you have better source data. Is the dimensions issue due to the script being designed for lat/lon coordinates?
Thanks @mvertens and @briandobbins, I think that makes sense. I'll try to generate a file with the following structure, if you think it is correct:
dimensions:
gridcell = 41943042 ;
nv = 2 ;
time = UNLIMITED ; // (258 currently)
variables:
double lat(gridcell) ;
lat:long_name = "latitude" ;
lat:units = "degrees north" ;
double lon(gridcell) ;
lon:long_name = "longitude" ;
lon:units = "degrees east" ;
int LANDMASK(gridcell) ;
double time(time) ;
time:long_name = "time" ;
time:calendar = "noleap" ;
time:units = "days since 0000-01-01 00:00" ;
double area(gridcell) ;
area:units = "radians^2" ;
area:long_name = "area weights" ;
int year(time) ;
year:long_name = "year" ;
year:units = "Year AD" ;
double time_bnds(time, nv) ;
time_bnds:units = "days since 1849-01-01" ;
time_bnds:long_name = "Time" ;
time_bnds:calendar = "noleap" ;
time_bnds:_FillValue = 9.96920996838687e+36 ;
double tbuildmax_TBD(time, gridcell) ;
tbuildmax_TBD:units = "K" ;
tbuildmax_TBD:long_name = "maximum interior building temperature for TBD class" ;
double tbuildmax_HD(time, gridcell) ;
tbuildmax_HD:units = "K" ;
tbuildmax_HD:long_name = "maximum interior building temperature for HD class" ;
double tbuildmax_MD(time, gridcell) ;
tbuildmax_MD:units = "K" ;
tbuildmax_MD:long_name = "maximum interior building temperature for MD class" ;
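For exercising a stream reader against this layout without the full 41943042-cell file, one option is to write the same header with a tiny gridcell count as CDL and build a file from it with ncgen (for example `ncgen -o mock.nc mock.cdl`). This is a sketch under assumptions: the mock dataset name and the small size are made up, only the header is generated, and the resulting file holds no data.

```python
# Emit a header-only CDL description of the proposed urbantv layout so that a
# small mock file can be built with ncgen. Names and sizes are illustrative.
CLASSES = ("TBD", "HD", "MD")


def urbantv_cdl(n_gridcell: int, name: str = "urbantv_mock") -> str:
    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"\tgridcell = {n_gridcell} ;",
        "\tnv = 2 ;",
        "\ttime = UNLIMITED ;",
        "variables:",
        "\tdouble lat(gridcell) ;",
        '\t\tlat:long_name = "latitude" ;',
        '\t\tlat:units = "degrees north" ;',
        "\tdouble lon(gridcell) ;",
        '\t\tlon:long_name = "longitude" ;',
        '\t\tlon:units = "degrees east" ;',
        "\tint LANDMASK(gridcell) ;",
        "\tdouble area(gridcell) ;",
        '\t\tarea:units = "radians^2" ;',
        '\t\tarea:long_name = "area weights" ;',
        "\tdouble time(time) ;",
        '\t\ttime:long_name = "time" ;',
        '\t\ttime:calendar = "noleap" ;',
        '\t\ttime:units = "days since 0000-01-01 00:00" ;',
        "\tint year(time) ;",
        '\t\tyear:long_name = "year" ;',
        '\t\tyear:units = "Year AD" ;',
        "\tdouble time_bnds(time, nv) ;",
        '\t\ttime_bnds:units = "days since 1849-01-01" ;',
        '\t\ttime_bnds:long_name = "Time" ;',
        '\t\ttime_bnds:calendar = "noleap" ;',
    ]
    # One (time, gridcell) field per urban density class, as in the proposed header.
    for cls in CLASSES:
        var = f"tbuildmax_{cls}"
        lines += [
            f"\tdouble {var}(time, gridcell) ;",
            f'\t\t{var}:units = "K" ;',
            f'\t\t{var}:long_name = "maximum interior building temperature for {cls} class" ;',
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


print(urbantv_cdl(n_gridcell=4))
```

Keeping the dimension and variable names identical to the proposed header means a mock file built this way has the same (time, gridcell) shape as the real one.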
@olyson - that looks right.
New files:
stream_fldfilename_urbantv = '/glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_simyr1849-2106_c240503.nc'
stream_meshfile_urbantv = '/glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_ESMFmesh_c240503.nc'
I tested these in /glade/work/oleson/ctsm5.2.0/cime/scripts/clm60_ctsm520_1deg_GSWP3V1_mpasa3p75tbuildfile_2000, but I get the same error and traceback as before, as shown in the cesm log file:
/glade/derecho/scratch/oleson/clm60_ctsm520_1deg_GSWP3V1_mpasa3p75tbuildfile_2000/run/cesm.log.4353308.desched1.240503-143314
dec0500.hsn.de.hpc.ucar.edu 129: PIO2 pio_file.c retry NETCDF
dec0500.hsn.de.hpc.ucar.edu 129: PIO2 pio_file.c retry NETCDF
dec0500.hsn.de.hpc.ucar.edu 129: PIO2 pio_file.c retry NETCDF
dec0711.hsn.de.hpc.ucar.edu 513: forrtl: severe (174): SIGSEGV, segmentation fault occurred
dec0711.hsn.de.hpc.ucar.edu 513: Image PC Routine Line Source
dec0711.hsn.de.hpc.ucar.edu 513: libpthread-2.31.s 00001539226EA8C0 Unknown Unknown Unknown
dec0711.hsn.de.hpc.ucar.edu 513: libpioc.so 000015392C82C1E4 subset_rearrange_ Unknown Unknown
dec0711.hsn.de.hpc.ucar.edu 513: libpioc.so 000015392C820F69 PIOc_InitDecomp Unknown Unknown
dec0711.hsn.de.hpc.ucar.edu 513: libpiof.so 000015392CA67E9E piolib_mod_mp_pio Unknown Unknown
dec0711.hsn.de.hpc.ucar.edu 513: libpiof.so 000015392CA68B07 piolib_mod_mp_pio Unknown Unknown
dec0711.hsn.de.hpc.ucar.edu 513: libpiof.so 000015392CA6791A piolib_mod_mp_pio Unknown Unknown
dec0711.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000FCAD9E dshr_strdata_mod_ 1990 dshr_strdata_mod.F90
dec0711.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000FC5BC1 dshr_strdata_mod_ 1523 dshr_strdata_mod.F90
dec0711.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000FC2193 dshr_strdata_mod_ 1382 dshr_strdata_mod.F90
dec0711.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000FBC529 dshr_strdata_mod_ 945 dshr_strdata_mod.F90
dec0711.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000D471B1 urbantimevartype_ 231 UrbanTimeVarType.F90
dec0711.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000D48A99 urbantimevartype_ 68 UrbanTimeVarType.F90
dec0711.hsn.de.hpc.ucar.edu 513: cesm.exe 00000000005F6E70 clm_instmod_mp_cl 265 clm_instMod.F90
dec0711.hsn.de.hpc.ucar.edu 513: cesm.exe 00000000005F0796 clm_initializemod 408 clm_initializeMod.F90
dec0711.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000594F63 lnd_comp_nuopc_mp 659 lnd_comp_nuopc.F90
PIO_NETCDF_FORMAT is set to 64bit_data.
@olyson - thanks for trying this. So what is odd is that if you look at the output in the lnd.log file:
(shr_strdata_readstrm) opening : /glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_simyr1849-2106_c240503.nc
(shr_strdata_readstrm) setting pio descriptor : /glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_simyr1849-2106_c240503.nc
(shr_strdata_set_stream_iodesc) setting iodesc for : tbuildmax_TBD with dimlens(1), dimlens2 = 41943042 258 variable has no time dimension
And then look at the file itself using ncdump -h:
/glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_simyr1849-2106_c240503.nc
You will see that tbuildmax_TBD does indeed have a time dimension:
double tbuildmax_TBD(time, gridcell) ;
tbuildmax_TBD:long_name = "maximum interior building temperature for TBD class" ;
tbuildmax_TBD:units = "K" ;
So it looks like the iodesc is trying to decompose the time dimension as if it were a spatial dimension, which is just wrong. This is a problem with CDEPS itself. I'll try to reproduce this over the next few days, if I get a chance, and see if there is an easy fix in CDEPS.
Right, I agree; it's being triggered by an ndims=2 check in ./components/cdeps/streams/dshr_strdata_mod.F90. Thanks for looking into this.
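The failure mode described above can be illustrated with a small Python sketch. This is not the CDEPS Fortran, and both function names are hypothetical: a rule that only counts dimensions treats any 2-D variable as two spatial dimensions, while a rule that removes the time dimension by name first finds a single spatial dimension for tbuildmax_TBD(time, gridcell). Dimensions are listed in Fortran order, which is why the lnd.log line reports gridcell (41943042) first and time (258) second.

```python
# Illustrative only: contrasts a dimension-count rule with a name-aware rule for
# deciding which dimensions of a stream variable are spatial. Dimensions are in
# Fortran order (fastest-varying first), matching the lnd.log line
# "dimlens(1), dimlens2 = 41943042 258".


def spatial_dims_by_count(dimnames, dimlens):
    """Naive rule: with exactly two dimensions, assume both are spatial (lon, lat)."""
    n_spatial = 2 if len(dimlens) == 2 else 1
    return list(zip(dimnames[:n_spatial], dimlens[:n_spatial]))


def spatial_dims_by_name(dimnames, dimlens, time_dim="time"):
    """Name-aware rule: drop the time dimension before deciding what is spatial."""
    return [(name, size) for name, size in zip(dimnames, dimlens) if name != time_dim]


names = ("gridcell", "time")  # tbuildmax_TBD(time, gridcell) as seen from Fortran
lens = (41_943_042, 258)

print(spatial_dims_by_count(names, lens))  # time is wrongly treated as spatial
print(spatial_dims_by_name(names, lens))   # only gridcell remains
```

The name-aware rule also still handles a conventional lat/lon stream with a time dimension, since only the dimension literally named time is dropped.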
@olyson @briandobbins - I have a fix for this. For CDEPS in your Externals.cfg - please point to
repo: https://github.com/mvertens/CDEPS.git
branch: feature/fix_mpas_input
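For reference, a sketch of what that change might look like, done with Python's configparser on a hypothetical [cdeps] section. The key names (tag or branch, protocol, repo_url, local_path, required) are assumed to follow the manage_externals format, and the starting tag is a placeholder, so compare against the Externals.cfg in your own checkout and re-run the externals checkout afterwards.

```python
import configparser
import io

# Hypothetical starting point; copy the real [cdeps] section from your Externals.cfg.
EXTERNALS_CFG = """\
[cdeps]
tag = PLACEHOLDER_TAG
protocol = git
repo_url = https://github.com/ESCOMP/CDEPS.git
local_path = components/cdeps
required = True
"""


def point_cdeps_at_branch(cfg_text: str, repo_url: str, branch: str) -> str:
    """Switch the cdeps external from a tag to a branch on a different fork."""
    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)
    section = cfg["cdeps"]
    section.pop("tag", None)  # assumed: an external names only one of tag/branch/hash
    section["branch"] = branch
    section["repo_url"] = repo_url
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


print(point_cdeps_at_branch(EXTERNALS_CFG,
                            "https://github.com/mvertens/CDEPS.git",
                            "feature/fix_mpas_input"))
```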
I've shown that I can successfully run the experiment that was crashing for @olyson. On Derecho, see: /glade/u/home/mvertens/run/clm60_ctsm520_1deg_GSWP3V1_mpasa3p75tbuildfile_2000/run
If you can reproduce this without problems, I'll open a PR to CDEPS. @briandobbins - maybe you or @billsacks can then approve it.
Yes, that works for me as well, great! I ran 1 month.
That's amazing, thank you, both of you, for this! I'll try it in the high-res land and land+atm cases as soon as I can get some nodes.
Thanks @mvertens!
I'd like to move the 0.25x0.25 and mpasa3p75 streams and mesh files I created out of my work directory and onto campaign store, but I don't want to interrupt any ongoing testing. @briandobbins, can you let me know when that would be OK? Thanks.
@olyson Feel free to do it whenever you like - I made copies of them that I'll work with for now, and with Derecho busy, and down for maintenance tomorrow and Wednesday, I don't want to hold you up. Thanks so much, both of you, for your help here!
A higher-resolution urban building temperature streams file is needed for testing high-resolution applications of CESM (the current file is at 1-degree resolution). The application uses an mpasa3p75 grid, so it was requested that a file be created at that native resolution.
I made mpasa3p75 urbantv stream and mesh files:
/glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_simyr1849-2106_c240430.nc
/glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_ESMFmesh_c240430.nc
Implemented by setting the following in user_nl_clm:
stream_fldfilename_urbantv = '/glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_simyr1849-2106_c240430.nc'
stream_meshfile_urbantv = '/glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_ESMFmesh_c240430.nc'
I tried this in a 1deg I2000 case (/glade/work/oleson/ctsm5.2.0/cime/scripts/clm60_ctsm520_1deg_GSWP3V1_mpasa3p75tbuildfile_2000) and got this information in the land log:
(shr_strdata_readstrm) opening : /glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_simyr1849-2106_c240430.nc
(shr_strdata_readstrm) setting pio descriptor : /glade/work/oleson/THESISUrbanPropertiesTool_master/urban_properties_180622_release/src/CTSM52_tbuildmax_OlesonFeddema_2020_mpasa3p75_simyr1849-2106_c240430.nc
(shr_strdata_set_stream_iodesc) setting iodesc for : tbuildmax_TBD with dimlens(1), dimlens2 = 41943042 258 variable has no time dimension
and this traceback:
dec1915.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000FCAD9E dshr_strdata_mod_ 1990 dshr_strdata_mod.F90
dec1915.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000FC5BC1 dshr_strdata_mod_ 1523 dshr_strdata_mod.F90
dec1915.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000FC2193 dshr_strdata_mod_ 1382 dshr_strdata_mod.F90
dec1915.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000FBC529 dshr_strdata_mod_ 945 dshr_strdata_mod.F90
dec1915.hsn.de.hpc.ucar.edu 513: cesm.exe 0000000000D471B1 urbantimevartype_ 231 UrbanTimeVarType.F90
The section of code in the traceback is:
which makes me think that it expects a variable that is dimensioned by lat/lon, not a single dimension, as is the case with this grid. For example, the surface dataset fields at mpasa3p75 are dimensioned by gridcell, not lat/lon. So, I'm not sure if the streams code will handle this 1d grid. Or maybe I need to find a way to make the variables 2d even though the grid is unstructured.