You might get more information from the crash by setting the PIO debug level higher. I think it goes as high as 6; it defaults to 0:
./xmlchange PIO_DEBUG_LEVEL=6
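For anyone following along, the related commands from the case directory would be something like:
./xmlquery PIO_DEBUG_LEVEL    # check the current value before changing it
./case.submit                 # resubmit so the extra PIO messages appear in the cesm log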
Are you generating the same error using datasets that have worked in the past? For instance, the BCI run you mention, did that have any modified surface/domain/parameter/met-driver files?
Have you tried turning off FATES and had any luck with the big-leaf model?
It's my first time trying to run with BCI data, so I don't have a reference for a successful run. The only change I made to the data was to update the paths in bci_inv_file_list.txt; otherwise, I'm using the same files I got from you (bci_0.1x0.1_v4.0).
I ran the model again with PIO_DEBUG_LEVEL=6. It produced a lot more error messages, but it seems to be the same problem as in the previous attempt.
0 PIOc_inq_varid ncid = 128 name = ZBOT
0 PIOc_inq_var ncid = 128 varid = 1
0 pio_get_file ncid = 128
0 Calling the netCDF layer
0 nc_inq_varndims called ndims = 2
0 my_name = ZBOT my_xtype = 5 my_ndims = 2 my_natts = 3
0 PIOc_setframe ncid = 128 varid = 1 frame = 743
0 pio_get_file ncid = 128
0 PIOc_read_darray ncid 128 varid 1 ioid 513 arraylen 1
0 pio_get_file ncid = 128
0 pio_read_darray_nc_serial vid = 1
0 fndims 2 ndims 2 vdesc->record 743 vdesc->ndims 2
Abort with message unexpected record in file /Users/marcoslongo/Dropbox/Home/Models/CTSM/cime/src/externals/pio2/src/clib/pio_darray_int.c at line 1446
Obtained 1 stack frames.
0 cesm.exe 0x0000000103ba2a0c print_trace + 34
(The complete log file is here).
I am trying to set up the case with FATES turned off. Sorry for all the basic questions; is this done by setting the following xml change?
./xmlchange CLM_BLDNML_OPTS="-bgc cn -no-megan"
I first tried -bgc none, since this is just a test, but it said this is not a valid option. I submitted the simulation, but it is currently downloading a lot of very large files.
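(For reference, my understanding is that -bgc none is not accepted by CLM's build-namelist, and the closest non-FATES, non-BGC option is satellite phenology; something like the line below should work, but check the build-namelist documentation for your tag.)
./xmlchange CLM_BLDNML_OPTS="-bgc sp -no-megan"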
Hey Marcos,
Have you tried any of the out of the box configurations?
e.g. box 4 in my "help I've forgotten how everything works" cheat sheet ;) https://github.com/rosiealice/fates/wiki/Rosie's-developer-instructions
I suspect that this sort of PIO error would likely show up whatever configuration you use, but it's good to check at least. My hunch is that it's something to do with python versions....
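For anyone reading along, an out-of-the-box single-point FATES case like the ones on the cheat sheet can be created with something like the following (the compset and resolution names are taken from later in this thread, and the case name is arbitrary; exact compset names vary by model tag):
cd ctsm/cime/scripts
./create_newcase --case fates_1x1brazil_test --res 1x1_brazil --compset I2000Clm51Fates --run-unsupported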
Many thanks for the cheat sheet, Rosie, this is gold! Bookmarked here ;)
I was able to run the vanilla CLM5 (and the ELM equivalent), and also the 2019 Workshop walkthrough examples. I also created and successfully ran all these tests using this generic shell script.
The error is specific to the single-site simulations, so maybe python is fine and I'm just messing up some of the configurations when running a single site.
A few updates: one difference I noticed is start_type="startup" in the walkthrough test versus start_type="continue" for the single-site case.
I left all the differences between namelists in this log file. Also for reference, these are the commits I am using:
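In case anyone wants to reproduce that comparison, one way to generate it is from the namelists CIME writes into each case directory (the case paths below are hypothetical):
cd /path/to/walkthrough_case  && ./preview_namelists
cd /path/to/single_site_case  && ./preview_namelists
diff /path/to/walkthrough_case/CaseDocs/lnd_in  /path/to/single_site_case/CaseDocs/lnd_in
diff /path/to/walkthrough_case/CaseDocs/datm_in /path/to/single_site_case/CaseDocs/datm_in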
commit 1e3cc271df6b452d25f9115856cbb6218215c036 (HEAD -> master, tag: ctsm5.1.dev043, origin/master, origin/HEAD)
commit 1723d1443a2bc84f15f9b4e6e637592b49790971 (HEAD, tag: sci.1.46.0_api.16.0.0, master)
commit 88b6134789fd1e21119a7cbf30d3bed4d13f306f (HEAD, tag: cime5.8.47)
I am having the same problem, but was able to run the versions of ctsm/fates mentioned above when I reverted to an earlier version of cime (tag: cime5.8.32). Perhaps something in the most recent cime code is causing the issue?
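In case it helps anyone else try the same workaround, one quick way to revert just the cime external to that tag (bypassing manage_externals; paths assume a standard CTSM checkout) is:
cd ctsm/cime
git fetch --tags
git checkout cime5.8.32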
@adamhb This is interesting. I tried the same cime version, but it did not work for me. I actually tried several versions between 5.8.16 and the current one, and none of them worked, though the error message varied between versions. Would you mind sending me the settings you used for the test that ran successfully? I'd like to compare my settings with yours. Thanks!
@mpaiao No problem. Keep in mind that I'm very new to running FATES myself! I attached my bash script (had to attach as .txt) that I'm using to build the case that worked on Lobata.
Jessie pointed out a couple key things that you might need to change if you're not doing them already:
- parteh_mode = 1 in the namelist options to run the carbon-only model (see the section of the attached script titled "# MODIFY THE CLM NAMELIST (USERS MODIFY AS NEEDED)")
- use_fates_ed_prescribed_phys = .false. in the namelist options
cime/ctsm/fates tags: cime5.8.32; ctsm5.1.dev042; sci.1.46.0_api.16.0.0
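For anyone following along, a sketch of how those settings could be added to user_nl_clm in the case directory (I believe the full name of the first variable is fates_parteh_mode in CLM; double-check the namelist definition for your tag):
cat >> user_nl_clm << EOF
fates_parteh_mode = 1
use_fates_ed_prescribed_phys = .false.
EOF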
I am using Python 3.7. Let me know if you need me to send any of the param or driver files I mentioned above, or if you need to know anything else about this run!
Thanks @adamhb. I tried and it still fails here, with the same error.
For reference, I am running the model on my local computer (Mac Big Sur 11.4). I am using gcc 11.1.0_1 (so gcc-11, gfortran-11, g++-11) and Python 3.9.5, and compiling the code with mpi-serial. GNU compilers and python are just the default ones for homebrew. The XML configuration files I am using are here.
@mpaiao Ok, these details are very new to me, but I've attached the machine and compiler configuration files and a software environment file for the run, which might be helpful to you.
One other thought: for Lobata users, we followed [Greg's new user setup instructions](https://github.com/glemieux/fates-scratch/blob/master/Notes/lobata/NewUserSetup.md), which include making sure we have access to some specific programs (including some that seem to have netCDF functionality). You may want to look at those setup instructions to make sure your local machine has all the programs it needs.
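(For a quick sanity check that the netCDF tools and libraries the build expects are visible on a local machine, something like this should work if they are installed and on the PATH:)
which ncdump nc-config nf-config
nc-config --version    # netCDF C library
nf-config --version    # netCDF Fortran library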
config_compilers.txt ahb_software_environment.txt config_machines.txt
Adam
Thanks for the files, Adam! I will compare the configurations and see if I can spot something promising.
@mpaiao and @adamhb: Is this crash occurring during a non-FATES run? If so, this issue might not be getting visibility with the folks who are best equipped to help solve it. We could open a new issue on the CTSM repository and link this thread.
We discussed this at the CTSM software meeting this morning. One point of conversation was the use of PIO2 or PIO1 (old and new version of parallel I/O software stack). For simulations on my linux workstation, I override the default (PIO2) and set PIO1, via:
./xmlchange PIO_VERSION=1
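(If it helps others: ./xmlquery PIO_VERSION confirms the setting, and I believe the case needs to be rebuilt, e.g. ./case.build --clean-all followed by ./case.build, for a PIO version change to take effect.)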
After reverting to PIO 2, my simulations crash as well:
Abort with message unexpected record in file /raid1/rgknox/SyncLRC/ctsm/cime/src/externals/pio2/src/clib/pio_darray_int.c at line 1446
Obtained 10 stack frames.
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xc75b82) [0x55cf569bcb82]
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xc75c4d) [0x55cf569bcc4d]
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xc75c7f) [0x55cf569bcc7f]
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xc9f1a5) [0x55cf569e61a5]
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xc9d826) [0x55cf569e4826]
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xc692f6) [0x55cf569b02f6]
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xc6e62a) [0x55cf569b562a]
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xb86ba6) [0x55cf568cdba6]
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xb8edcb) [0x55cf568d5dcb]
/raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/cesm.exe(+0xc05f77) [0x55cf5694cf77]
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7f10052462ed in ???
#1 0x7f1005245503 in ???
#2 0x7f10048c303f in ???
#3 0x7f10048c2fb7 in ???
#4 0x7f10048c4920 in ???
#5 0x55cf569bcc51 in piodie
at /raid1/rgknox/SyncLRC/ctsm/cime/src/externals/pio2/src/clib/pioc_support.c:561
#6 0x55cf569bcc7e in pioassert
at /raid1/rgknox/SyncLRC/ctsm/cime/src/externals/pio2/src/clib/pioc_support.c:582
#7 0x55cf569e61a4 in pio_read_darray_nc_serial
at /raid1/rgknox/SyncLRC/ctsm/cime/src/externals/pio2/src/clib/pio_darray_int.c:1444
#8 0x55cf569e4825 in PIOc_read_darray
at /raid1/rgknox/SyncLRC/ctsm/cime/src/externals/pio2/src/clib/pio_darray.c:939
#9 0x55cf569b02f5 in read_darray_internal_real
at /raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/gnu/mpi-serial/debug/nothreads/mct/pio/pio2/src/flib/piodarray.F90:349
#10 0x55cf569b5629 in __piodarray_MOD_read_darray_1d_real
at /raid1/rgknox/Models/land_runs/bci-test-radloop-hifrq-v1.C88b613478-F8c8da995.2021-06-24/bld/gnu/mpi-serial/debug/nothreads/mct/pio/pio2/src/flib/piodarray.F90:331
#11 0x55cf568cdba5 in shr_dmodel_readstrm
at /raid1/rgknox/SyncLRC/ctsm/cime/src/share/streams/shr_dmodel_mod.F90:965
#12 0x55cf568d5dca in __shr_dmodel_mod_MOD_shr_dmodel_readlbub
at /raid1/rgknox/SyncLRC/ctsm/cime/src/share/streams/shr_dmodel_mod.F90:699
#13 0x55cf5694cf76 in __shr_strdata_mod_MOD_shr_strdata_advance
at /raid1/rgknox/SyncLRC/ctsm/cime/src/share/streams/shr_strdata_mod.F90:891
#14 0x55cf55e44c1c in __datm_comp_mod_MOD_datm_comp_run
at /raid1/rgknox/SyncLRC/ctsm/cime/src/components/data_comps_mct/datm/src/datm_comp_mod.F90:664
#15 0x55cf55e50cf5 in __datm_comp_mod_MOD_datm_comp_init
at /raid1/rgknox/SyncLRC/ctsm/cime/src/components/data_comps_mct/datm/src/datm_comp_mod.F90:568
#16 0x55cf55e4402c in __atm_comp_mct_MOD_atm_init_mct
at /raid1/rgknox/SyncLRC/ctsm/cime/src/components/data_comps_mct/datm/src/atm_comp_mct.F90:172
#17 0x55cf55d813ea in __component_mod_MOD_component_init_cc
at /raid1/rgknox/SyncLRC/ctsm/cime/src/drivers/mct/main/component_mod.F90:248
#18 0x55cf55d70164 in __cime_comp_mod_MOD_cime_init
at /raid1/rgknox/SyncLRC/ctsm/cime/src/drivers/mct/main/cime_comp_mod.F90:1415
#19 0x55cf55d7d8bd in cime_driver
at /raid1/rgknox/SyncLRC/ctsm/cime/src/drivers/mct/main/cime_driver.F90:122
#20 0x55cf55d7dad8 in main
at /raid1/rgknox/SyncLRC/ctsm/cime/src/drivers/mct/main/cime_driver.F90:23
Aborted (core dumped)
Will dig into this some more.
Hi @rgknox, I haven't tried a non-FATES run, but it does seem like a CTSM/CIME issue, because the problem went away for me when I switched to an earlier version of cime (tag: cime5.8.32). For now I'm just using this older version of cime for my simulations. When I went back to the most recent version of cime it crashed again, and the PIO option you mentioned didn't seem to make a difference.
@rgknox It depends on the settings. Below is a summary of my successes/failures so far (this was before the PIO_VERSION suggestion). Almost all cases were created successfully; failures only occurred when I submitted the cases. The scripts I used for each case are here.
Script | Description | FATES | RES | COMPSET | Status |
---|---|---|---|---|---|
B0000-CLM | Vanilla CLM | off | f45_f45_mg37 | I2000Clm50BgcCrop | Success |
B0000-ELM | Vanilla ELM | off | f45_f45_mg37 | IELMBGC | Success |
F0001_SimpleCase_CLM | Walkthrough 1 | on | 1x1_brazil | I2000Clm51Fates | Success |
F0001_SimpleCase_ELM | Walkthrough 1 | on | 1x1_brazil | IELMFATES | Success |
B0001_SimpleCase-CLM | Walkthrough 1 | off | 1x1_brazil | I2000Clm51Bgc | Success |
B0001_SimpleCase-ELM | Walkthrough 1 | off | 1x1_brazil | I2000ELMBGC | Success |
F0002_BrazilTest_CLM | Walkthrough 2 | on | 1x1_brazil | I2000Clm51Fates | Success |
F0002_BrazilTest_ELM | Walkthrough 2 | on | 1x1_brazil | IELMFATES | Success |
B0002_BrazilTest-CLM | Walkthrough 2 | off | 1x1_brazil | I2000Clm51Bgc | Success(1) |
B0002_BrazilTest-ELM | Walkthrough 2 | off | 1x1_brazil | I2000ELMBGC | Success(1) |
F0003_BCINoInv-CLM | BCI test (no inventory) | on | CLM_USRDAT | I2000Clm51Fates | Failure(2) |
F0003_BCINoInv-ELM | BCI test (no inventory) | on | ELM_USRDAT | IELMFATES | Failure(2) |
B0003_BCINoInv-CLM | BCI test (no inventory) | off | CLM_USRDAT | I2000Clm51Bgc | Failure(2) |
B0003_BCINoInv-ELM | BCI test (no inventory) | off | ELM_USRDAT | IELMFATES | Failure(3) |
(1) It worked when I excluded the variables by age and size class from hist_fincl1. I guess this makes sense.
(2) Runs failed with or without the ./xmlchange PIO_VERSION=1 setting.
(3) Compilation failed. I am attaching the log here; it looks related to the crashes.
I identified the issue. Based on the single-point driver for Mexico City (one of the standard cases that uses single-point meteorological data), the variable ZBOT must be a time series, not a single, time-invariant value. I re-created the meteorological forcing for my test site and CLM-FATES is now running fine on my local computer. The current drivers for BCI (bci_0.1x0.1_v4.0i) do not seem to be compatible and may need to be updated.
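For anyone checking their own forcing files, a quick way to see whether ZBOT is time-varying is to inspect its dimensions with ncdump (the file name below is hypothetical):
ncdump -h clmforc.BCI.2008-01.nc | grep -i zbot
In a working driver ZBOT should carry the time dimension (e.g., ZBOT(time, lat, lon)); in the problematic files it shows up without one (e.g., ZBOT(lat, lon)).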
For reference, I updated the code I used to generate initial conditions compatible with the current code:
- R (Markdown) script that creates CLM-friendly met drivers: make_fates_met_driver.Rmd (https://github.com/mpaiao/FATES_Utils/blob/master/make_single_site/make_fates_met_driver.Rmd)
- R (Markdown) script that creates CLM-friendly surface data based on gridded surface data files: make_fates_domain+surface.Rmd (https://github.com/mpaiao/FATES_Utils/blob/master/make_single_site/make_fates_domain%2Bsurface.Rmd)
- Shell script to submit single-point cases (it should work for CLM and ELM, but not thoroughly tested): create_case_hlm-fates.sh (https://github.com/mpaiao/FATES_Utils/blob/master/create_case/create_case_hlm-fates.sh)
Note: the current default reference data used for the surface data file is not compatible with ELM (e.g., it doesn't have all the variables needed). I am still looking for a better reference. Update: this has been fixed (see the T_BUILDING_MAX note below).
It seems ELM requires a variable T_BUILDING_MAX in the surface data, but CLM does not; this variable appears to be used by the urban land unit. For now, I updated make_fates_domain+surface.Rmd to generate the missing variable based on a simple relation with T_BUILDING_MIN, but only when T_BUILDING_MAX is missing from the reference surface data. The script now creates a netCDF surface data file compatible with both CLM and ELM.
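For illustration only, adding a missing T_BUILDING_MAX outside the R script could also be done with NCO along these lines (the +10 K offset is just a placeholder, not the relation make_fates_domain+surface.Rmd actually uses; file names are hypothetical):
ncap2 -O -s 'T_BUILDING_MAX = T_BUILDING_MIN + 10.0' surfdata_bci.nc surfdata_bci_elm.nc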
Well done! And thanks for letting me know about the solution and sharing your scripts for making met driver, surface, and domain files. Do you now have updated versions of these files for a BCI single site simulation with CTSM? Adam
I am new to FATES and I am trying to create some test simulations for a single site. I am able to successfully create the case; however, the simulations crash right at the beginning, even before FATES is called.
For reference, I used this shell script to create the case (simple simulation for BCI), and I got the following error reported in the CESM log:
A few additional attempts:
Does anyone have any ideas or suggestions? Thanks!