Open jimmielin opened 7 months ago
What I've found is that HCO_CO and HCO_NH3 match bit-for-bit (files are identical) between singlecore and mpi (2 cores), but HCO_NO does not. This is because HCO_NO has another field, EDGAR_TODNOX, applied to it, and this field is somehow different in the two runs.
In singlecore's cesm.log, for the EDGAR_TODNOX field:
0: hcdebug: (edgar, i= 1 ) -9.9999998E+30 -9.9999998E+30 -9.9999998E+30
0: -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 1.302642
0: 1.306328 1.306328 1.306328 1.306328 1.306328
0: 1.306328 1.306328 1.319497 1.210987 1.210987
0: 1.020795
0: hcdebug: (edgar, i= 2 ) -9.9999998E+30 -9.9999998E+30 -9.9999998E+30
0: -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 1.361859
0: 1.361859 1.361859 1.361859 1.372795 1.372795
0: 1.372795 1.372795 1.391346 1.385491 1.385491
0: 1.020795
In mpi, note how on both CPUs (0: and 1:) the first 5 grid boxes have the fill value:
0: hcdebug: (edgar, i= 1 ) -9.9999998E+30 -9.9999998E+30 -9.9999998E+30
0: -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 1.302642
0: 1.306328 1.306328
1: hcdebug: (edgar, i= 1 ) -9.9999998E+30 -9.9999998E+30 -9.9999998E+30
1: -9.9999998E+30 -9.9999998E+30 1.374258 1.210987 1.210987
1: 1.019067
0: hcdebug: (edgar, i= 2 ) -9.9999998E+30 -9.9999998E+30 -9.9999998E+30
0: -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 1.361859
0: 1.361859 1.361859
1: hcdebug: (edgar, i= 2 ) -9.9999998E+30 1.405800 1.405800
1: 1.405800 1.405800 1.405477 1.385491 1.385491
1: 1.019067
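To compare excerpts like the one above without counting values by hand, the per-CPU rows can be reassembled with a small parser. This is a minimal sketch that assumes the line layout shown in the log excerpts (a rank prefix such as `0:`, an `hcdebug:` header with a label in parentheses, then continuation lines of bare numbers). The names `parse_hcdebug`, `SAMPLE`, and `FILL_THRESHOLD` are hypothetical and not part of HEMCO or CESM.

```python
import re
from collections import defaultdict

# Matches the header part of a row, e.g. "hcdebug: (edgar, i= 1 ) v v v".
HEADER = re.compile(r"hcdebug:\s*\((?P<label>[^)]*)\)\s*(?P<values>.*)")

# Anything below this is treated as the -9.9999998E+30 fill value (assumption).
FILL_THRESHOLD = -1.0e29


def parse_hcdebug(lines):
    """Return {(rank, label): [floats]} for hcdebug rows in a log excerpt."""
    rows = defaultdict(list)
    current = {}  # rank -> label of the row currently being continued
    for line in lines:
        prefix, sep, rest = line.partition(":")
        if not sep or not prefix.strip().isdigit():
            continue  # not a rank-prefixed line
        rank = int(prefix)
        rest = rest.strip()
        match = HEADER.match(rest)
        if match:
            # New row: normalize whitespace in the label, e.g. "edgar, i= 1".
            current[rank] = " ".join(match.group("label").split())
            tokens = match.group("values").split()
        elif rest.startswith("hcdebug:"):
            # Informational hcdebug line without values; stop continuing.
            current.pop(rank, None)
            continue
        elif rank in current:
            tokens = rest.split()  # continuation line of bare numbers
        else:
            continue
        rows[(rank, current[rank])].extend(float(t) for t in tokens)
    return dict(rows)


# The i=1 rows of the 2-core excerpt above.
SAMPLE = """\
0: hcdebug: (edgar, i= 1 ) -9.9999998E+30 -9.9999998E+30 -9.9999998E+30
0: -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 -9.9999998E+30 1.302642
0: 1.306328 1.306328
1: hcdebug: (edgar, i= 1 ) -9.9999998E+30 -9.9999998E+30 -9.9999998E+30
1: -9.9999998E+30 -9.9999998E+30 1.374258 1.210987 1.210987
1: 1.019067
"""

rows = parse_hcdebug(SAMPLE.splitlines())
for (rank, label), values in sorted(rows.items()):
    n_fill = sum(v < FILL_THRESHOLD for v in values)
    print(f"rank {rank} ({label}): {len(values)} values, {n_fill} fill")
```

Concatenating the rank 0 and rank 1 lists for the same label gives a row that can be compared directly against the single-core row.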
For HCO_NO emissions at surface:
0: hcdebug: writing out lvl-sfc at present dt
0: hcdebug: (i= 1 ) 0.000000000000000E+000 0.000000000000000E+000
0: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0: 0.000000000000000E+000 0.000000000000000E+000 2.544706480154566E-014
0: 2.170436195605955E-014 3.528874920258346E-017 0.000000000000000E+000
0: 1.880868963362995E-016 1.772452777726410E-015 0.000000000000000E+000
0: 3.712318333349080E-013 2.388847495132503E-014 3.953244476404797E-014
0: 0.000000000000000E+000 0.000000000000000E+000
In the MPI configuration:
0: hcdebug: (i= 1 ) 0.000000000000000E+000 0.000000000000000E+000
0: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0: 0.000000000000000E+000 0.000000000000000E+000 2.544706480154566E-014
0: 2.170436195605955E-014 3.528874920258346E-017
1: hcdebug: (i= 1 ) 0.000000000000000E+000 1.439813645198005E-016
1: 1.356820567806388E-015 0.000000000000000E+000 2.841796369544945E-013
1: 2.487986842177408E-014 3.953244476404797E-014 0.000000000000000E+000
1: 0.000000000000000E+000
Note how the numbers on the first CPU match bit-for-bit, but those on the second CPU do not. The values for i=1 should be contiguous from CPU 0 to CPU 1 and equal to the values on the only CPU in the single-core result.
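The mismatch can be located by concatenating the two MPI rows and comparing them element by element with the single-core row. The sketch below uses the i=1 HCO_NO values copied from the log excerpts above; the variable names are only illustrative.

```python
# i=1 surface HCO_NO values from the single-core log (19 latitudes).
single = [0.0] * 7 + [
    2.544706480154566e-14, 2.170436195605955e-14, 3.528874920258346e-17,
    0.0, 1.880868963362995e-16, 1.772452777726410e-15, 0.0,
    3.712318333349080e-13, 2.388847495132503e-14, 3.953244476404797e-14,
    0.0, 0.0,
]

# The same row as printed by each CPU in the 2-core run.
rank0 = [0.0] * 7 + [
    2.544706480154566e-14, 2.170436195605955e-14, 3.528874920258346e-17,
]
rank1 = [
    0.0, 1.439813645198005e-16, 1.356820567806388e-15, 0.0,
    2.841796369544945e-13, 2.487986842177408e-14, 3.953244476404797e-14,
    0.0, 0.0,
]

mpi = rank0 + rank1  # contiguous from CPU 0 to CPU 1
assert len(mpi) == len(single) == 19

# Indices (0-based, along latitude) where the two runs disagree.
mismatches = [i for i, (a, b) in enumerate(zip(single, mpi)) if a != b]
print("mismatched indices:", mismatches)
print("all on CPU 1:", all(i >= len(rank0) for i in mismatches))
```

Every differing index falls in the part of the row owned by CPU 1, and only nonzero emissions differ there.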
The domain decomposition for the 10x15 domain is as follows. Global domain size: 24 longitudes, 19 latitudes (24x19).
For 2 CPUs, the domain is split horizontally in the middle, so each CPU covers all longitudes but half of the latitudes.
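As a rough illustration of that split, the sketch below divides 19 latitude rows into contiguous bands, giving the extra row to the lower rank. This is an assumption chosen because it reproduces the 10 and 9 values printed per CPU in the logs above; it is not taken from the CAM or HEMCO_CESM decomposition code, and `split_rows` is a hypothetical helper.

```python
def split_rows(n_rows, n_ranks):
    """Split n_rows into contiguous [start, end) bands, one per rank.

    Assumption: any remainder rows go to the lowest-numbered ranks.
    """
    base, extra = divmod(n_rows, n_ranks)
    bounds = []
    start = 0
    for rank in range(n_ranks):
        count = base + (1 if rank < extra else 0)
        bounds.append((start, start + count))
        start += count
    return bounds


N_LON, N_LAT = 24, 19  # global domain size from the text above

for rank, (lo, hi) in enumerate(split_rows(N_LAT, 2)):
    print(f"CPU {rank}: latitudes [{lo}, {hi}) x {N_LON} longitudes")
```

With one task, the same function returns a single band covering all 19 latitudes, which corresponds to the single-core case.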
The print outputs are directly from HCO_GetPtr or from %Emis%Val, so they're from HEMCO upstream code, before it hits the regridder. So I feel there is a bug somewhere upstream in HEMCO or in Map_A2A, but I have not yet gone down the rabbit hole of hco_readlist_mod, hcoio_read_std_mod, and map_a2a.
I've tried shifting the longitude edges for the HEMCO grid in hco_esmf_grid.F90, as having -187.5 as a starting value (it's equal to -180 minus 15/2) seemed sketchy to me. But using either -180.0 or -187.5 as the leftmost edge only produces numerical differences (expected, since the grid is now different to HEMCO) and does not make the difference between 1 core and 2 cores disappear.
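The two edge choices can be written out explicitly. The sketch below builds the 25 longitude edges of a 24-cell, 15-degree grid from each starting value and derives the cell centers; it only illustrates the arithmetic and is not the code in hco_esmf_grid.F90. The helper names are hypothetical.

```python
def lon_edges(n_lon, west_edge):
    """Return n_lon + 1 evenly spaced longitude edges starting at west_edge."""
    dlon = 360.0 / n_lon
    return [west_edge + i * dlon for i in range(n_lon + 1)]


def cell_centers(edges):
    """Return the midpoints between consecutive edges."""
    return [(a + b) / 2.0 for a, b in zip(edges, edges[1:])]


N_LON = 24  # 15-degree longitude spacing

for west in (-187.5, -180.0):
    edges = lon_edges(N_LON, west)
    centers = cell_centers(edges)
    print(f"west edge {west}: edges {edges[0]} .. {edges[-1]}, "
          f"centers {centers[0]} .. {centers[-1]}")
```

Starting at -187.5 puts the first cell center on -180.0, while starting at -180.0 puts it on -172.5, so the two grids are offset by half a cell.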
This issue thread serves to note the reproducibility issues in HEMCO within CESM2 which should eventually be fixed for: https://github.com/ESCOMP/CAM/issues/856
For the purposes of debugging HEMCO_CESM, it is suggested to use CAM-chem compsets (e.g., FCnudged, FCclimo2010, ...) because CAM-chem is known to be b4b reproducible and GEOS-Chem compsets are likely not. The responsibility of this issue is to ensure that the physics buffer and history fields (e.g., HCO_NO, HCO_NH3, HCO_CO, ...) match bit-for-bit in restart, different MPI decomp, and different OpenMP threading scenarios.
Test/debug workflow
This setup will help debug the issues.
1. Check out CESM (ESCOMP/CESM). cesm2_3_alpha17c was used here, but any release with HEMCO (post-cam6_3_118) should do.
2. ./manage_externals/checkout_externals
3. hplin/debug_parallel from jimmielin/HEMCO_CESM, for components/cam/src/hemco, may be useful, as it has some debug printouts which will appear in cesm.log.
4. ./create_newcase --case ~/2403_dev_hco_2.3/2403_dev_hco_2.3-f10_singlecore --compset FC2010climo_HCO --res f10_f10_mg37 --run-unsupported --mach derecho --project UHAR0022 -- the f10_f10_mg37 resolution is 10x15 degrees and coarse enough to run on 1 core. I suggest using FC2010climo or something that is not FCnudged, so configuring nudging / met fields can be avoided in user_nl_cam.
5. cd to the case directory, then ./xmlchange NTASKS=1 for single core or NTASKS=2 for two cores, etc. In the 10x15 case, NTHRDS=1 (I have not successfully run with more than 1 thread on this grid).
6. ./case.setup --reset, then fill user_nl_cam with:
The /glade/u/home/hplin/2403_dev_hco_2.3/HEMCO_Config.CC.TestOnly.c240331.rc test config file only has CEDS with NO, CO, and NH3, with NO having a 1x1 gridded scale factor. This makes it easier to debug and much quicker to run.
7. ./case.build -v
8. numactl (https://github.com/NCAR/mpibind/issues/5): edit env_batch.xml and change the command in <directive gpu_enabled="false"> to always request 128 cores from the scheduler (it was {{ max_tasks_per_node}} -> to 128):
9. env_run.xml: RUN_STARTDATE=2016-01-01, STOP_OPTION=nhours, STOP_N=3 (shorter may not work due to coupling intervals)
Debugging output is in cesm.log.* and organized per CPU.
The cprnc tool is very useful to compare two netCDF files for bit-for-bit matches: I use this in my .zshrc
Usage: cprnc <file1> <file2>
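For values copied out of cesm.log rather than read from netCDF files, it helps to be explicit about what "bit-for-bit" means. The sketch below compares the raw IEEE 754 bit patterns of two lists of doubles using only the Python standard library. It illustrates the concept only and makes no claim about how cprnc itself compares fields; `bits` and `b4b_equal` are hypothetical helper names.

```python
import struct


def bits(x):
    """Return the 8-byte big-endian IEEE 754 representation of a float."""
    return struct.pack(">d", x)


def b4b_equal(a, b):
    """True only if both sequences have identical length and bit patterns."""
    return len(a) == len(b) and all(bits(x) == bits(y) for x, y in zip(a, b))


# Identical values are bit-for-bit equal.
print(b4b_equal([1.306328, 1.210987], [1.306328, 1.210987]))

# A rounding-level difference is enough to break bit-for-bit equality.
print(b4b_equal([0.1 + 0.2], [0.3]))

# 0.0 == -0.0 numerically, but the sign bit differs.
print(0.0 == -0.0, b4b_equal([0.0], [-0.0]))
```

A comparison based on `==` alone would treat 0.0 and -0.0 as equal, which is why the sketch compares packed bytes instead.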