lizziel opened this issue 1 year ago
To clarify, we would expect numerical noise differences across compilers, e.g., Intel versus GNU. But there should not be a systematic bias, and the diffs should be very small.
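For concreteness, here is a minimal sketch of how one might quantify such diffs between two runs (the file paths and variable name are placeholders, not the actual GCST comparison workflow):

```python
import xarray as xr

# Compare a field between restart files from two compiler builds.
# Paths and the variable name below are hypothetical placeholders.
ref = xr.open_dataset("gchp_restart_intel.nc4")
dev = xr.open_dataset("gchp_restart_gnu.nc4")
var = "SPC_PassiveTracer"  # assumed variable name
diff = dev[var] - ref[var]
# Numerical noise should show up as tiny relative diffs with no systematic sign.
print("max |relative diff|:", float((abs(diff) / abs(ref[var])).max()))
print("mean signed diff:   ", float(diff.mean()))
```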
I should also note that there is a known small memory leak in GCHP that seems to come from MAPL. I created an issue on this a couple of years ago at https://github.com/GEOS-ESM/MAPL/issues/1793. It is small enough that it has not been addressed yet.
@yuanjianz, could you put your raw wind output into subdirectories OutputDir and Restarts? I need the restart files as well as the diagnostics. Thanks!
@lizziel, sure. They are now there. The run has reached November, so I expect the raw-wind run to finish today and the raw mass flux run probably tomorrow or Thursday.
As for the environment, it is a bit hard to list the full environment because we are using Docker + Spack and did not generate module files for all dependencies.
The GNU environment with the memory leak (Ubuntu 20.04):
spack find --loaded
-- linux-ubuntu20.04-skylake_avx512 / gcc@9.4.0 -----------------
gcc@10.2.0
-- linux-ubuntu20.04-skylake_avx512 / gcc@10.2.0 ----------------
cmake@3.26.3 esmf@8.4.2 gettext@0.22.3 hdf5@1.14.3 netcdf-c@4.9.2 netcdf-fortran@4.5.3 openmpi@4.1.1
==> 8 loaded packages
---
spack find
-- linux-ubuntu20.04-skylake_avx512 / gcc@9.4.0 -----------------
autoconf@2.69 bzip2@1.0.8 gdbm@1.23 libiconv@1.17 m4@1.4.18 perl@5.38.0 texinfo@7.0.3
autoconf-archive@2023.02.20 diffutils@3.7 gettext@0.22.3 libsigsegv@2.14 mpc@1.3.1 pkgconf@1.9.5 xz@5.4.1
automake@1.16.5 gawk@5.2.2 gmake@4.2.1 libtool@2.4.7 mpfr@4.2.0 readline@8.2 zlib-ng@2.1.4
berkeley-db@18.1.40 gcc@10.2.0 gmp@6.2.1 libxml2@2.10.3 ncurses@6.4 tar@1.30 zstd@1.5.5
-- linux-ubuntu20.04-skylake_avx512 / gcc@10.2.0 ----------------
autoconf@2.69 flex@2.6.3 libnl@3.3.0 openssh@8.2p1 snappy@1.1.10
automake@1.16.5 gdbm@1.23 libpciaccess@0.17 openssl@3.1.3 sqlite@3.43.2
berkeley-db@18.1.40 gettext@0.22.3 libtool@2.4.7 parallelio@2.6.2 tar@1.30
bison@3.8.2 gmake@4.2.1 libxcrypt@4.4.35 perl@5.38.0 ucx@1.14.1
bzip2@1.0.8 hdf5@1.14.3 libxml2@2.10.3 pkgconf@1.9.5 util-linux-uuid@2.38.1
c-blosc@1.21.5 hwloc@2.9.1 lz4@1.9.4 pmix@4.2.2 util-macros@1.19.3
ca-certificates-mozilla@2023-05-30 libaec@1.0.6 m4@1.4.18 py-docutils@0.20.1 xz@5.4.1
cmake@3.26.3 libbsd@0.11.7 ncurses@6.4 py-pip@23.1.2 zlib-ng@2.1.4
curl@8.4.0 libevent@2.1.12 netcdf-c@4.9.2 py-setuptools@68.0.0 zstd@1.5.5
diffutils@3.7 libfabric@1.14.0 netcdf-fortran@4.5.3 py-wheel@0.41.2
esmf@8.4.2 libffi@3.4.4 nghttp2@1.57.0 python@3.11.6
expat@2.5.0 libiconv@1.17 numactl@2.0.14 rdma-core@41.0
findutils@4.7.0 libmd@1.0.4 openmpi@4.1.1 readline@8.2
==> 89 installed packages
The working Intel environment (CentOS 7):
spack find --loaded
==> 8 loaded packages
-- linux-centos7-skylake_avx512 / intel@2020 --------------------
cmake@3.17.5 hdf5@1.12.2 intel-mpi@2020 m4@1.4.16 netcdf-c@4.8.1 netcdf-fortran@4.5.4 pkgconf@0.27.1 zlib@1.2.12
---
spack find
==> 17 installed packages
-- linux-centos7-skylake_avx512 / intel@2020 --------------------
antlr@2.7.7 expat@2.4.8 hdf5@1.12.2 libmd@1.0.4 netcdf-c@4.8.1 udunits@2.2.28
bison@3.0.4 flex@2.5.37 intel-mpi@2020 m4@1.4.16 netcdf-fortran@4.5.4 zlib@1.2.12
cmake@3.17.5 gsl@2.7.1 libbsd@0.11.5 nco@4.9.3 pkgconf@0.27.1
Note that the GNU environment used `spack external find` to register some dependencies as externals, while the Intel environment did not. Also, the Intel environment is installed with Mellanox OFED for MPI, while GNU uses libfabric. If you are interested in the detailed setup, the GNU environment is the official Docker image maintained by @yidant with slight modifications (+hl and +fortran variants for hdf5 and netcdf-c). @1Dandan's Intel Docker image is built from the Compute1-supported base.
If I have time, I will try installing OFED in the GNU environment to see whether it fixes the problem. GNU.txt
To add: in the Intel environment, the ESMF version is v8.3.1.
Not sure if this is related, but there was a bug report from former GCST member Will Downs about a memory registration bug in GCHP when using libfabric: https://github.com/geoschem/GCHP/issues/47
Hi @lizziel, the raw-wind 1-year GEOS-FP transport tracer run is ready: http://geoschemdata.wustl.edu/ExternalShare/tt-geosfp-c24-raw-wind/
I noticed that after changing to the Intel environment, although the memory leak disappears and the run is fast enough at first, the simulation slows down to half the speed of Dandan's earlier runs. From the timing diagnostics, Bracket in ExtData takes most of the time. I am not sure why this is happening.
Hi @yuanjianz, I am looking at the results, and the diagnostics look off for both of our runs. The passive tracer restarts compare well, with differences of about 1e-6, but I think the diagnostic output is getting corrupted. This may explain the slow-down.
Strangely, I cannot reproduce the issue. I am doing another run with the new diagnostics turned off.
I ran a 1-month simulation using 14.2.2 and 14.3.1 with GEOS-FP processed files and got identical results except for st80_25 (as expected). I do not see a slow-down. I am trying to determine whether a constant value in every grid box of the monthly-mean passive tracer concentration makes sense. We also see this in version 14.2.2. I am skeptical given the values in the internal state.
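As a quick way to test that, one could check whether the monthly-mean diagnostic really is spatially constant; a sketch, with the file and variable names assumed rather than taken from an actual run directory:

```python
import xarray as xr

# Check whether the monthly-mean passive tracer field is constant in space.
ds = xr.open_dataset("GEOSChem.SpeciesConc.20220101_0000z.nc4")  # placeholder name
pt = ds["SpeciesConcVV_PassiveTracer"]  # assumed diagnostic name
print("min:", float(pt.min()), "max:", float(pt.max()))
print("spatially constant?", float(pt.max()) == float(pt.min()))
```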
Separate from this issue of constant values for passive tracer, I do see that the raw versus processed bug is fixed.
Hi @lizziel, thanks for the update. You said you found corrupted diagnostics in your run as well. Do you think the new diagnostics caused the performance degradation on my end? It also seems to happen only with raw files, because my GEOS-IT preprocessed-wind fullchem benchmark using 14.3.1 with the new GCHPctmLevel* diagnostics did not show performance issues.
Hi @yuanjianz, we expect the run with raw files to perform worse than one with preprocessed files because there are so many files to read, and at high frequency. Do you see the same performance issue using 14.3.0 instead of 14.3.1?
Hi @lizziel, thanks for the explanation. I haven't done a run with 14.3.0 yet. I am just curious about the diagnostic corruption you mentioned above. What does it mean? Do you think I should turn off the new diagnostics in 14.3.1 and then rerun a performance test between the two versions?
See https://github.com/geoschem/GCHP/issues/399 for discussion of the suspected passive tracer diagnostic issue. I am not going to worry much about it for now since it does not impact the mass conservation tests (those use restart files) and it was not recently introduced.
Here is the global mass table for passive tracer from @yuanjianz's 2022 GEOS-FP run with raw GMAO fields, using winds in advection:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Global Mass of Passive Tracer in 14.3.1_GEOS-FP_raw_wind
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Date Mass [Tg]
---------- ----------------
2022-01-01 17.6562799006358
2022-02-01 17.6527063054427
2022-03-01 17.6527047860219
2022-04-01 17.6527058698120
2022-05-01 17.6527059098902
2022-06-01 17.6527058070735
2022-07-01 17.6527057721245
2022-08-01 17.6527057131759
2022-09-01 17.6527056042564
2022-10-01 17.6527059860906
2022-11-01 17.6527059325285
2022-12-01 17.6527059325285
Summary
------------------------------
Max mass = 17.6562799006358 Tg
Min mass = 17.6527047860219 Tg
Abs diff = 3575114613.909 g
Pct diff = 0.0202525033 %
NOTE: The last month was not available, so I copied the November value for December.
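As a sanity check, the summary statistics follow directly from the table; a minimal sketch, assuming Abs diff is (Max - Min) converted from Tg to g and Pct diff is taken relative to the minimum mass, which reproduces the values above:

```python
# Reproduce the summary statistics from the table above (masses in Tg).
max_mass = 17.6562799006358
min_mass = 17.6527047860219
abs_diff_g = (max_mass - min_mass) * 1e12          # 1 Tg = 1e12 g -> ~3.575e9 g
pct_diff = (max_mass - min_mass) / min_mass * 100  # -> ~0.0202525 %
print(f"Abs diff = {abs_diff_g:.3f} g")
print(f"Pct diff = {pct_diff:.10f} %")
```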
For comparison, here are results for the same run using processed winds. Note that both of these runs use dry pressure in advection.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Global Mass of Passive Tracer in 14.3.1_GEOS-FP_processed_wind
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Date Mass [Tg]
---------- ----------------
2022-01-01 17.6562799006358
2022-02-01 17.6527063301804
2022-03-01 17.6527047587778
2022-04-01 17.6527058170920
2022-05-01 17.6527058567409
2022-06-01 17.6527058000323
2022-07-01 17.6527057656179
2022-08-01 17.6527056823411
2022-09-01 17.6527056193937
2022-10-01 17.6527059680841
2022-11-01 17.6527059053356
2022-12-01 17.6527059056348
Summary
------------------------------
Max mass = 17.6562799006358 Tg
Min mass = 17.6527047587778 Tg
Abs diff = 3575141858.008 g
Pct diff = 0.0202526576 %
Hi @lizziel, the GEOS-FP raw mass flux run is ready now. Please check the link here: http://geoschemdata.wustl.edu/ExternalShare/tt-geosfp-raw-csmf/
Thanks @yuanjianz. Here is the mass conservation table for your mass flux run:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Global Mass of Passive Tracer in 14.3.1_GEOS-FP_raw_mass_fluxes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Date Mass [Tg]
---------- ----------------
2022-01-01 17.6562799006358
2022-02-01 17.6527118519604
2022-03-01 17.6527119102029
2022-04-01 17.6527118174852
2022-05-01 17.6527117827595
2022-06-01 17.6527118234161
2022-07-01 17.6527118121272
2022-08-01 17.6527117025911
2022-09-01 17.6527117019211
2022-10-01 17.6527120035089
2022-11-01 17.6527118178688
2022-12-01 17.6527118540502
Summary
------------------------------
Max mass = 17.6562799006358 Tg
Min mass = 17.6527117019211 Tg
Abs diff = 3568198714.657 g
Pct diff = 0.0202133178 %
Looks like the mass conservation issue with mass fluxes is fixed with the raw GMAO fields bug fix.
Hi @lizziel @sdeastham, my recent mass-flux fullchem benchmark is showing unreasonably higher surface aerosol concentrations than the wind run.
Looking back at Lizzie's previous GEOS-IT C180 mass flux vs. wind transport tracer simulations, it seems to be due to much weaker advection in the mass flux runs. Taking SF6 and Rn222 as examples (plots from Lizzie's comparison above):
(Plots: annual means, shown as massflux - wind or massflux/wind.)
My instinct is that a shift from winds to mass fluxes alone should not have such a large effect. And as I recall, the Martin et al. (2022, GMD) GCHPv13 paper indicates that advection driven by mass fluxes should be less dampened than advection driven by winds. I would appreciate your opinions on this, thanks!
Thanks @yuanjianz! In your last post, are you saying that you think the shift from winds to mass fluxes should have a smaller effect than this? That would be my expectation too, but I want to be sure we're on the same page. It does look to me like there has been a substantial reduction in vertical mixing, but the interesting thing is that this is exactly what we would expect. I'm curious: how do the horizontal mass fluxes compare between the wind and mass-flux simulations?
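One possible starting point for that comparison, sketched with hypothetical paths and flux variable names (substitute whatever advection fields are actually archived):

```python
import xarray as xr

# Compare the magnitude of horizontal mass fluxes between the two runs.
wind = xr.open_dataset("wind_run/advection_fields.nc4")      # placeholder path
mflx = xr.open_dataset("massflux_run/advection_fields.nc4")  # placeholder path
for v in ["MFXC", "MFYC"]:  # assumed E-W and N-S mass flux variable names
    ratio = float(abs(mflx[v]).mean()) / float(abs(wind[v]).mean())
    print(f"{v}: mean |mass-flux run| / mean |wind run| = {ratio:.3f}")
```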
Name and Institution
Name: Lizzie Lundgren
Institution: Harvard University
New GCHP feature or discussion
This issue is to discuss current work related to meteorology used in GCHP advection. There are several things that I hope to get into version 14.3.0, including HISTORY.rc entries for gridded component DYNAMICS instead of GCHPchem. Pinging @sdeastham and @1Dandan, who will help with this work.
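For context, each field in a HISTORY.rc collection is paired with the gridded component that exports it, so this change amounts to swapping the component name; a hypothetical sketch (the collection and field names are placeholders, not actual export names):

```
  Advection.fields:   'MassFluxEW', 'DYNAMICS',
                      'MassFluxNS', 'DYNAMICS',
::
```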