Closed xylar closed 1 year ago
I tested a out the following on Perlmutter:
mamba remove -y --all -n e3sm_diags_dev
mamba env create -n e3sm_diags_dev -f conda-env/ci.yml
mamba activate e3sm_diags_dev
python -m pip install .
Then, on a compute node, I ran:
#!/usr/bin/env bash
source /global/homes/x/xylar/mambaforge/etc/profile.d/conda.sh
source /global/homes/x/xylar/mambaforge/etc/profile.d/mamba.sh
mamba activate e3sm_diags_dev
e3sm_diags lat_lon --no_viewer \
--reference_data_path '/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/climatology' \
--test_data_path '/global/cfs/cdirs/e3sm/e3sm_diags/postprocessed_e3sm_v2_data_for_e3sm_diags/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/climatology/rgr/' \
--results_dir '/global/homes/x/xylar/tmp/results_conda' --case_id 'GPCP_v3.2' \
--run_type 'model_vs_obs' --sets 'lat_lon' --variables 'PRECT' --seasons 'ANN' \
--ref_name 'GPCP_v3.2'
This had failed for me using the main
branch (see https://github.com/E3SM-Project/zppy/pull/475#issuecomment-1672940309) but ran successfully with this branch.
@chengzhuzhang and @forsyth2, can you do a little more testing on this (assuming you're okay with it) before we make another RC with it? I'm happy to deploy another E3SM-Unified RC (rc12!) tomorrow if you can get me a e3sm_diags 2.9.0rc3
by then.
Thanks @xylar!!
On my testing branch (https://github.com/E3SM-Project/zppy/pull/477), I was able to successfully run zppy -c tests/integration/generated/test_complete_run_chrysalis.cfg
using these changes!
I don't think zppy
needs another RC in this case. Once we have another E3SM Diags RC (and then Unified RC), I can do final tests for zppy (and zstash too, which I wouldn't expect to have any issues).
I haven't tested on Perlmutter yet, but it had the same problem as Chrysalis.
Compy's error (https://github.com/E3SM-Project/zppy/pull/475#issuecomment-1670471026) still needs to be fixed before a final release though.
@xylar this is a good news. I will test a full run to see if everything works as expected. Though it is mysterious that having the imports in a setting order upfront fixed things (great job to come up with this fix!). I agree that it is worth while to notify shapely/esmf developers about this issue
it is mysterious that having the imports in a setting order upfront fixed things
That is indeed strange. "# isort:skip"
has been used in e3sm_diags
before though. However, in these cases, it was because we had to run a matplotlib
line first. I.e., it wasn't exclusively the import order causing the issue.
For reference, in e3sm_diags/plot/cartopy/streamflow_plot.py
:
matplotlib.use("Agg")
import matplotlib.colors as colors # isort:skip # noqa: E402
import matplotlib.lines as lines # isort:skip # noqa: E402
import matplotlib.pyplot as plt # isort:skip # noqa: E402
For all appearances: git grep -n "# isort:skip"
This looks good to me! Thanks @xylar!
it is mysterious that having the imports in a setting order upfront fixed things
I think importing esmpy>=8.4.0
first might initiate code that locks up the GIL without releasing it. Since the GIL is locked, any subsequent code that requires the GIL results in a segmentation fault (e.g., shapely multithreading code).
By importing shapely first, shapely can acquire the GIL and release it for esmpy
to use whenever esmpy
needs it. This is just my theory and we should let esmpy
/shapely
people figure it out with a GitHub issue 😂 It might be a lot more complex than that.
This merge fixes a segfault issue related to
shapely >=2.0.0
andesmpy >=8.4.0
, likely caused by multiprocessing. It appears that things work fine as long asshapely
gets imported beforee3smpy
.The proposed fix is simply to import
shapely
, thenesmpy
at the very start in thee3sm_diags
package. This isn't a very elegant fix, so it will be worth exploring better options (perhaps including a bug report to ESMPy and/or shapely developers) in the future. But we need this to work for us sooner than that is likely to work for us.I also propose constraining
esmpy >=8.4.0
so we don't have to deal with the import name change fromESMF
toesmpy
.I also propose to constrain
shapely >=2.0.0,<3.0.0
. There were major changes fromshapely
1.x to 2.x and other E3SM analysis tools like polaris, compass, MPAS-Analysis and geometric_features are all using shapely 2.x at this point.e3sm_diags
only depends on shapely indirectly (e.g. through cartopy) but for consistency in plots, it seems best to constrain shapely. This may help to prevent issues upgrading E3SM-Unified in the future.This is my proposed fix for https://github.com/E3SM-Project/zppy/pull/475