Open aekiss opened 2 years ago
@aekiss these changes do not work for me. Without a module load esmf I get:
/g/data/ik11/inputs/access-om2/bin/ESMF_RegridWeightGen_f536c3e12d: error while loading shared libraries: libesmf.so: cannot open shared object file: No such file or directory
And with it I get a seg fault:
[gadi-login-09:4180735:0:4180735] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x4f7ae4)
==== backtrace (tid:4180735) ====
0 0x0000000000012c20 .annobin_sigaction.c() sigaction.c:0
1 0x00000000013d6786 esmf_vmmod_mp_esmf_vmbroadcasti4_() ???:0
2 0x0000000001358435 esmf_regridweightgenmod_mp_esmf_regridweightgenfile_() ???:0
3 0x00000000004113a7 MAIN__() ???:0
4 0x000000000040c6e2 main() ???:0
5 0x0000000000023493 __libc_start_main() ???:0
6 0x000000000040c5ee _start() ???:0
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
The previous version was working.
My working version is at https://github.com/rmholmes/access-om2/blob/update-esmgrids-preRussCorrection/tools/make_remap_weights.sh, although I've only tested it for 1-degree using 1 pe (and not as a PBS job).
I've clearly messed up the linking to libesmf.so. I'll try to fix that.
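For anyone hitting the same libesmf.so error, one way to see which shared libraries an executable cannot resolve is to parse the output of ldd. This is a minimal sketch and not part of the repository: the helper names missing_libraries and check_executable are hypothetical, the sample ldd text is illustrative rather than copied from a real run, and it assumes a Linux system where ldd is available.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libesmf.so => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_executable(path):
    """Run ldd on an executable (hypothetical helper) and list unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    # Illustrative ldd output, not taken from a real run
    sample = (
        "\tlibesmf.so => not found\n"
        "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)\n"
    )
    print(missing_libraries(sample))
```

If libesmf.so appears in that list, loading the module that provides it (or adding its directory to LD_LIBRARY_PATH) before running is the usual remedy.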
In the meantime, you could have a go with the version Russ built:
/scratch/v45/raf599/esmf/apps/appsO/Linux.intel.64.openmpi.default/ESMF_RegridWeightGen
I've replaced /g/data/ik11/inputs/access-om2/bin/ESMF_RegridWeightGen_f536c3e12d with a copy of /scratch/v45/raf599/esmf/apps/appsO/Linux.intel.64.openmpi.default/ESMF_RegridWeightGen.
This should work if you don't use conserve2nd (as in the latest update to make_remap_weights.py).
Unfortunately I'm still getting a seg fault. I am in /g/data/e14/rmh561/access-om2/input/ERA-5/
running:
[rmh561@gadi-login-03 ERA-5]$ module purge
[rmh561@gadi-login-03 ERA-5]$ module load python3-as-python
[rmh561@gadi-login-03 ERA-5]$ module load nco
[rmh561@gadi-login-03 ERA-5]$ module use /g/data/hh5/public/modules
[rmh561@gadi-login-03 ERA-5]$ module load conda/analysis3
[rmh561@gadi-login-03 ERA-5]$ module unload openmpi
[rmh561@gadi-login-03 ERA-5]$ module load openmpi/4.0.2
[rmh561@gadi-login-03 ERA-5]$ ../../tools/make_remap_weights.py --accessom2_input_dir /g/data/ik11/inputs/access-om2/input_20201102 --atm_forcing_file /g/data/rt52/era5/single-levels/reanalysis/2t/1980/2t_era5_oper_sfc_19800101-19800131.nc --ocean MOM1 --npes 1 --atm ERA5
['mpirun', '-np', '1', '/g/data/ik11/inputs/access-om2/bin/ESMF_RegridWeightGen_f536c3e12d', '--netcdf4', '-s', '/g/data/e14/rmh561/access-om2/tools/tmp2hjb4snb.nc', '-d', '/g/data/e14/rmh561/access-om2/tools/tmp94pojx32.nc', '-m', 'patch', '-w', '/g/data/e14/rmh561/access-om2/tools/tmp7swcqo63.nc']
[gadi-login-03:2475098:0:2475098] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7fff1e620560)
==== backtrace (tid:2475098) ====
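The command list printed above can also be rebuilt and run outside make_remap_weights.py, which may help to tell whether the seg fault comes from the executable or from the script. A minimal sketch, using only the flags visible in the printed list; build_regrid_command is a hypothetical helper for illustration, not the function in the repository, and the file names in the example are placeholders.

```python
def build_regrid_command(exe, src_grid, dst_grid, weights, method="patch", npes=1):
    """Assemble an mpirun command list for ESMF_RegridWeightGen.

    Only the flags seen in the printed command are used:
    --netcdf4, -s (source grid), -d (destination grid),
    -m (regrid method) and -w (output weights file).
    """
    return [
        "mpirun", "-np", str(npes), exe,
        "--netcdf4",
        "-s", src_grid,
        "-d", dst_grid,
        "-m", method,
        "-w", weights,
    ]


if __name__ == "__main__":
    # Placeholder file names, assumed for illustration only
    cmd = build_regrid_command(
        "/g/data/ik11/inputs/access-om2/bin/ESMF_RegridWeightGen_f536c3e12d",
        "src.nc", "dst.nc", "weights.nc",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to run it directly.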
With the old version that works, I have been playing around and don't seem to be able to affect the remapping with what I do in era5_grid.py. E.g. I add 90, 180 or 360 to x_t and nothing changes. I need to do more digging.
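One distinction that may be worth making here: adding 360 to x_t describes the same points on the sphere, so unchanged weights are plausible in that case, whereas a 90 or 180 shift should move the grid. A small sketch of that check follows; the function name and the sample longitudes are made up for illustration and are not taken from era5_grid.py.

```python
def same_longitudes(lons_a, lons_b, tol=1e-9):
    """True if two longitude lists describe the same points modulo 360 degrees."""
    if len(lons_a) != len(lons_b):
        return False
    for a, b in zip(lons_a, lons_b):
        # Angular separation, wrapped into the range [0, 180]
        diff = (a - b) % 360.0
        if min(diff, 360.0 - diff) > tol:
            return False
    return True


if __name__ == "__main__":
    x_t = [0.0, 0.25, 0.5, 359.75]  # illustrative longitudes only
    for shift in (90.0, 180.0, 360.0):
        shifted = [x + shift for x in x_t]
        print(shift, same_longitudes(x_t, shifted))
```

If the 90 and 180 shifts also leave the weights untouched, that would suggest the edited x_t is not what ends up in the grid file passed to ESMF_RegridWeightGen.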
Thanks for trying. It works for me (via qsub make_remap_weights.sh) so it's odd that it doesn't work for you.
There's something weird about the build process for Russ' executable that I haven't figured out. But let's worry about that rabbit hole if/when we need to for the 0.1deg, and use the NCI executable for now.
A few notes from the TWG meeting discussion this morning re: ERA-5 flipped latitude:
Getting esmgrids and make_remap_weights.py to support a north -> south ordered ERA-5 input could be quite challenging. @aidanheerdegen is going to have a look.
If @aidanheerdegen can't work some magic with 2 above, then I think a good approach for now would be to create a copy of just the 1990 and 1991 files (a total of about 250GB) with latitude flipped. This should allow us to test RYF simulations at least.
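The latitude flip itself is a simple operation; the cost is in rewriting the files. A minimal pure-Python sketch of the flip on one 2D field, assuming the first axis is latitude. The function name and the toy data are hypothetical, and real ERA-5 files would more likely be processed with a NetCDF tool than with nested lists.

```python
def flip_to_south_north(lat, field):
    """Return (lat, field) ordered south -> north.

    lat   : list of latitudes, monotonic in either direction
    field : list of rows, one row per latitude (first axis is latitude)
    """
    if len(lat) != len(field):
        raise ValueError("lat and field must have the same length along axis 0")
    if len(lat) > 1 and lat[0] > lat[-1]:
        # North -> south input: reverse both the coordinate and the rows
        return lat[::-1], field[::-1]
    # Already south -> north: return shallow copies unchanged
    return list(lat), list(field)


if __name__ == "__main__":
    lat = [90.0, 0.0, -90.0]            # north -> south, toy values
    field = [[1, 2], [3, 4], [5, 6]]    # one row per latitude
    print(flip_to_south_north(lat, field))
```

Reversing the coordinate together with the data keeps each row attached to its latitude, which is the property the remapping weights depend on.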
@nichannah's weights in /g/data/ik11/inputs/access-om2/input_20210915/common_1deg_era5 appear to have been generated by /g/data/v45/nah599/access-om2/tools/make_remap_weights.*. These weren't committed, so I've done that here.