treerink closed this issue 5 years ago
On a fresh checkout and a newly created ece2cmor3 environment on cca, this error message is absent ...
Can you post the operating system you are using (Fedora version)?
Fedora 26
Even after removing the environment with conda env remove --name ece2cmor3 and a fresh install of ece2cmor, this error message persists on my KNMI Fedora workstation (I haven't encountered it on mac or on cca either).
Hi all,
I am new to this, so forgive me if I have not followed the full story. I am trying to catch up with the ece2cmor3 work at CNR. I am experiencing the same issue on the Marconi HPC at Cineca. I am not able to cmorize any NEMO output for the moment, but I am not sure if this is the main problem.
HPC Traceback (most recent call last):
File "/marconi_work/Pra13_3311/opt/anaconda/envs/ece2cmor3/lib/python2.7/site-packages/ESMF/interface/loadESMF.py", line 122, in <module>
mode=ct.RTLD_GLOBAL)
File "/marconi_work/Pra13_3311/opt/anaconda/envs/ece2cmor3/lib/python2.7/ctypes/__init__.py", line 366, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /marconi_work/Pra13_3311/opt/anaconda/envs/ece2cmor3/lib/libesmf_fullylinked.so: undefined symbol: __netcdf_MOD_nf90_put_var_1d_fourbyteint
Hi @oloapinivad , I have had contact with Kristian Strommen and I think the problem with your nemo files on Marconi was the time axis that had disappeared when the parallel netcdf files from the XIOS server processes were merged, can you confirm this?
Thanks @goord for the reply. Actually, the one you mention was one of the issues that Kristian and I managed to solve together. The problem above is still there, but it does not seem to be the cause of my NEMO crashes.
Indeed, I have a few extra problems with NEMO (with depth axes, which lead to a crash) and IFS (with the Primavera tables, which do not cmorize some high-frequency variables). I will try to figure them out in the next few days; in the worst case I will open separate issues.
Also, if I remove these lines from environment.yml and recreate the environment, the error remains on my KNMI workstation. So I assume this loadESMF.py must be installed automatically somewhere because of a detected dependency? Removing these lines did not matter for checkvars.py at least.
@treerink According to this comment the issue vanished for @oloapinivad. Do you still have the problem? Can we close this issue?
This issue still persists on my KNMI Fedora workstation, even after a full clean: a new anaconda release, a new ece2cmor3 checkout, and a clean environment. On Ubuntu I do not encounter this error. The error does not seem to have any impact, but it appears in all output of the scripts. I gave up on this error but kept the issue open, because once in a while someone else might experience it.
I understand that time is short and you might prefer not to address this issue, in which case I would still suggest to close it or at least mark it as 'wont_fix' or something like that so that it can easily be ignored.
Having said that, here is another idea for debugging. That symbol is resolved by the libnetcdff library (note the second f for fortran). Could you do a
ldd /marconi_work/Pra13_3311/opt/anaconda/envs/ece2cmor3/lib/libesmf_fullylinked.so
and confirm that there is a line about netcdff similar to
libnetcdff.so.6 => /marconi_work/Pra13_3311/opt/anaconda/envs/ece2cmor3/lib/libnetcdff.so.6
? This should tell us whether it finds the right library (which then might not contain the right symbol) or whether it pulls in a wrong library, possibly from a system path.
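The ldd check described above could be scripted roughly as follows. This is only a sketch: resolved_lib and in_prefix are hypothetical helper names, and ENV_PREFIX is a placeholder for your own conda environment, not something ece2cmor3 provides.

```shell
# resolved_lib NAME: read `ldd` output on stdin and print the path the
# dynamic linker resolved for the library called NAME.
resolved_lib() {
    awk -v name="$1" '$1 == name && $2 == "=>" { print $3 }'
}

# in_prefix PATH PREFIX: succeed only if PATH lies below PREFIX.
in_prefix() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage sketch (ENV_PREFIX is an assumed location, adjust to your setup):
#   ENV_PREFIX=$HOME/anaconda2/envs/ece2cmor3
#   lib=$(ldd "$ENV_PREFIX/lib/libesmf_fullylinked.so" | resolved_lib libnetcdff.so.6)
#   in_prefix "$lib" "$ENV_PREFIX" || echo "libnetcdff comes from outside the environment: $lib"
```

If the resolved path is outside the environment prefix, the linker is picking up a foreign copy of the library.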
In my case, yes I have:
/usr/people/reerink/anaconda2/envs/ece2cmor3/lib/libesmf_fullylinked.so
and
/usr/people/reerink/anaconda2/envs/ece2cmor3/lib/libnetcdff.so -> libnetcdff.so.6.1.1*
/usr/people/reerink/anaconda2/envs/ece2cmor3/lib/libnetcdff.so.6 -> libnetcdff.so.6.1.1*
/usr/people/reerink/anaconda2/envs/ece2cmor3/lib/libnetcdff.so.6.1.1*
This looks like the output from ls. With ldd we get to know what the dynamic linker considers to be the appropriate libraries to load as dependencies.
Is this what you were asking for?
ldd /usr/people/reerink/anaconda2/envs/ece2cmor3/lib/libesmf_fullylinked.so |grep libnetcdff.so.6
libnetcdff.so.6 => /usr/local/free/installed/netcdf_for_fortran_f26/netcdf-fortran-4.4.4_ifort/lib/libnetcdff.so.6 (0x00007f22094c4000)
Exactly! Here, we see that the dynamic linker picked up the wrong netcdf fortran library. Instead of the correct
/usr/people/reerink/anaconda2/envs/ece2cmor3/lib/libnetcdff.so.6
which contains the expected symbol
nm /usr/people/reerink/anaconda2/envs/ece2cmor3/lib/libnetcdff.so.6 |egrep put_var_1d_fourbyteint
0000000000051ca0 T __netcdf_MOD_nf90_put_var_1d_fourbyteint
it is using
/usr/local/free/installed/netcdf_for_fortran_f26/netcdf-fortran-4.4.4_ifort/lib/libnetcdff.so.6
which was compiled by an intel compiler and hence, thanks to these crucial bits of the ABI not being fixed in the fortran specification but left to the compiler implementations, instead has
nm libnetcdff.so.6 |egrep put_var_1d_fourbyteint
00000000000598c0 T netcdf_mp_nf90_put_var_1d_fourbyteint_
i.e. different name mangling with regard to underscores, and mp instead of MOD for modules.
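To tell the two conventions apart without reading symbol tables by eye, something like the following could be used. It is a sketch based only on the two patterns seen in the nm output above; mangling_style is a hypothetical helper name, and other compilers may use yet other schemes.

```shell
# mangling_style: read `nm` output on stdin and report which Fortran
# module name-mangling convention the first matching symbol follows.
# Only the two patterns from this thread are recognised.
mangling_style() {
    awk '
        $3 ~ /^__.*_MOD_/ { print "gfortran-style"; found = 1; exit }
        $3 ~ /_mp_.*_$/   { print "ifort-style";    found = 1; exit }
        END { if (!found) print "unknown" }
    '
}

# Usage sketch (the path is whatever ldd reported for libnetcdff.so.6):
#   nm -D /path/to/libnetcdff.so.6 | grep put_var_1d_fourbyteint | mangling_style
```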
So this is a local configuration problem, not an ece2cmor bug and hence can be closed, I think.
To solve your problem: The linker gets its search path from three places:
- libesmf_fullylinked.so itself
- $LD_LIBRARY_PATH
- /etc/ld.so.conf
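As a rough illustration of how to inspect these sources, the sketch below lists the directories in a colon-separated search path that contain a given library file. The helper name dirs_providing is hypothetical; the readelf and ldconfig commands in the comments are the standard tools for the other two sources.

```shell
# dirs_providing LIBNAME PATHLIST: print every directory in the
# colon-separated PATHLIST that contains a file called LIBNAME.
dirs_providing() {
    printf '%s\n' "$2" | tr ':' '\n' | while IFS= read -r dir; do
        if [ -n "$dir" ] && [ -e "$dir/$1" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Usage sketch:
#   dirs_providing libnetcdff.so.6 "$LD_LIBRARY_PATH"
# The other two sources can be inspected with:
#   readelf -d /path/to/libesmf_fullylinked.so | grep -E 'RPATH|RUNPATH'
#   ldconfig -p | grep libnetcdff
```

Any directory printed for $LD_LIBRARY_PATH is a candidate for shadowing the copy inside the conda environment.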
Usually the culprit for this kind of problem is a rogue $LD_LIBRARY_PATH variable. So you can try
echo $LD_LIBRARY_PATH
to check if it contains something like /usr/local/free/installed/netcdf_for_fortran_f26/netcdf-fortran-4.4.4_ifort/lib
and then
unset LD_LIBRARY_PATH
ece2cmor
to see if this way the error vanishes. Of course the problem is that you might actually need that version of netcdf for something else. This is the reason for the widespread module system that takes care of setting and unsetting these variables in a slightly simpler fashion. Indeed, it may well be that here, too, it has been set by an innocent
module load netcdf
in your .bashrc.
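If that netcdf version is still needed elsewhere, one option is to drop the variable for a single command only instead of unsetting it in the whole shell. A minimal sketch, assuming an env that supports -u (GNU coreutils and the BSDs do); without_ld_path is a hypothetical helper name:

```shell
# without_ld_path CMD [ARGS...]: run CMD with LD_LIBRARY_PATH removed
# from its environment; the calling shell keeps its own setting.
without_ld_path() {
    env -u LD_LIBRARY_PATH "$@"
}

# Usage sketch:
#   without_ld_path ./check-for-obsolete-cmor-variables-in-json-file.py
# To find out which startup file sets the variable:
#   grep -l LD_LIBRARY_PATH ~/.bashrc ~/.bash_profile ~/.bash_aliases 2>/dev/null
```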
Good luck!
Thanks for your input @zklaus. I thought we had checked the LD_LIBRARY_PATH, hadn't we, Thomas?
It took me a while to go through all your comments (for this admittedly low-priority issue). Anyway, @zklaus, thanks for your guidance. Indeed, after checking
echo $LD_LIBRARY_PATH
I finally vaguely remembered that I had adjusted it in my bash alias file when I started here at KNMI, because of some trouble with netcdf on Fedora for which I was advised to use a fix. Anyway, taking out that LD_LIBRARY_PATH setting does the job: including or excluding it makes the difference between the error appearing or not. So solved!
I am not sure whether this issue appeared after the last updates or after recreating the ece2cmor conda environment:
The error can be reproduced by running ./check-for-obsolete-cmor-variables-in-json-file.py, but it applies to checkvars.py as well. As far as I could judge, the results are not affected.