E3SM-Project / scorpio

A high-level Parallel I/O Library for structured grid applications
18 stars 16 forks source link

Scorpio's way of finding netcdf is not robust enough if system version of library exists #590

Closed bartgol closed 3 weeks ago

bartgol commented 1 month ago

I was getting odd link errors on my laptop. In particular, pioc was linking to a system version of netcdf, even though I was correctly pointing to a module installation. I could see that the correct installation was picked up during cmake:

Found NetCDF: /projects/sems/install/rhel8-x86_64/sems/tpl/netcdf-c/4.7.4/gcc/10.1.0/openmpi/4.0.5/qgvycys/lib/libnetcdf.so (found suitable version "4.7.4./*!<", minimum required is "4.4.0") 

and all the vars in CMakeCache.txt pointed to the module installation. However, ldd showed /lib64/libnetcdf.so as the netcdf library being linked by the pioc lib. I digged a bit, and found that CMakeFiles/pioc.dir/link.txt contained -lnetcdf without any -L/path/to/lib entry. This quickly pointed me to CMake policy 0060, which deals with how libraries are linked:

For CMake<3.3, OLD is the default behavior, while for CMake>=3.3 NEW is the default behavior. Since scorpio still uses cmake_minimum_required (VERSION 2.8.12) (something we really need to change), the default behavior is the OLD one. You can see more about this via cmake --help-policy CMP0060.

I see a few options:

Honestly, I think the first option is the best. Being backward compatible with cmake 2 is a big burden and probably pointless. First, I'm not even sure cmake 2.8.12 would work (we may be using cmake 3.X features anyways, I haven't checked). Second, I don't think major supercomputers even have a cmake 2.X installation, they likely all ship 3.X, and hopefully with X>>0. Finally, Recent cmake versions offer more robustness in bug-prone stuff (such as tpl handling), as well as in packaging a project as a CMake package.

I know that this happening on my laptop is not an urgent matter. However, it's not hard to imagine a system where a netcdf installation exists (perhaps not a parallel one), maybe as a dependency when another package was installed. Or maybe someone has a (bad) netcdf installation in their (conda) env, which is not the one that e3sm expects. While things may be going ok right now, hoping that the linker picks up the correct installation can be dangerous. Since updating CMake min version fixes things (and has other benefits), I would strongly encourage to go down that path...

jayeshkrishna commented 1 month ago

@dqwu : Can you update the min version of CMake needed for SCORPIO (We need to make sure that all E3SM machines have that version of CMake available - and use them)?

dqwu commented 4 weeks ago

@jayeshkrishna Actually, this issue is also mentioned in #443 and CMakeLists.txt was already updated to require 3.7 or higher in PR #466: cmake_minimum_required (VERSION 3.7)

However, the min version required is still 2.8.12 in the following CMakeLists.txt files: src/clib/CMakeLists.txt src/flib/CMakeLists.txt src/flib_legacy/CMakeLists.txt src/gptl/CMakeLists.txt tests/CMakeLists.txt

Shall we update these files as well to require 3.7 or higher?

jayeshkrishna commented 4 weeks ago

Yes, please go ahead and update them too

bartgol commented 4 weeks ago

@jayeshkrishna Actually, this issue is also mentioned in #443 and CMakeLists.txt was already updated to require 3.7 or higher in PR #466: cmake_minimum_required (VERSION 3.7)

However, the min version required is still 2.8.12 in the following CMakeLists.txt files: src/flib/CMakeLists.txt src/flib_legacy/CMakeLists.txt src/gptl/CMakeLists.txt tests/CMakeLists.txt

Shall we update these files as well to require 3.7 or higher?

I see 2.8.12 also in src/clib/CMakeLists.txt, which is the one that was causing me problems.

Thanks for the quick response guys!