NGEET / fates-containers

Repository for containerized version of fates for use in future tutorials

CTSM buildexe failure: `-lnetcdf` and `-lnetcdff` not found? #20

Closed glemieux closed 3 years ago

glemieux commented 4 years ago

Both @serbinsh and I are seeing this in ctsm-fates builds of different versions:

mpif90  -o /home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/cesm.exe cime_comp_mod.o cime_driver.o component_mod.o component_type_mod.o cplcomp_exchange_mod.o map_glc2lnd_mod.o map_lnd2glc_mod.o map_lnd2rof_irrig_mod.o mrg_mod.o prep_aoflux_mod.o prep_atm_mod.o prep_glc_mod.o prep_ice_mod.o prep_lnd_mod.o prep_ocn_mod.o prep_rof_mod.o prep_wav_mod.o seq_diag_mct.o seq_domain_mct.o seq_flux_mct.o seq_frac_mct.o seq_hist_mod.o seq_io_mod.o seq_map_mod.o seq_map_type_mod.o seq_rest_mod.o t_driver_timers_mod.o  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -latm  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lice  -L../../gnu/openmpi/nodebug/nothreads/mct/noesmf/lib/ -lclm  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -locn  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lrof  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lglc  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lwav  -L/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/lib/ -lesp -L../../gnu/openmpi/nodebug/nothreads/mct/noesmf/c1a1l1i1o1r1g1w1e1/lib -lcsm_share -L../../gnu/openmpi/nodebug/nothreads/lib -lpio -lgptl -lmct -lmpeu   -L/lib/ -lnetcdff -lnetcdf -lcurl -llapack -lblas
/usr/bin/ld: cannot find -lnetcdff
/usr/bin/ld: cannot find -lnetcdf
collect2: error: ld returned 1 exit status
/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/Tools/Makefile:874: recipe for target '/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/cesm.exe' failed
make: *** [/home/fatesuser/output/no-user-test-asroot.fates.docker.Cabcd593-F3248e63/bld/cesm.exe] Error 1

My particular image is a ctsm-fates-gcc650 build with cime5.6.28. Here's the Makefile line: https://github.com/ESMCI/cime/blob/fe16302fc332a02427a9e41a8efe959f2fe8c953/scripts/Tools/Makefile#L873-L874

The gcc650 baseos build hasn't changed and, if I recall correctly, the ctsm testrepo ran successfully in the past using that same baseos, so I'm not sure what's going on here. The LD_LIBRARY_PATH includes the paths to the combined C and Fortran netcdf libraries, so the baseos seems to be fine.
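As far as I understand, `ld` resolves `-l` names against the `-L` directories on the link line (plus its built-in defaults), while LD_LIBRARY_PATH mainly matters to the runtime loader, so a correct LD_LIBRARY_PATH does not by itself rule out a bad `-L` path. In the command above, the only `-L` entry in front of `-lnetcdff -lnetcdf` is `-L/lib/`. A minimal sketch for checking which `-l` names have no matching file in any `-L` directory follows; the helper name `unresolved_libs` and the demo link command are hypothetical, and the sketch deliberately ignores the linker's default search directories unless they are passed in via `extra_dirs`.

```python
import os
import shlex
import tempfile


def unresolved_libs(link_cmd, extra_dirs=()):
    """Return -l names with no lib<name>.so or lib<name>.a in any -L directory.

    Hypothetical helper: it only looks at the -L directories on the command
    line plus extra_dirs, not at the linker's built-in default directories.
    """
    tokens = shlex.split(link_cmd)
    dirs = [t[2:] for t in tokens if t.startswith("-L") and len(t) > 2]
    dirs.extend(extra_dirs)
    libs = [t[2:] for t in tokens if t.startswith("-l") and len(t) > 2]

    missing = []
    for name in libs:
        found = any(
            os.path.exists(os.path.join(d, f"lib{name}{ext}"))
            for d in dirs
            for ext in (".so", ".a")
        )
        if not found and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Demo with a throwaway directory that only contains libcurl.a.
    with tempfile.TemporaryDirectory() as libdir:
        open(os.path.join(libdir, "libcurl.a"), "w").close()
        cmd = f"mpif90 -o cesm.exe main.o -L{libdir} -lnetcdff -lnetcdf -lcurl"
        print(unresolved_libs(cmd))
```

Running the same function on the real link line inside the container would show whether any listed `-L` directory actually holds the netcdf libraries.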

serbinsh commented 4 years ago

Correct. Strangely, older containers like those here https://hub.docker.com/repository/docker/serbinsh/ctsm_containers/general don't have this problem. I only noticed it when trying to put together newer ctsm/fates combos. I have even run into it after updating the newer machines files in the cime that ships with the latest main_api to reflect those used in the examples. To do this, I mapped in a host directory with modified machines files that match what cime expects but use the docker hostname, and I still get the same error.

glemieux commented 4 years ago

> Correct. and strangely older containers like those here https://hub.docker.com/repository/docker/serbinsh/ctsm_containers/general don't have this same problem.

Ok, I'm going to do a more thorough comparison of the dockerfiles on your personal repo and the dockerfile that I'm working with right now. I'm pretty sure the baseos images are almost exactly the same as I simply ported them over. For the ctsm-fates image I'm seeing the error on, I was adjusting a lot of things from your original, although the form was largely the same.

glemieux commented 4 years ago

Task list:

glemieux commented 3 years ago

The issue was caused by the config_compiler.xml file having the <slib> entry set for ENV${NETCDF_HOME}, which didn't match the environment variable location information in config_machines.xml: https://github.com/glemieux/docker-fates-tutorial/blob/90242388c3d2e3dca8f593696e8ab9ac01b68195/docker/fates-ctsm/cime_config/config_machines.xml#L37-L38
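A mismatch like this can be caught before building by comparing the environment variables the compiler XML refers to against those the machines XML defines. The sketch below is only illustrative: the XML snippets are hypothetical stand-ins (not the contents of the linked files), the element names and the `NETCDF_PATH` variable are assumptions, and the regex accepts both the `$ENV{NAME}` and `ENV${NAME}` spellings because the exact reference syntax is not confirmed here.

```python
import re
import xml.etree.ElementTree as ET

# Accept either "$ENV{NAME}" or "ENV${NAME}" (syntax is an assumption).
ENV_REF = re.compile(r"(?:\$ENV|ENV\$)\{(\w+)\}")

# Hypothetical stand-ins for the compiler and machines XML files.
COMPILER_XML = """
<compiler>
  <SLIBS><append>-L$ENV{NETCDF_HOME}/lib -lnetcdff -lnetcdf</append></SLIBS>
</compiler>
"""
MACHINES_XML = """
<machine>
  <environment_variables>
    <env name="NETCDF_PATH">/usr/local/netcdf</env>
  </environment_variables>
</machine>
"""


def undefined_env_refs(compiler_xml, machines_xml):
    """Return env var names referenced in compiler_xml but not defined
    by an <env name="..."> element in machines_xml."""
    referenced = set()
    for elem in ET.fromstring(compiler_xml).iter():
        referenced.update(ENV_REF.findall(elem.text or ""))
    defined = {e.get("name") for e in ET.fromstring(machines_xml).iter("env")}
    return sorted(referenced - defined)


if __name__ == "__main__":
    print(undefined_env_refs(COMPILER_XML, MACHINES_XML))
```

With these stand-in snippets the check reports `NETCDF_HOME` as referenced but never defined. An undefined variable of that kind may expand to an empty string, which would be consistent with the bare `-L/lib/` seen on the failing link line.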

Updating the entries in the file to match @serbinsh's original compiler XML file resolved the error. The commit to my personal repo is here: https://github.com/glemieux/docker-fates-tutorial/commit/132c112e8eba7037a6be2fef12483efbabf7830b

I'll close this out when I pull in the fix to the NGEET repo.

serbinsh commented 3 years ago

Great, nice find! I am close to having a test dataset, so I will test with this once it's merged.