easybuilders / easybuild-easyconfigs

A collection of easyconfig files that describe which software to build using which build options with EasyBuild.
https://easybuild.io
GNU General Public License v2.0

NAMD-2.14: Charm++ dependency cannot be installed as configured for non-Intel toolchains #20079

Open rlpitts opened 8 months ago

rlpitts commented 8 months ago

The .eb files for NAMD-2.14 should list ('Charm++', '6.10.2') among their dependencies, as the easyconfigs for previous versions do. It should also be noted that, at the moment, the .eb files for Charm++ on open-source toolchains are not well maintained. The intel toolchain is not a good option for the users who requested this, and the relevant versions of Charm++ available through EasyBuild are 7 years out of date, which means we will have to install a compatible version the hard way.

eb NAMD-2.14-foss-202Xy-mpi.eb --robot --use-existing-modules fails for every X from 0 to 3 and y = a or b, and in every case the failure traces back to Charm++ failing to build. Below is a small snippet of the log file showing the errors (or at least where the deprecation warnings stop and the fatal errors begin):

/usr/include/python3.9/ceval.h:136:37: note: declared here
  136 | Py_DEPRECATED(3.2) PyAPI_FUNC(void) PyEval_ReleaseLock(void);
      |                                     ^~~~~~~~~~~~~~~~~~
Fatal Error by charmc in directory /local/easybuild_milan/build/NAMD/2.14/foss-2023a-mpi/NAMD_2.14_Source/charm-6.10.2/mpi-linux-x86_64-mpicxx/tmp
   charmmod.o: file not recognized: File truncated
charmc exiting...
ar: creating ../lib/libckmain.a
../bin/charmc  -optimize -production  -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -DMPICH_IGNORE_CXX_SEEK -fpermissive  -o ../lib/libmpi-mainmodule.a mpi-mainmodule.o
ar: creating ../lib/libmpi-mainmodule.a
Fatal Error by charmc in directory /local/easybuild_milan/build/NAMD/2.14/foss-2023a-mpi/NAMD_2.14_Source/charm-6.10.2/mpi-linux-x86_64-mpicxx/tmp/libs/ck-libs/pythonCCS
   Command mpicxx -m64 -fPIC -fPIC -DCMK_GFORTRAN -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX -I../../../../bin/../include -D__CHARMC__=1 -DMPICH_IGNORE_CXX_SEEK -I. -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -fpermissive -O2 -U_FORTIFY_SOURCE -fno-stack-protector -fno-lifetime-dse -c charmdebug-python.C -o charmdebug-python.o returned error code 1
charmc exiting...
gmake[1]: *** [Makefile:42: charmdebug-python.o] Error 1
gmake[1]: *** Waiting for unfinished jobs....
../bin/charmc  -optimize -production  -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -DMPICH_IGNORE_CXX_SEEK -fpermissive  -o ../lib/libconv-util.a pup_util.o pup_toNetwork.o pup_toNetwork4.o pup_xlater.o pup_c.o pup_paged.o pup_cmialloc.o ckimage.o ckdll.o ckhashtable.o sockRoutines.o conv-lists.o persist-comm.o mempool.o crc32.o  lz4.o partitioning_strategies.o hilbert.o spanningTree.o cmirdmautils.o
Fatal Error by charmc in directory /local/easybuild_milan/build/NAMD/2.14/foss-2023a-mpi/NAMD_2.14_Source/charm-6.10.2/mpi-linux-x86_64-mpicxx/tmp/libs/ck-libs/pythonCCS
   Command mpicxx -m64 -fPIC -fPIC -DCMK_GFORTRAN -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX -I../../../../bin/../include -D__CHARMC__=1 -DMPICH_IGNORE_CXX_SEEK -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -fpermissive -O2 -U_FORTIFY_SOURCE -fno-stack-protector -fno-lifetime-dse -c PythonCCS.C -o PythonCCS.o returned error code 1
charmc exiting...
gmake[1]: *** [Makefile:33: PythonCCS.o] Error 1
gmake[1]: Leaving directory '/local/easybuild_milan/build/NAMD/2.14/foss-2023a-mpi/NAMD_2.14_Source/charm-6.10.2/mpi-linux-x86_64-mpicxx/tmp/libs/ck-libs/pythonCCS'
gmake: *** [Makefile:105: pythonCCS] Error 2
gmake: *** Waiting for unfinished jobs....
../../../../bin/charmc -optimize -production  -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -DMPICH_IGNORE_CXX_SEEK -fpermissive  -lpthread -o ../../../../lib/libmoduleCkLoop.a CkLoop.o 
ar: creating ../../../../lib/libmoduleCkLoop.a
gmake[1]: Leaving directory '/local/easybuild_milan/build/NAMD/2.14/foss-2023a-mpi/NAMD_2.14_Source/charm-6.10.2/mpi-linux-x86_64-mpicxx/tmp/libs/ck-libs/ckloop'
ar: creating ../lib/libconv-util.a
ar: creating ../lib/libck.a
-------------------------------------------------
Charm++ NOT BUILT. Either cd into mpi-linux-x86_64-mpicxx/tmp and try
to resolve the problems yourself, visit
http://charm.cs.illinois.edu/
for more information. Otherwise, email the developers at charm@cs.illinois.edu
 (at easybuild/tools/run.py:682 in parse_cmd_output)
== 2024-03-11 14:36:27,814 build_log.py:267 INFO ... (took 1 min 23 secs)
== 2024-03-11 14:36:27,814 filetools.py:2012 INFO Removing lock /sw/easybuild_milan/software/.locks/_sw_easybuild_milan_software_NAMD_2.14-foss-2023a-mpi.lock...
== 2024-03-11 14:36:27,814 filetools.py:383 INFO Path /sw/easybuild_milan/software/.locks/_sw_easybuild_milan_software_NAMD_2.14-foss-2023a-mpi.lock successfully removed.
== 2024-03-11 14:36:27,814 filetools.py:2016 INFO Lock removed: /sw/easybuild_milan/software/.locks/_sw_easybuild_milan_software_NAMD_2.14-foss-2023a-mpi.lock
== 2024-03-11 14:36:27,814 easyblock.py:4283 WARNING build failed (first 300 chars): cmd "./build charm++ mpi-linux-x86_64 mpicxx  --with-production --with-numa -j40 '-O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -DMPICH_IGNORE_CXX_SEEK -fpermissive'" exited with exit code 2 and output:
Selected Compiler: mpicxx
Selected Options: 
Creating dir: mpi-linux-x86_64-mpicxx
Cre

Yes, it just cuts off mid-word there. That's part of why we're struggling to solve this issue: we don't even know if the error log is the full message.

branfosj commented 8 months ago

At the end there you are getting the "first 300 chars" of the log message. The whole of the output from cmd "./build charm++ mpi-linux-x86_64 mpicxx --with-production --with-numa -j40 '-O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -DMPICH_IGNORE_CXX_SEEK -fpermissive'" will be earlier in the output log file.

I am suspicious of /usr/include/python3.9/ceval.h:136:37: note: declared here, as that is looking at an OS package Python header file.
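For anyone hunting through a long build log for the full output mentioned above, a small script can list every failing line with a little context. This is only a sketch: the patterns are taken from the log excerpts in this thread, the function name find_errors is made up for illustration, and the sample path /tmp/example is hypothetical.

```python
import re

# Patterns copied from the excerpts in this thread; other builds may need more.
ERROR_PATTERNS = re.compile(
    r"Fatal Error by charmc|\berror:|\*\*\* \[.*\] Error \d+"
)


def find_errors(lines, context=2):
    """Return (line_number, snippet) pairs for each line matching ERROR_PATTERNS.

    line_number is 1-based; snippet holds the matching line plus up to
    `context` following lines.
    """
    hits = []
    for i, line in enumerate(lines):
        if ERROR_PATTERNS.search(line):
            snippet = [l.rstrip("\n") for l in lines[i:i + 1 + context]]
            hits.append((i + 1, snippet))
    return hits


# Small in-memory sample; for a real log, pass open(path).readlines() instead.
sample = [
    "ar: creating ../lib/libckmain.a",
    "Fatal Error by charmc in directory /tmp/example",  # hypothetical path
    "   charmmod.o: file not recognized: File truncated",
    "charmc exiting...",
    "gmake[1]: *** [Makefile:42: charmdebug-python.o] Error 1",
]
for lineno, snippet in find_errors(sample, context=1):
    print(lineno, snippet)
```

Because the build ran with -j40, the first match in the file is usually more informative than the last one, which may just be a job that was interrupted by an earlier failure.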

rlpitts commented 8 months ago

Another thing I've realized: Charm++ 6.10.2 compiles with Python 3.8 but not 3.9. I haven't tried 3.10 or 3.11, but I suspect they will not work either, given the age of the NAMD-2.14 and Charm++ 6.10.2 installers for non-Intel toolchains.

rlpitts commented 8 months ago

> At the end there you are getting the "first 300 chars" of the log message. The whole of the output from cmd "./build charm++ mpi-linux-x86_64 mpicxx --with-production --with-numa -j40 '-O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -DMPICH_IGNORE_CXX_SEEK -fpermissive'" will be earlier in the output log file.

Yes, that's why I copied all the lines above the "Charm++ NOT built" line. That's where the fatal errors were that actually seemed to stop the process. Everything above that is a bunch of deprecation warnings and errors that say "fatal" but didn't stop the program. What you're looking at is the end of the log file in tmp. I'm scrolling back through to double-check, but I'm not sure what else I would be looking for that I didn't already post here.

> I am suspicious of /usr/include/python3.9/ceval.h:136:37: note: declared here, as that is looking at an OS package Python header file.

What are you suggesting? Sorry, I'm fairly new to this field (self-taught in all CS disciplines prior to the last 5 months). If you mean that I need to preload a compatible version of Python, that works for manual installation of Charm++ (and turns out to be essential), but if I can't use the system Python, I'll need a way to change the config file to specify a different version because it won't let me run eb with any other versions loaded. Or are you saying you expected it to look for a different header file?

rlpitts commented 8 months ago

OK, I've tried redoing the search from the beginning. These are all the errors and Fatal Errors:

gfortran: error: unrecognized command-line option -auto
Fatal Error by charmc in directory /local/easybuild_milan/build/NAMD/2.14/foss-2023a-mpi/NAMD_2.14_Source/charm-6.10.2/mpi-linux-x86_64-mpicxx/tmp
Command mpif90 -auto -fPIC -I../bin/../include -DMPICH_IGNORE_CXX_SEEK -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -fpermissive -O2 -c tracef_f.f90 -o tracef_f.o returned error code 1
charmc exiting...
cp: cannot stat 'charm': No such file or directory
Fatal Error by charmc in directory /local/easybuild_milan/build/NAMD/2.14/foss-2023a-mpi/NAMD_2.14_Source/charm-6.10.2/mpi-linux-x86_64-mpicxx/tmp
   Command cp -p charm ../include/ returned error code 1
charmc exiting...
_CXX_SEEK -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -fpermissive -O2 -c tracef_f.f90 -o tracef_f.o returned error code 1
_CXX_SEEK -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -fpermissive -O2 -c charmmod.f90 -o charmmod.o returned error code 1
_CXX_SEEK -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -fpermissive -O2 -c mainf.f90 -o mainf.o returned error code 1
Fatal Error by charmc in directory /local/easybuild_milan/build/NAMD/2.14/foss-2023a-mpi/NAMD_2.14_Source/charm-6.10.2/mpi-linux-x86_64-mpicxx/tmp
   Command mpif90 -auto -fPIC -I../bin/../include -DMPICH_IGNORE_CXX_SEEK -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -fpermissive -O2 -c pup_f.f90 -o pup_f.o returned error code 1
charmc exiting...
Fatal Error by charmc in directory /local/easybuild_milan/build/NAMD/2.14/foss-2023a-mpi/NAMD_2.14_Source/charm-6.10.2/mpi-linux-x86_64-mpicxx/tmp
   pup_f.o: file not recognized: File truncated
charmc exiting...
  135 | Py_DEPRECATED(3.2) PyAPI_FUNC(void) PyEval_AcquireLock(void);
      |                                     ^~~~~~~~~~~~~~~~~~
PythonCCS.C: In function PyObject* CkPy_print(PyObject*, PyObject*):
PythonCCS.C:62:26: error: PyInt_AsLong was not declared in this scope; did you mean PyLong_AsLong?
   62 |   CmiUInt4 pyReference = PyInt_AsLong(PyDict_GetItemString(dict,"__charmNumber__"));
      |                          ^~~~~~~~~~~~
      |                          PyLong_AsLong

There are at least a dozen errors that look essentially the same as the one above, referring to Py_DEPRECATED(3.2), saying some function was not declared in this scope, and asking "did you mean [similar-looking function]?" so I'm not going to copy all of them.
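Rather than copying every one of those near-identical errors, they can be tallied by the missing identifier and the name GCC suggests. This is a rough sketch that assumes the diagnostics follow the format of the PyInt_AsLong line above; summarize_undeclared is a made-up helper name, and the second sample line is invented purely to show counting.

```python
import re
from collections import Counter

# Matches GCC "was not declared in this scope" diagnostics, with or without
# the quote characters GCC normally prints around identifiers.
UNDECLARED = re.compile(
    r"error: \W?(?P<name>\w+)\W? was not declared in this scope"
    r"(?:; did you mean \W?(?P<hint>\w+)\W?\?)?"
)


def summarize_undeclared(lines):
    """Count occurrences of each (missing identifier, suggested name) pair."""
    counts = Counter()
    for line in lines:
        match = UNDECLARED.search(line)
        if match:
            counts[(match.group("name"), match.group("hint"))] += 1
    return counts


sample = [
    "PythonCCS.C:62:26: error: PyInt_AsLong was not declared in this scope; did you mean PyLong_AsLong?",
    # Invented line number and column, only to demonstrate the tally:
    "PythonCCS.C:1:1: error: PyInt_AsLong was not declared in this scope; did you mean PyLong_AsLong?",
    "charmc exiting...",
]
for (name, hint), count in summarize_undeclared(sample).items():
    print(f"{name} -> {hint}: {count}")
```

If most of the missing names turn out to be Python 2 functions, that points at a Python version mismatch rather than a dozen separate problems.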

And then the last fatal error is the one that I posted at the start.

I should probably rename this thread to something like "Charm++ build included with NAMD-2.14 cannot be installed as configured."

branfosj commented 8 months ago

> I am suspicious of /usr/include/python3.9/ceval.h:136:37: note: declared here, as that is looking at an OS package Python header file.

> What are you suggesting? Sorry, I'm fairly new to this field (self-taught in all CS disciplines prior to the last 5 months). If you mean that I need to preload a compatible version of Python, that works for manual installation of Charm++ (and turns out to be essential), but if I can't use the system Python, I'll need a way to change the config file to specify a different version because it won't let me run eb with any other versions loaded. Or are you saying you expected it to look for a different header file?

At the point where it fails, the build is referring to a Python header file that is provided by the OS and not by EasyBuild. I've tested NAMD-2.14-foss-2023a-mpi.eb myself and I do not see it trying to build against Python. So, I suspect that you have more Python development headers installed in the OS, and that Charm++ detects these, attempts to build against this Python, and then fails.

Can you search for an equivalent line to this in your build log:

checking "whether Python is installed"... "no"

and see if that has a "yes" at the end?
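A quick way to run that check without scrolling through the log is sketched below. It assumes the configure output looks like the line quoted above, with the quotes treated as optional; python_detected is a made-up helper name.

```python
import re

# Pattern based on the configure line quoted above.
PYTHON_CHECK = re.compile(
    r'checking "?whether Python is installed"?\.\.\.\s*"?(yes|no)"?'
)


def python_detected(lines):
    """Return True or False for the last matching configure check, None if absent."""
    result = None
    for line in lines:
        match = PYTHON_CHECK.search(line)
        if match:
            result = match.group(1) == "yes"
    return result


print(python_detected(['checking "whether Python is installed"... "no"']))
```

A result of True would mean Charm++ picked up a Python installation during configure, which would be consistent with the OS header showing up in the errors.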

branfosj commented 8 months ago

So, I found this in the latest Charm++ code related to PythonCCS:

# PythonCCS requires Python v2.x and is the only target that uses this

And this matches with the problem being in

PythonCCS.C:62:26: error: PyInt_AsLong was not declared in this scope; did you mean PyLong_AsLong?

as PyInt_AsLong belongs to the Python 2 C API and no longer exists in Python 3.

This leads to the suggestion of adding ('Python', '2.7.18') to the dependencies in NAMD-2.14-foss-2023a-mpi.eb to see if Charm++ builds correctly with that.
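As a sketch, the change to a local copy of the easyconfig would look something like this. The existing entries are deliberately not reproduced here, and it assumes a Python 2.7.18 easyconfig is available for the toolchain in use.

```python
# Fragment of NAMD-2.14-foss-2023a-mpi.eb (easyconfigs use Python syntax).
dependencies = [
    # ... keep the existing dependencies of the easyconfig unchanged ...
    ('Python', '2.7.18'),  # suggested addition, so PythonCCS finds a Python 2.x
]
```

Whether Charm++ then prefers this Python over the OS one is exactly what the test build would show.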