firedrakeproject / firedrake

Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM)
https://firedrakeproject.org

Setting up Firedrake on Supercomputers #1779

Closed: PeiLiu90 closed this issue 4 years ago

PeiLiu90 commented 4 years ago

I have been struggling for quite a few days to set up Firedrake at the supercomputing center, which runs CentOS 7. The installation now completes, but Firedrake does not run, and the center's IT staff have not been able to solve the problem completely either. What is the best way to install Firedrake on such systems?

Below is what I did, following the instructions from the IT staff.

1. Create a Python environment (a conda env) so that "pip install" can be used:

        module load python3/3.7.4_anaconda2019.10
        conda create --name inst-firedrake python=3
        source activate inst-firedrake

2. Build my own PETSc, because the system PETSc is not compatible with the Firedrake installation.

    (2.1) Load the corresponding HDF5, NetCDF and PnetCDF modules, since the "--download-" flags for these packages cannot be used on batch systems:

        module load hdf5/hdf5-1.10.4-intel-2019-update1-parallel
        module load pnetcdf/1.11.0
        module load netcdf/4.62-intel-2019-update1-serial

    These packages were compiled with the same MPI directory and Intel compiler: /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin

    (2.2) Load other packages:

        module load cmake/3.16.2
        module load flex/2.6.4
        module load libtool

    (2.3) Configure PETSc with the flags reported by "python3 firedrake-install --show-petsc-configure-options":

        ./configure --with-mpi-dir=/panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/ \
            --download-make --download-scalapack --download-metis --download-ptscotch \
            --download-hypre --download-superlu_dist --download-mumps --download-chaco \
            --download-ml --download-suitesparse --download-hwloc --download-pastix \
            --with-c2html=0 --with-debugging=0 --with-shared-libraries=1 --with-fortran-bindings=0 \
            --download-eigen=/panfs/roc/groups/6/calde014/liu01304/download/eigen-3.3.3.tar.bz2 \
            --with-zlib --with-hdf5 --with-pnetcdf --with-netcdf --with-batch --with-cxx-dialect=C++11

    (2.4) The configure step went through. Then build and check:

        export PETSC_DIR=/panfs/roc/groups/6/calde014/liu01304/test3/petsc
        export PETSC_ARCH=arch-linux-c-opt
        make all
        make check

    There were no error messages, so this seems to be fine.

3. Install Firedrake:

        python3 firedrake-install --no-package-manager --honour-petsc-dir \
            --mpicc /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpiicc \
            --mpicxx /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpiicpc \
            --mpif90 /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpiifort \
            --mpiexec /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpiexec

    The installation goes through and seems to be successful.

4. Run a test code. I am using "helmholtz.py" from https://www.firedrakeproject.org/demos/helmholtz.py.html as the test code:

        source firedrake/bin/activate
        python3 helmholtz.py

    The program fails with this error message:

        ImportError: /panfs/roc/groups/6/calde014/liu01304/test3/firedrake/lib/python3.8/site-packages/petsc4py/lib/arch-linux-c-opt/PETSc.cpython-38-x86_64-linux-gnu.so: undefined symbol: PetscPartitionerInitializePackage

Attached are firedrake-install.log and the PETSc configure.log. Thank you very much for your help!

Attachments: firedrake-install.log, configure.log

wence- commented 4 years ago

This error somehow comes from within petsc4py. Let us try and understand what happened. Can you run this test file with the python from the virtual environment?

from petsc4py import PETSc
print(PETSc.COMM_WORLD.size)

Can you also show the output of:

ldd /panfs/roc/groups/6/calde014/liu01304/test3/firedrake/lib/python3.8/site-packages/petsc4py/lib/arch-linux-c-opt/PETSc.cpython-38-x86_64-linux-gnu.so

Your approach of first building petsc and then pointing the firedrake installer at that build is usually the way to go on supercomputers, and given every intermediate step seemed to run fine, I am a bit confused by the final error.
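As a sketch of how the requested ldd output can be checked mechanically, the hypothetical helper below (not part of Firedrake or petsc4py) parses ldd's text output into a mapping from library name to resolved path, so that entries reported as "not found", or resolved from an unexpected directory, stand out. It assumes GNU ldd's usual "name => path (address)" line format.

```python
import re
import subprocess

# Usual GNU ldd line shapes (leading tabs are stripped first):
#   libfoo.so.1 => /path/to/libfoo.so.1 (0x...)
#   libfoo.so.1 => not found
#   linux-vdso.so.1 =>  (0x...)            virtual library, no path
#   /lib64/ld-linux-x86-64.so.2 (0x...)    dynamic loader, no "=>"
_ARROW = re.compile(r"^(\S+)\s+=>\s*(.*?)\s*(\(0x[0-9a-fA-F]+\))?$")
_BARE = re.compile(r"^(\S+)\s+\(0x[0-9a-fA-F]+\)$")


def parse_ldd(text):
    """Map each library name in ldd output to its resolved path.

    The value is None for libraries reported as "not found" and ""
    for entries that have no path (such as linux-vdso).
    """
    libs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = _ARROW.match(line)
        if m:
            target = m.group(2)
            libs[m.group(1)] = None if target == "not found" else target
            continue
        m = _BARE.match(line)
        if m:
            # The loader line names itself by absolute path.
            libs[m.group(1)] = m.group(1)
    return libs


def ldd(path):
    """Run ldd on a shared object and parse the result (Linux only)."""
    out = subprocess.run(["ldd", path], capture_output=True, text=True,
                         check=True)
    return parse_ldd(out.stdout)
```

For example, looking up "libpetsc.so.3.13" in the result of ldd() on the petsc4py extension shows which PETSc build is actually picked up at run time.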

dham commented 4 years ago

Could this be a petsc version not matching petsc4py version issue?

PeiLiu90 commented 4 years ago

Thank you for helping, wence!

> This error somehow comes from within petsc4py. Let us try and understand what happened. Can you run this test file with the python from the virtual environment?
>
> from petsc4py import PETSc
> print(PETSc.COMM_WORLD.size)

Running the test file above gives exactly the same error. I first "module load" all the packages used in the installation, export the PETSc environment variables, and activate the firedrake venv (activating both the inst-firedrake3 conda env and the firedrake venv makes no difference), then run the test file:

    (firedrake) liu01304@ln0003 [~/test3] % python testpy.py
    Traceback (most recent call last):
      File "testpy.py", line 1, in <module>
        from petsc4py import PETSc
      File "/panfs/roc/groups/6/calde014/liu01304/test3/firedrake/lib/python3.8/site-packages/petsc4py/PETSc.py", line 3, in <module>
        PETSc = ImportPETSc(ARCH)
      File "/panfs/roc/groups/6/calde014/liu01304/test3/firedrake/lib/python3.8/site-packages/petsc4py/lib/__init__.py", line 29, in ImportPETSc
        return Import('petsc4py', 'PETSc', path, arch)
      File "/panfs/roc/groups/6/calde014/liu01304/test3/firedrake/lib/python3.8/site-packages/petsc4py/lib/__init__.py", line 73, in Import
        module = import_module(pkg, name, path, arch)
      File "/panfs/roc/groups/6/calde014/liu01304/test3/firedrake/lib/python3.8/site-packages/petsc4py/lib/__init__.py", line 58, in import_module
        with f: return imp.load_module(fullname, f, fn, info)
      File "/home/calde014/liu01304/.conda/envs/inst-firedrake3/lib/python3.8/imp.py", line 242, in load_module
        return load_dynamic(name, filename, file)
      File "/home/calde014/liu01304/.conda/envs/inst-firedrake3/lib/python3.8/imp.py", line 342, in load_dynamic
        return _load(spec)
    ImportError: /panfs/roc/groups/6/calde014/liu01304/test3/firedrake/lib/python3.8/site-packages/petsc4py/lib/arch-linux-c-opt/PETSc.cpython-38-x86_64-linux-gnu.so: undefined symbol: PetscPartitionerInitializePackage

> Can you also show the output of:
>
> ldd /panfs/roc/groups/6/calde014/liu01304/test3/firedrake/lib/python3.8/site-packages/petsc4py/lib/arch-linux-c-opt/PETSc.cpython-38-x86_64-linux-gnu.so

Below is the output:

    (firedrake) liu01304@ln0003 [~/test3] % ldd /panfs/roc/groups/6/calde014/liu01304/test3/firedrake/lib/python3.8/site-packages/petsc4py/lib/arch-linux-c-opt/PETSc.cpython-38-x86_64-linux-gnu.so
        linux-vdso.so.1 =>  (0x00007fff053fa000)
        libpetsc.so.3.13 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libpetsc.so.3.13 (0x00007f5c55d9f000)
        libmpifort.so.12 => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007f5c559f2000)
        libmpi.so.12 => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/lib/release/libmpi.so.12 (0x00007f5c52765000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f5c52561000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f5c52359000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f5c5213d000)
        libimf.so => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/compiler/lib/intel64_lin/libimf.so (0x00007f5c51b9d000)
        libsvml.so => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/compiler/lib/intel64_lin/libsvml.so (0x00007f5c501fa000)
        libirng.so => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/compiler/lib/intel64_lin/libirng.so (0x00007f5c4fe88000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f5c4fb86000)
        libgcc_s.so.1 => /panfs/roc/msisoft/gcc/8.2.0/lib64/libgcc_s.so.1 (0x00007f5c4f96e000)
        libintlc.so.5 => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007f5c4f6fc000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5c4f32e000)
        libHYPRE-2.18.2.so => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libHYPRE-2.18.2.so (0x00007f5c4eb89000)
        libumfpack.so.5 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libumfpack.so.5 (0x00007f5c4e795000)
        libklu.so.1 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libklu.so.1 (0x00007f5c4e552000)
        libcholmod.so.3 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libcholmod.so.3 (0x00007f5c4e10f000)
        libbtf.so.1 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libbtf.so.1 (0x00007f5c4df0b000)
        libccolamd.so.2 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libccolamd.so.2 (0x00007f5c4dcf9000)
        libcolamd.so.2 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libcolamd.so.2 (0x00007f5c4daed000)
        libcamd.so.2 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libcamd.so.2 (0x00007f5c4d8e0000)
        libamd.so.2 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libamd.so.2 (0x00007f5c4d6d3000)
        libsuitesparseconfig.so.5 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libsuitesparseconfig.so.5 (0x00007f5c4d4d0000)
        libsuperlu_dist.so.6 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libsuperlu_dist.so.6 (0x00007f5c4d0d5000)
        liblapack.so.3 => /lib64/liblapack.so.3 (0x00007f5c4c978000)
        libblas.so.3 => /lib64/libblas.so.3 (0x00007f5c4c71f000)
        libhwloc.so.15 => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libhwloc.so.15 (0x00007f5c4c4b6000)
        libX11.so.6 => /lib64/libX11.so.6 (0x00007f5c4c178000)
        libnetcdf.so.13 => /panfs/roc/msisoft/netcdf/4.6.2-intel-2019-update1-serial/lib/libnetcdf.so.13 (0x00007f5c4bdd6000)
        libpnetcdf.so.3 => /panfs/roc/msisoft/pnetcdf/1.11.0/lib/libpnetcdf.so.3 (0x00007f5c4b2f6000)
        libhdf5hl_fortran.so.100 => /panfs/roc/msisoft/hdf5/hdf5-1.10.4-intel2019update1-serial/lib/libhdf5hl_fortran.so.100 (0x00007f5c4b0d0000)
        libhdf5_fortran.so.100 => /panfs/roc/msisoft/hdf5/hdf5-1.10.4-intel2019update1-serial/lib/libhdf5_fortran.so.100 (0x00007f5c4ae75000)
        libhdf5_hl.so.100 => /panfs/roc/msisoft/hdf5/hdf5-1.10.4-intel2019update1-serial/lib/libhdf5_hl.so.100 (0x00007f5c4ac4c000)
        libhdf5.so.103 => /panfs/roc/msisoft/hdf5/hdf5-1.10.4-intel2019update1-serial/lib/libhdf5.so.103 (0x00007f5c4a5b5000)
        libmetis.so => /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libmetis.so (0x00007f5c4a32a000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f5c4a114000)
        libstdc++.so.6 => /panfs/roc/msisoft/gcc/8.2.0/lib64/libstdc++.so.6 (0x00007f5c49d91000)
        libifport.so.5 => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/compiler/lib/intel64_lin/libifport.so.5 (0x00007f5c49b63000)
        libirc.so => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/compiler/lib/intel64_lin/libirc.so (0x00007f5c498f1000)
        libquadmath.so.0 => /panfs/roc/msisoft/gcc/8.2.0/lib64/libquadmath.so.0 (0x00007f5c496b1000)
        libfabric.so.1 => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/libfabric/lib/libfabric.so.1 (0x00007f5c49478000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5c586ee000)
        libmpicxx.so.12 => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/mpi/intel64/lib/libmpicxx.so.12 (0x00007f5c49258000)
        libiomp5.so => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/compiler/lib/intel64_lin/libiomp5.so (0x00007f5c48e70000)
        libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00007f5c48b4e000)
        libxcb.so.1 => /lib64/libxcb.so.1 (0x00007f5c48926000)
        libcurl.so.4 => /lib64/libcurl.so.4 (0x00007f5c486bc000)
        libcilkrts.so.5 => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/compiler/lib/intel64_lin/libcilkrts.so.5 (0x00007f5c4847f000)
        libifcoremt.so.5 => /panfs/roc/intel/x86_64/2019/parallel_studio_xe_msi/compilers_and_libraries_2019.1.144/linux/compiler/lib/intel64_lin/libifcoremt.so.5 (0x00007f5c480eb000)
        libXau.so.6 => /lib64/libXau.so.6 (0x00007f5c47ee7000)
        libidn.so.11 => /lib64/libidn.so.11 (0x00007f5c47cb4000)
        libssh2.so.1 => /lib64/libssh2.so.1 (0x00007f5c47a87000)
        libssl3.so => /lib64/libssl3.so (0x00007f5c4782e000)
        libsmime3.so => /lib64/libsmime3.so (0x00007f5c47606000)
        libnss3.so => /lib64/libnss3.so (0x00007f5c472d7000)
        libnssutil3.so => /lib64/libnssutil3.so (0x00007f5c470a7000)
        libplds4.so => /lib64/libplds4.so (0x00007f5c46ea3000)
        libplc4.so => /lib64/libplc4.so (0x00007f5c46c9e000)
        libnspr4.so => /lib64/libnspr4.so (0x00007f5c46a60000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f5c46813000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f5c4652a000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f5c462f7000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f5c460f3000)
        liblber-2.4.so.2 => /lib64/liblber-2.4.so.2 (0x00007f5c45ee4000)
        libldap-2.4.so.2 => /lib64/libldap-2.4.so.2 (0x00007f5c45c8f000)
        libssl.so.10 => /lib64/libssl.so.10 (0x00007f5c45a1d000)
        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f5c455ba000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f5c453aa000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f5c451a6000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f5c44f8c000)
        libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f5c44d6f000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f5c44b48000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f5c44911000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f5c446af000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007f5c444ac000)
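The ldd output shows petsc4py resolving libpetsc.so.3.13 from the user's own PETSc build under test3/petsc. One way to test whether that build is the wrong version for petsc4py is to ask whether it exports the symbol the loader could not find. The sketch below uses Python's ctypes to dlopen a library and look the symbol up by name, which checks only exported (dynamic) symbols, roughly what "nm -D" lists. The helper name exports_symbol is hypothetical, not a petsc4py or Firedrake function.

```python
import ctypes


def exports_symbol(lib_path, symbol):
    """Return True if the shared library at lib_path exports `symbol`.

    lib_path=None searches the running process itself (dlopen(NULL) on
    Linux). Returns False if the library cannot be loaded at all, for
    example because one of its own dependencies is missing.
    """
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        return False
    # ctypes resolves attribute names with dlsym() and raises
    # AttributeError when the symbol is not exported.
    return hasattr(lib, symbol)
```

Calling exports_symbol on /panfs/roc/groups/6/calde014/liu01304/test3/petsc/arch-linux-c-opt/lib/libpetsc.so.3.13 with "PetscPartitionerInitializePackage" returning False would suggest that the libpetsc being loaded is not the PETSc that petsc4py was compiled against.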

PeiLiu90 commented 4 years ago

> Could this be a petsc version not matching petsc4py version issue?

Thanks dham!

PETSc was obtained with

    git clone -b maint https://gitlab.com/petsc/petsc.git petsc

whose most recent version is 3.13.

petsc4py is installed by the Firedrake install script, downloaded with

    curl -O https://raw.githubusercontent.com/firedrakeproject/firedrake/master/scripts/firedrake-install

wence- commented 4 years ago

> git clone -b maint https://gitlab.com/petsc/petsc.git petsc

Aha, this may be the problem. Can you use https://github.com/firedrakeproject/petsc.git (with branch firedrake)? This one is compatible with the petsc4py that firedrake-install gets.
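A quick way to see which PETSc a given PETSC_DIR actually contains is to read the version macros in $PETSC_DIR/include/petscversion.h. The sketch below assumes that header defines PETSC_VERSION_MAJOR, PETSC_VERSION_MINOR, PETSC_VERSION_SUBMINOR and PETSC_VERSION_RELEASE as plain integer #defines (RELEASE is 1 on release branches such as maint and 0 on development branches). The function names are hypothetical. Because a development branch such as firedrake can carry API changes while reporting the same major.minor number as a release, a matching version number alone does not prove compatibility; the release flag is one more hint.

```python
import os
import re

_MACROS = ("MAJOR", "MINOR", "SUBMINOR", "RELEASE")


def parse_petsc_version(header_text):
    """Extract the integer PETSC_VERSION_* macros from petscversion.h text."""
    found = {}
    for key in _MACROS:
        m = re.search(rf"^\s*#\s*define\s+PETSC_VERSION_{key}\s+(\d+)",
                      header_text, re.MULTILINE)
        if m:
            found[key.lower()] = int(m.group(1))
    return found


def petsc_version_from_dir(petsc_dir):
    """Read the version of the PETSc source tree at petsc_dir."""
    path = os.path.join(petsc_dir, "include", "petscversion.h")
    with open(path) as f:
        return parse_petsc_version(f.read())


def same_major_minor(petsc_version, petsc4py_version):
    """Compare PETSc major.minor with a version string such as '3.13.0'."""
    parts = petsc4py_version.split(".")
    wanted = (int(parts[0]), int(parts[1]))
    return (petsc_version.get("major"), petsc_version.get("minor")) == wanted
```

The petsc4py side of the comparison can come from petsc4py.__version__ in the Firedrake venv.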

PeiLiu90 commented 4 years ago

> git clone -b maint https://gitlab.com/petsc/petsc.git petsc
>
> Aha, this may be the problem. Can you use https://github.com/firedrakeproject/petsc.git (with branch firedrake)? This one is compatible with the petsc4py that firedrake-install gets.

Thanks! I am going to try this.

PeiLiu90 commented 4 years ago

> Aha, this may be the problem. Can you use https://github.com/firedrakeproject/petsc.git (with branch firedrake)? This one is compatible with the petsc4py that firedrake-install gets.

I have now tried the firedrake branch of PETSc. The installation again completes without problems, and the petsc4py test script now prints: 1

However, the test code helmholtz.py still fails, now with a different error:

    Traceback (most recent call last):
      File "helmholtz.py", line 51, in <module>
        from firedrake import *
      File "/panfs/roc/groups/6/calde014/liu01304/test6/firedrake/src/firedrake/firedrake/__init__.py", line 54, in <module>
        from pyop2 import op2 # noqa: F401
      File "/panfs/roc/groups/6/calde014/liu01304/test6/firedrake/src/PyOP2/pyop2/__init__.py", line 4, in <module>
        from pyop2.op2 import * # noqa
      File "/panfs/roc/groups/6/calde014/liu01304/test6/firedrake/src/PyOP2/pyop2/op2.py", line 42, in <module>
        from pyop2.sequential import par_loop, Kernel # noqa: F401
      File "/panfs/roc/groups/6/calde014/liu01304/test6/firedrake/src/PyOP2/pyop2/sequential.py", line 42, in <module>
        from pyop2 import base
      File "/panfs/roc/groups/6/calde014/liu01304/test6/firedrake/src/PyOP2/pyop2/base.py", line 64, in <module>
        import loopy
      File "/panfs/roc/groups/6/calde014/liu01304/test6/firedrake/src/loopy/loopy/__init__.py", line 29, in <module>
        from loopy.symbolic import (
      File "/panfs/roc/groups/6/calde014/liu01304/test6/firedrake/src/loopy/loopy/symbolic.py", line 62, in <module>
        import islpy as isl
      File "/panfs/roc/groups/6/calde014/liu01304/test6/firedrake/lib/python3.8/site-packages/islpy/__init__.py", line 25, in <module>
        import islpy._isl as _isl
      File "/panfs/roc/groups/6/calde014/liu01304/test6/firedrake/lib/python3.8/site-packages/islpy/_isl.py", line 15, in <module>
        from islpy._isl_cffi import ffi
    ImportError: /panfs/roc/groups/6/calde014/liu01304/test6/firedrake/lib/python3.8/site-packages/islpy/_isl_cffi.abi3.so: undefined symbol: __intel_sse2_strrchr

The output of

    ldd /panfs/roc/groups/6/calde014/liu01304/test6/firedrake/lib/python3.8/site-packages/islpy/_isl_cffi.abi3.so

is:

        linux-vdso.so.1 =>  (0x00007ffc37357000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2abe817000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f2abe449000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2abef49000)
dham commented 4 years ago

This post appears to indicate that this is a symptom of having built Python with gcc but the module (islpy in this case) with the Intel compiler.

https://community.intel.com/t5/Intel-C-Compiler/Undefined-symbol-intel-sse2-strrchr/td-p/1048869
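Following that pointer, one quick check is whether the venv's Python interpreter and the compiler that pip would use for extension modules (such as islpy's) belong to the same compiler family. This sketch relies on the standard library's platform.python_compiler() and sysconfig.get_config_var("CC"), plus the CC environment variable, which setuptools/distutils builds generally honour. The classification in compiler_family is a rough heuristic of my own, not an official mapping; generic MPI wrappers like mpicc are reported as "unknown" because they can wrap any compiler.

```python
import os
import platform
import re
import sysconfig

_INTEL_DRIVERS = {"icc", "icpc", "icx", "icpx", "mpiicc", "mpiicpc"}


def compiler_family(desc):
    """Classify a compiler description as intel, clang, gcc or unknown."""
    if not desc:
        return "unknown"
    text = desc.lower()
    if "intel" in text:
        return "intel"
    # Basenames of whitespace-separated tokens, e.g. "/usr/bin/gcc" -> "gcc".
    for name in (os.path.basename(tok) for tok in text.split()):
        if name in _INTEL_DRIVERS:
            return "intel"
        if "clang" in name:
            return "clang"
        # Matches "gcc", "g++", "gcc-8" and prefixed names like "...-gnu-gcc".
        if re.search(r"(^|-)(gcc|g\+\+)(-[\d.]+)?$", name):
            return "gcc"
    return "unknown"


def report():
    """Compare the interpreter's compiler with the one extensions would use."""
    built_with = platform.python_compiler()
    ext_cc = os.environ.get("CC") or sysconfig.get_config_var("CC") or ""
    return {
        "python_built_with": built_with,
        "python_family": compiler_family(built_with),
        "extension_cc": ext_cc,
        "extension_family": compiler_family(ext_cc),
    }
```

If report() shows a GCC-built interpreter while CC points at an Intel compiler, that matches the kind of mismatch described in the linked post.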

PeiLiu90 commented 4 years ago

Thanks dham! I am trying to rebuild Python using icc.

PeiLiu90 commented 4 years ago

It seems there is no way for a supercomputer user to make all these packages compatible.

Eventually, we decided to try Singularity with the Docker image https://hub.docker.com/r/firedrakeproject/firedrake. If someone runs into a similar situation, this might be the most efficient solution.

First we need to build a Singularity sandbox:

    singularity build --sandbox firedrake/ docker://firedrakeproject/firedrake

Then we can run the code within the sandbox. Start the container with

    singularity run --writable firedrake/

and, inside it, run:

    source /home/firedrake/firedrake/bin/activate
    python code.py

If there is no /home/firedrake directory (possibly because of the warning "WARNING: Your current working directory is a symlink and may not be available in container", which appeared only on the supercomputer and not on my personal computer), I used:

    singularity run --writable --no-home --bind $HOME:/mnt/ $HOME/firedrake/

When submitting a job, it is not obvious how to activate the venv. What I figured out is to modify the environment file in the root directory of the sandbox by adding the line

    . /home/firedrake/firedrake/bin/activate

(note the space between . and /). Then "source /home/firedrake/firedrake/bin/activate" is no longer needed inside the sandbox.
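The environment-file edit described above can also be scripted. This is a hypothetical helper, assuming the file to edit is the environment file in the sandbox root mentioned above (check its exact location in your own sandbox). It appends the activation line only if it is not already present, so running it twice does no harm; the text-level function is split out so the logic can be checked without touching the sandbox.

```python
from pathlib import Path

ACTIVATE_LINE = ". /home/firedrake/firedrake/bin/activate"


def with_activation(text, line=ACTIVATE_LINE):
    """Return (new_text, changed): text with `line` appended if missing."""
    if any(existing.strip() == line for existing in text.splitlines()):
        return text, False
    # Make sure the new line does not get glued onto an unterminated last line.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n", True


def add_activation(env_file, line=ACTIVATE_LINE):
    """Append the activation line to env_file in place; True if modified."""
    path = Path(env_file)
    old = path.read_text() if path.exists() else ""
    new, changed = with_activation(old, line)
    if changed:
        path.write_text(new)
    return changed
```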

Then, in the job script, we can simply use:

    module load singularity
    singularity run --writable --no-home --bind $HOME:/mnt/ $HOME/firedrake/ python3 /mnt/code.py
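Because the job-script command binds $HOME to /mnt inside the container, a script's host path has to be rewritten to its /mnt location before it is passed to python3. The hypothetical helpers below (not a Singularity feature) build the same command line as that job script, using only the flags shown there, and refuse scripts that lie outside the bound directory.

```python
import os


def container_path(host_path, home, mount="/mnt"):
    """Map a host path under `home` to where --bind home:mount exposes it."""
    rel = os.path.relpath(os.path.abspath(host_path), os.path.abspath(home))
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{host_path} is not under {home}, so it is not bound")
    return os.path.join(mount, rel)


def singularity_command(script, home=None, sandbox="firedrake", mount="/mnt"):
    """Build the argv of the job-script command for running `script`."""
    home = home or os.path.expanduser("~")
    return [
        "singularity", "run", "--writable", "--no-home",
        "--bind", f"{home}:{mount}/",
        os.path.join(home, sandbox) + "/",
        "python3", container_path(script, home, mount),
    ]
```

The resulting list can be passed to subprocess.run(..., check=True), or joined with shlex.join (Python 3.8+) to produce a line for a batch script.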

One last thing I have not figured out: MPI is not working, possibly because the MPI inside the container does not know about the system MPI.

This issue can be closed.

wence- commented 4 years ago

Thanks for your hard work. I think the Singularity docs may help with the MPI issue: as I understand it, the way they recommend is to run mpiexec outside the container. Perhaps see here for details?
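A sketch of what "running mpiexec outside" could look like: the host launcher starts one container instance per rank, which the Singularity documentation describes as the hybrid approach. It only works if the MPI inside the container is compatible with the host MPI, which this thread has not established for the Firedrake image and the cluster's Intel MPI. The function name, the "-n" flag of the launcher and the omission of --writable (assuming the ranks only read the sandbox) are all assumptions to adapt.

```python
import os


def mpi_singularity_command(nprocs, script_in_container, sandbox,
                            home=None, mount="/mnt", launcher="mpiexec"):
    """Build a host-side launch line that starts one container per MPI rank.

    Assumes the host launcher accepts "-n <nprocs>" and that the MPI inside
    the container is compatible with the host MPI.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    home = home or os.path.expanduser("~")
    return [
        launcher, "-n", str(nprocs),
        # Same binds as the single-process job script; --writable is left
        # out on the assumption that the ranks only read the sandbox.
        "singularity", "run", "--no-home",
        "--bind", f"{home}:{mount}/",
        sandbox,
        "python3", script_in_container,
    ]
```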