QuantumLab-ZY / HamGNN

An E(3) equivariant Graph Neural Network for predicting electronic Hamiltonian matrix
GNU General Public License v3.0

The MKL dependency error in band_cal_parallel.py #12

Open newplay opened 7 months ago

newplay commented 7 months ago

Dear Yang Zhong,

When I run the band_cal_parallel python script, I encounter the following error:

Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.

My LD_LIBRARY_PATH is set as follows:

/home/zjlin/intel/oneapi/ccl/2021.10.0/lib/cpu_gpu_dpcpp:/home/zjlin/intel/oneapi/compiler/2023.2.0/linux/compiler/lib/intel64_lin:/home/zjlin/intel/oneapi/compiler/2023.2.0/linux/lib:/home/zjlin/intel/oneapi/compiler/2023.2.0/linux/lib/oclfpga/host/linux64/lib:/home/zjlin/intel/oneapi/compiler/2023.2.0/linux/lib/x64:/home/zjlin/intel/oneapi/dal/2023.2.0/lib/intel64:/home/zjlin/intel/oneapi/debugger/2023.2.0/dep/lib:/home/zjlin/intel/oneapi/debugger/2023.2.0/gdb/intel64/lib:/home/zjlin/intel/oneapi/debugger/2023.2.0/libipt/intel64/lib:/home/zjlin/intel/oneapi/dnnl/2023.2.0/cpu_dpcpp_gpu_dpcpp/lib:/home/zjlin/intel/oneapi/ipp/2021.9.0/lib/intel64:/home/zjlin/intel/oneapi/ippcp/2021.8.0/lib/intel64:/home/zjlin/intel/oneapi/itac/2021.10.0/slib:/home/zjlin/intel/oneapi/mkl/2023.2.0/lib/intel64:/home/zjlin/intel/oneapi/mpi/2021.10.0//lib:/home/zjlin/intel/oneapi/mpi/2021.10.0//libfabric/lib:/home/zjlin/intel/oneapi/mpi/2021.10.0//lib/release:/home/zjlin/intel/oneapi/tbb/2021.10.0/env/../lib/intel64/gcc4.8
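With a search path this long, it can help to confirm which entries actually contain an MKL core library, since the dynamic loader uses the first match it finds. The following is a minimal sketch, not part of HamGNN; the helper name `dirs_with_mkl` is hypothetical, and it only looks for files named `libmkl_core.so*`.

```python
import glob
import os


def dirs_with_mkl(ld_library_path):
    """Return (directory, library names) for each path entry holding libmkl_core.so*.

    Entries are reported in search order; duplicates and empty entries are skipped.
    """
    hits = []
    seen = set()
    for entry in ld_library_path.split(os.pathsep):
        if not entry or entry in seen:
            continue
        seen.add(entry)
        pattern = os.path.join(glob.escape(entry), "libmkl_core.so*")
        libs = sorted(os.path.basename(p) for p in glob.glob(pattern))
        if libs:
            hits.append((entry, libs))
    return hits


if __name__ == "__main__":
    # Inspect the path of the current shell session.
    for directory, libs in dirs_with_mkl(os.environ.get("LD_LIBRARY_PATH", "")):
        print(directory, libs)
```

If more than one directory is reported, the first one listed is the one the loader is expected to pick up.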

The version of the mkl package is 2023.4.0, and the library contains the following files:

-rwxr-xr-x. 1 zjlin zjlin  50122480  6月 13  2023 libmkl_avx2.so.2
-rwxr-xr-x. 1 zjlin zjlin  66659704  6月 13  2023 libmkl_avx512.so.2
-rwxr-xr-x. 1 zjlin zjlin  53003488  6月 13  2023 libmkl_avx.so.2
-rw-r--r--. 1 zjlin zjlin   1244476  6月 13  2023 libmkl_blacs_intelmpi_ilp64.a
lrwxrwxrwx. 1 zjlin zjlin        32  6月 30  2023 libmkl_blacs_intelmpi_ilp64.so -> libmkl_blacs_intelmpi_ilp64.so.2
-rwxrwxr-x. 1 zjlin zjlin    523704  4月 15 13:49 libmkl_blacs_intelmpi_ilp64.so.1
-rwxr-xr-x. 1 zjlin zjlin    495248  6月 13  2023 libmkl_blacs_intelmpi_ilp64.so.2
-rw-r--r--. 1 zjlin zjlin    739070  6月 13  2023 libmkl_blacs_intelmpi_lp64.a
lrwxrwxrwx. 1 zjlin zjlin        31  6月 30  2023 libmkl_blacs_intelmpi_lp64.so -> libmkl_blacs_intelmpi_lp64.so.2
-rwxr-xr-x. 1 zjlin zjlin    304384  6月 13  2023 libmkl_blacs_intelmpi_lp64.so.2
-rw-r--r--. 1 zjlin zjlin   1263716  6月 13  2023 libmkl_blacs_openmpi_ilp64.a
lrwxrwxrwx. 1 zjlin zjlin        31  6月 30  2023 libmkl_blacs_openmpi_ilp64.so -> libmkl_blacs_openmpi_ilp64.so.2
-rwxr-xr-x. 1 zjlin zjlin    504472  6月 13  2023 libmkl_blacs_openmpi_ilp64.so.2
-rw-r--r--. 1 zjlin zjlin    758310  6月 13  2023 libmkl_blacs_openmpi_lp64.a
lrwxrwxrwx. 1 zjlin zjlin        30  6月 30  2023 libmkl_blacs_openmpi_lp64.so -> libmkl_blacs_openmpi_lp64.so.2
-rwxr-xr-x. 1 zjlin zjlin    309480  6月 13  2023 libmkl_blacs_openmpi_lp64.so.2
-rw-r--r--. 1 zjlin zjlin    659092  6月 13  2023 libmkl_blas95_ilp64.a
-rw-r--r--. 1 zjlin zjlin    657388  6月 13  2023 libmkl_blas95_lp64.a
-rw-r--r--. 1 zjlin zjlin    212330  6月 13  2023 libmkl_cdft_core.a
lrwxrwxrwx. 1 zjlin zjlin        21  6月 30  2023 libmkl_cdft_core.so -> libmkl_cdft_core.so.2
-rwxr-xr-x. 1 zjlin zjlin    164936  6月 13  2023 libmkl_cdft_core.so.2
-rw-r--r--. 1 zjlin zjlin 578357890  6月 13  2023 libmkl_core.a
lrwxrwxrwx. 1 zjlin zjlin        16  6月 30  2023 libmkl_core.so -> libmkl_core.so.2
-rwxrwxr-x. 1 zjlin zjlin  74757224  4月 15 13:48 libmkl_core.so.1
-rwxr-xr-x. 1 zjlin zjlin  73808072  6月 13  2023 libmkl_core.so.2
-rwxr-xr-x. 1 zjlin zjlin  42417640  6月 13  2023 libmkl_def.so.2
-rw-r--r--. 1 zjlin zjlin  33272494  6月 13  2023 libmkl_gf_ilp64.a
lrwxrwxrwx. 1 zjlin zjlin        20  6月 30  2023 libmkl_gf_ilp64.so -> libmkl_gf_ilp64.so.2
-rwxr-xr-x. 1 zjlin zjlin  16908096  6月 13  2023 libmkl_gf_ilp64.so.2
-rw-r--r--. 1 zjlin zjlin  39356660  6月 13  2023 libmkl_gf_lp64.a
lrwxrwxrwx. 1 zjlin zjlin        19  6月 30  2023 libmkl_gf_lp64.so -> libmkl_gf_lp64.so.2
-rwxr-xr-x. 1 zjlin zjlin  20686648  6月 13  2023 libmkl_gf_lp64.so.2
-rw-r--r--. 1 zjlin zjlin  45645264  6月 13  2023 libmkl_gnu_thread.a
lrwxrwxrwx. 1 zjlin zjlin        22  6月 30  2023 libmkl_gnu_thread.so -> libmkl_gnu_thread.so.2
-rwxr-xr-x. 1 zjlin zjlin  32660800  6月 13  2023 libmkl_gnu_thread.so.2
-rw-r--r--. 1 zjlin zjlin  42048900  6月 13  2023 libmkl_intel_ilp64.a
lrwxrwxrwx. 1 zjlin zjlin        23  6月 30  2023 libmkl_intel_ilp64.so -> libmkl_intel_ilp64.so.2
-rwxrwxr-x. 1 zjlin zjlin  12914912  4月 15 13:47 libmkl_intel_ilp64.so.1
-rwxr-xr-x. 1 zjlin zjlin  20492584  6月 13  2023 libmkl_intel_ilp64.so.2
-rw-r--r--. 1 zjlin zjlin  48143118  6月 13  2023 libmkl_intel_lp64.a
lrwxrwxrwx. 1 zjlin zjlin        22  6月 30  2023 libmkl_intel_lp64.so -> libmkl_intel_lp64.so.2
-rwxr-xr-x. 1 zjlin zjlin  24271352  6月 13  2023 libmkl_intel_lp64.so.2
-rw-r--r--. 1 zjlin zjlin  89325670  6月 13  2023 libmkl_intel_thread.a
lrwxrwxrwx. 1 zjlin zjlin        24  6月 30  2023 libmkl_intel_thread.so -> libmkl_intel_thread.so.2
-rwxr-xr-x. 1 zjlin zjlin  63598016  6月 13  2023 libmkl_intel_thread.so.2
-rw-r--r--. 1 zjlin zjlin   7446888  6月 13  2023 libmkl_lapack95_ilp64.a
-rw-r--r--. 1 zjlin zjlin   7379552  6月 13  2023 libmkl_lapack95_lp64.a
-rwxr-xr-x. 1 zjlin zjlin  50216352  6月 13  2023 libmkl_mc3.so.2
-rwxr-xr-x. 1 zjlin zjlin  48727472  6月 13  2023 libmkl_mc.so.2
-rw-r--r--. 1 zjlin zjlin  51404406  6月 13  2023 libmkl_pgi_thread.a
lrwxrwxrwx. 1 zjlin zjlin        22  6月 30  2023 libmkl_pgi_thread.so -> libmkl_pgi_thread.so.2
-rwxr-xr-x. 1 zjlin zjlin  37998288  6月 13  2023 libmkl_pgi_thread.so.2
lrwxrwxrwx. 1 zjlin zjlin        14  6月 30  2023 libmkl_rt.so -> libmkl_rt.so.2
-rwxr-xr-x. 1 zjlin zjlin  18568768  6月 13  2023 libmkl_rt.so.2
-rw-r--r--. 1 zjlin zjlin  12244390  6月 13  2023 libmkl_scalapack_ilp64.a
lrwxrwxrwx. 1 zjlin zjlin        27  6月 30  2023 libmkl_scalapack_ilp64.so -> libmkl_scalapack_ilp64.so.2
-rwxrwxr-x. 1 zjlin zjlin   7718648  4月 15 13:49 libmkl_scalapack_ilp64.so.1
-rwxr-xr-x. 1 zjlin zjlin   7718768  6月 13  2023 libmkl_scalapack_ilp64.so.2
-rw-r--r--. 1 zjlin zjlin  12329916  6月 13  2023 libmkl_scalapack_lp64.a
lrwxrwxrwx. 1 zjlin zjlin        26  6月 30  2023 libmkl_scalapack_lp64.so -> libmkl_scalapack_lp64.so.2
-rwxr-xr-x. 1 zjlin zjlin   7728456  6月 13  2023 libmkl_scalapack_lp64.so.2
-rw-r--r--. 1 zjlin zjlin  38044180  6月 13  2023 libmkl_sequential.a
lrwxrwxrwx. 1 zjlin zjlin        22  6月 30  2023 libmkl_sequential.so -> libmkl_sequential.so.2
-rwxrwxr-x. 1 zjlin zjlin  28992400  4月 15 13:48 libmkl_sequential.so.1
-rwxr-xr-x. 1 zjlin zjlin  28392328  6月 13  2023 libmkl_sequential.so.2
-rw-r--r--. 1 zjlin zjlin 852178088  6月 13  2023 libmkl_sycl.a
lrwxrwxrwx. 1 zjlin zjlin        16  6月 30  2023 libmkl_sycl.so -> libmkl_sycl.so.3
-rwxr-xr-x. 1 zjlin zjlin 650498600  6月 13  2023 libmkl_sycl.so.3
-rw-r--r--. 1 zjlin zjlin 112149460  6月 13  2023 libmkl_tbb_thread.a
lrwxrwxrwx. 1 zjlin zjlin        22  6月 30  2023 libmkl_tbb_thread.so -> libmkl_tbb_thread.so.2
-rwxr-xr-x. 1 zjlin zjlin  40662512  6月 13  2023 libmkl_tbb_thread.so.2
-rwxr-xr-x. 1 zjlin zjlin  14827696  6月 13  2023 libmkl_vml_avx2.so.2
-rwxr-xr-x. 1 zjlin zjlin  14295744  6月 13  2023 libmkl_vml_avx512.so.2
-rwxr-xr-x. 1 zjlin zjlin  15872192  6月 13  2023 libmkl_vml_avx.so.2
-rwxr-xr-x. 1 zjlin zjlin   7739640  6月 13  2023 libmkl_vml_cmpt.so.2
-rwxr-xr-x. 1 zjlin zjlin   8745920  6月 13  2023 libmkl_vml_def.so.2
-rwxr-xr-x. 1 zjlin zjlin  14644304  6月 13  2023 libmkl_vml_mc3.so.2
-rwxr-xr-x. 1 zjlin zjlin  14787656  6月 13  2023 libmkl_vml_mc.so.2
drwxrwxr-x. 3 zjlin zjlin        27 12月 12 11:38 locale
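The listing above contains a few `.so.1` files dated April 15 (4月 15) next to the `.so.2` files dated June 13 (6月 13), 2023. Whether this mix is related to the error is not established in this thread, but a small script can make such cases easy to spot. This is a sketch only; the function name `mixed_so_versions` is hypothetical, and it handles only names of the form `libNAME.so.N`.

```python
import re
from collections import defaultdict

# Matches e.g. "libmkl_core.so.2" -> ("libmkl_core", "2")
_SO_NAME = re.compile(r"^(lib\w+)\.so\.(\d+)$")


def mixed_so_versions(filenames):
    """Return {library: sorted major versions} for libraries present in more than one version."""
    versions = defaultdict(set)
    for name in filenames:
        match = _SO_NAME.match(name)
        if match:
            versions[match.group(1)].add(int(match.group(2)))
    return {lib: sorted(v) for lib, v in versions.items() if len(v) > 1}


# A few names taken from the listing above.
sample = [
    "libmkl_core.so.1",
    "libmkl_core.so.2",
    "libmkl_sequential.so.1",
    "libmkl_sequential.so.2",
    "libmkl_rt.so.2",
]
print(mixed_so_versions(sample))
```

To run it on a real directory, pass `os.listdir(path)` as the argument.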

I suspect the problem may be related to the libmkl_blacs_openmpi_lp64 library. Could it be correlated with the MKL version? I am using Intel oneAPI, and there are no .dll files in the library directory.

Best Regards, TzuChing

QuantumLab-ZY commented 7 months ago

Dear TzuChing,

The directory of the MKL library I used when compiling mpitool was '/opt/compiler/intel2018u4/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64/'. Everything went smoothly when using this library at runtime. Recently, I recompiled mpitool with the MKL library from oneAPI 2023, but I encountered the same error, "Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers", at runtime. I haven't found a solution to this problem yet. It seems that mpitool can currently be used only with the traditional Intel compiler's MKL library, not with oneAPI's MKL library.

Best Regards, Yang Zhong
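One way to narrow down a "Cannot load symbol" failure is to test whether a given shared library exports the symbol at all. The sketch below uses Python's standard `ctypes` module; the helper `has_symbol` is hypothetical, and this thread does not say which MKL library is expected to provide `MKLMPI_Get_wrappers`, so the library path is left for the reader to supply. A library can also fail to load for unrelated reasons (for example, unresolved dependencies), which the function reports as `None`.

```python
import ctypes


def has_symbol(lib_path, symbol):
    """Return True/False if the library loads, or None if it cannot be loaded.

    Passing lib_path=None inspects the symbols already visible in the
    current process (Linux behaviour of dlopen(NULL)).
    """
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        return None
    # ctypes resolves the name lazily and raises AttributeError if it is missing.
    return hasattr(lib, symbol)


# Example call; replace the first argument with the full path of the
# MKL shared library you want to inspect:
# has_symbol("/path/to/some/libmkl_library.so.2", "MKLMPI_Get_wrappers")
```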

newplay commented 7 months ago

Dear Yang Zhong,

Thanks for your reply. I will try running band_cal_parallel again with the traditional Intel compiler's MKL.

TzuChing

QuantumLab-ZY commented 6 months ago

Dear TzuChing,

User flamingoXu seems to have found a solution in Issue #18; you can try it out.

Best wishes, Yang Zhong

newplay commented 6 months ago

Dear Yang Zhong,

I attempted to install the MKL dependencies using the following command:

mamba install -c intel mkl=2024.1.0 mkl_fft=1.3.1 mkl_random=1.2.2 mkl-service=2.4.0

However, I encountered the following warning and error:

warning  libmamba Added empty dependency for problem type SOLVER_RULE_UPDATE
Could not solve for environment specs
The following packages are incompatible
├─ mkl 2024.1.0**  is requested and can be installed;
└─ mkl_fft 1.3.1**  is installable with the potential options
   ├─ mkl_fft 1.3.1 would require
   │  └─ python >=3.10,<3.11.0a0  with the potential options
   │     ├─ python [3.10.0|3.10.10|...|3.10.9], which can be installed;
   │     └─ python 3.10.14 would require
   │        └─ libsqlite >=3.45.2,<4.0a0 , which does not exist (perhaps a missing channel);
   ├─ mkl_fft 1.3.1 would require
   │  └─ python >=3.7,<3.8.0a0 , which can be installed;
   ├─ mkl_fft 1.3.1 would require
   │  └─ python >=3.8,<3.9.0a0 , which can be installed;
   ├─ mkl_fft 1.3.1 would require
   │  └─ mkl >=2022.1.0,<2023.0a0 , which conflicts with any installable versions previously reported;
   ├─ mkl_fft 1.3.1 would require
   │  └─ mkl >=2022.0.0,<2023.0a0 , which conflicts with any installable versions previously reported;
   ├─ mkl_fft 1.3.1 would require
   │  └─ mkl >=2023.0.0,<2024.0a0 , which conflicts with any installable versions previously reported;
   ├─ mkl_fft 1.3.1 would require
   │  └─ python >=3.11,<3.12.0a0 , which can be installed;
   └─ mkl_fft 1.3.1 would require
      └─ mkl >=2021.3.0,<2022.0a0 , which conflicts with any installable versions previously reported.
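The solver output lists the `mkl` ranges accepted by the available `mkl_fft 1.3.1` builds, and all of them end below 2024. The sketch below replays that check with a deliberately simplified version comparison. It is not conda's matching algorithm: pre-release suffixes such as `a0` are reduced to their leading digits, and only `>=` and `<` clauses are handled. The names `parse`, `satisfies`, and `compatible_ranges` are illustrative.

```python
import re

# Ranges reported by the solver for the mkl_fft 1.3.1 builds.
MKL_FFT_131_RANGES = [
    ">=2021.3.0,<2022.0a0",
    ">=2022.0.0,<2023.0a0",
    ">=2022.1.0,<2023.0a0",
    ">=2023.0.0,<2024.0a0",
]


def parse(version):
    """Keep the leading digits of each dotted component: '2024.0a0' -> (2024, 0)."""
    parts = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(version, spec):
    """Check a version against a comma-separated list of '>=' and '<' clauses."""
    v = parse(version)
    for clause in spec.split(","):
        if clause.startswith(">="):
            if not v >= parse(clause[2:]):
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if not v < parse(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def compatible_ranges(mkl_version):
    """Return the reported ranges that admit the given mkl version."""
    return [s for s in MKL_FFT_131_RANGES if satisfies(mkl_version, s)]


print(compatible_ranges("2024.1.0"))  # the version pinned in the failing command
print(compatible_ranges("2023.2.0"))  # a 2023 release, for comparison
```

Under these simplified rules, pinning `mkl=2024.1.0` leaves no `mkl_fft 1.3.1` build to choose from, which is consistent with the conflicts reported above.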

Then, I attempted to install the MKL package without specifying the version:

mamba install -c intel mkl mkl_fft=1.3.1 mkl_random=1.2.2 mkl-service=2.4.0

This installation succeeded with the following versions:

mkl                       2023.2.0            intel_49495    intel
mkl-service               2.4.0           py39hae59892_35    intel
mkl_fft                   1.3.1           py39hcab1719_22    intel
mkl_random                1.2.2           py39hbf47bc3_22    intel

Therefore, I believe this dependency combination is more stable.

Best Regards, TzuChing
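To confirm which MKL build a Python session actually loads after such an installation, the `mkl-service` package listed above provides the `mkl` module, whose `get_version_string()` reports the runtime library version. A minimal sketch follows; the wrapper name `mkl_runtime_version` is hypothetical, and it returns `None` when the module is not installed.

```python
def mkl_runtime_version():
    """Return the MKL runtime version string, or None if mkl-service is not importable."""
    try:
        import mkl  # provided by the mkl-service package
    except ImportError:
        return None
    return mkl.get_version_string()


if __name__ == "__main__":
    print(mkl_runtime_version())
```

Running this inside the same environment used for band_cal_parallel shows whether the conda-installed MKL, rather than a system-wide one, is the library in use by Python.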