QuantumLab-ZY / HamGNN

An E(3) equivariant Graph Neural Network for predicting electronic Hamiltonian matrices
GNU General Public License v3.0

RuntimeError with band_cal_parallel #18

Status: Open. flamingoXu opened this issue 6 months ago.

flamingoXu commented 6 months ago

Dear Zhong Yang,

When I run band_cal_parallel, the program raises a k-path error:

    (GPU) root@bohrium-11924-1124454:~/test/bandpredicton/out/version_0# mpirun -np 24 band_cal_parallel --config band_cal_parallel.yaml
    Traceback (most recent call last):
      File "/root/anaconda3/envs/GPU/bin/band_cal_parallel", line 8, in <module>
        sys.exit(main())
      File "/root/anaconda3/envs/GPU/lib/python3.9/site-packages/HamTool/band_cal_parallel.py", line 1304, in main
        band_cal_nonsoc()
      File "/root/anaconda3/envs/GPU/lib/python3.9/site-packages/HamTool/band_cal_parallel.py", line 658, in band_cal_nonsoc
        kpath_seek = KPathSeek(structure = struct)
      File "/root/anaconda3/envs/GPU/lib/python3.9/site-packages/monty-2024.4.17-py3.9.egg/monty/dev.py", line 172, in decorated
        raise self.err_cls(self.message)
    RuntimeError: SeeK-path is required to use the convention by Hinuma et al.

Here is my band_cal_parallel.yaml:

    filename: test
    graph_data_path: /root/test/bandpredicton/npz/graph_data.npz
    hamiltonian_path: /root/test/bandpredicton/out/version_0/prediction_hamiltonian.npy
    k_path: null
    label: null
    nk: 120 # The total number of k points in all k paths
    nao_max: 26
    num_wfns: 3 # Export the wave functions in the interval of [VBM-num_wfn, VBM+num_wfn]
    save_dir: ./
    soc_switch: False
    Ham_type: openmx # openmx or abacus
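The RuntimeError above appears to come from a `requires` guard in monty (note `monty/dev.py` in the traceback), which pymatgen's `KPathSeek` uses to refuse to run when the optional SeeK-path package (`seekpath` on PyPI) cannot be imported. Assuming that is the cause here, a quick pre-flight check before launching the MPI job could look like this; `seekpath_available` is a hypothetical helper, not part of HamGNN or HamTool.

```python
import importlib.util


def seekpath_available() -> bool:
    """Return True if the optional `seekpath` package can be imported.

    Assumption: with k_path: null, the automatic k-path goes through
    pymatgen's KPathSeek, which needs seekpath to be installed.
    """
    return importlib.util.find_spec("seekpath") is not None


if __name__ == "__main__":
    if seekpath_available():
        print("seekpath found: the automatic k-path (k_path: null) should be usable")
    else:
        print("seekpath missing: install it (pip install seekpath) or set k_path manually")
```

If the check reports that seekpath is missing, installing it or supplying `k_path` explicitly are the two obvious ways around this particular error.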

Also, when I define the k-path manually, a different error occurs:

    (GPU) root@bohrium-11924-1124454:~/test/bandpredicton/out/version_0# mpirun -np 12 band_cal_parallel --config band_cal_parallel.yaml
    Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers.

(The same MKL line is printed 12 times, matching -np 12.)

The parameters are:

    filename: test
    graph_data_path: /root/test/bandpredicton/npz/graph_data.npz
    hamiltonian_path: /root/test/bandpredicton/out/version_0/prediction_hamiltonian.npy
    k_path: [[0.0,0.0,0.0],[0.5,0.0,0.0],[0.33333,0.33333,0.3333],[0.0,0.0,0.0]]
    label: [G, X, M,G]
    nk: 120 # The total number of k points in all k paths
    nao_max: 26
    num_wfns: 3 # Export the wave functions in the interval of [VBM-num_wfn, VBM+num_wfn]
    save_dir: ./
    soc_switch: False
    Ham_type: openmx # openmx or abacus
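The comment on `nk` says it is the total number of k-points across all segments of the path. Assuming that meaning, one common way to spread `nk` points over a node list like the one above is to give each segment a share proportional to its length. The sketch below is a generic illustration, not HamGNN's own implementation; `interpolate_kpath` and its `recip` argument are hypothetical names, and without `recip` it measures lengths in fractional coordinates, which is only a rough proxy for true reciprocal-space distance.

```python
import math
from typing import List, Optional, Sequence

Vec3 = Sequence[float]


def interpolate_kpath(
    nodes: Sequence[Vec3],
    nk: int,
    recip: Optional[Sequence[Vec3]] = None,
) -> List[List[float]]:
    """Return `nk` k-points along the path through `nodes` (fractional coords).

    Points are shared between segments in proportion to segment length.
    `recip` holds the reciprocal lattice vectors as rows; if omitted, lengths
    are measured directly in fractional coordinates (a simplification).
    """
    nseg = len(nodes) - 1
    if nseg < 1:
        raise ValueError("need at least two k-path nodes")
    budget = nk - 1  # the final node is appended separately at the end
    if budget < nseg:
        raise ValueError("nk is too small to give every segment a point")
    if recip is None:
        recip = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

    def to_cart(k: Vec3) -> List[float]:
        # Cartesian k = sum_i k_i * b_i, with b_i the i-th row of recip
        return [sum(k[i] * recip[i][j] for i in range(3)) for j in range(3)]

    lengths = [math.dist(to_cart(a), to_cart(b)) for a, b in zip(nodes[:-1], nodes[1:])]
    total = sum(lengths)
    if total == 0:
        raise ValueError("all k-path nodes coincide")

    # Largest-remainder rounding so the counts add up to exactly `budget`
    raw = [budget * length / total for length in lengths]
    counts = [max(1, int(r)) for r in raw]
    by_remainder = sorted(range(nseg), key=lambda i: raw[i] - int(raw[i]), reverse=True)
    step = 0
    while sum(counts) < budget:
        counts[by_remainder[step % nseg]] += 1
        step += 1
    while sum(counts) > budget:  # only possible after the max(1, ...) floor
        counts[max(range(nseg), key=lambda i: counts[i])] -= 1

    points: List[List[float]] = []
    for (a, b), n in zip(zip(nodes[:-1], nodes[1:]), counts):
        for s in range(n):  # segment start included, segment end excluded
            t = s / n
            points.append([a[j] + t * (b[j] - a[j]) for j in range(3)])
    points.append([float(x) for x in nodes[-1]])
    return points
```

For the four-node path above, `interpolate_kpath(k_path, 120)` returns 120 points that start and end at G. Passing the reciprocal lattice vectors as `recip` gives roughly uniform spacing in Cartesian k-space.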

QuantumLab-ZY commented 6 months ago

I suggest using the serial band_cal command instead; its input file is in utils_openmx. It seems that band_cal_parallel is not compatible with oneAPI, so you need to use the traditional MKL library from the Intel compiler. By the way, be careful not to forget the quotation marks, for example: label: ['G', 'X', 'M', 'G'].
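The point about quoting labels can be checked before a run. The sketch below assumes the YAML has already been loaded into Python lists and that band_cal expects one label per `k_path` node (as in the manual example above: four nodes and four labels). `check_kpath_config` is a hypothetical helper, not part of HamGNN.

```python
from typing import Sequence


def check_kpath_config(k_path: Sequence[Sequence[float]], label: Sequence[object]) -> None:
    """Raise ValueError if a manually written k_path/label pair looks wrong.

    Hypothetical helper; it assumes one label per k_path node.
    """
    if len(k_path) != len(label):
        raise ValueError(f"{len(k_path)} k-points but {len(label)} labels")
    for i, k in enumerate(k_path):
        if len(k) != 3:
            raise ValueError(f"k-point {i} has {len(k)} components, expected 3")
        if not all(isinstance(x, (int, float)) for x in k):
            raise ValueError(f"k-point {i} contains a non-numeric entry: {k!r}")
    for i, name in enumerate(label):
        # Quoting labels in the YAML file ('G', 'X', ...) guarantees strings here
        if not isinstance(name, str):
            raise ValueError(f"label {i} is {name!r}; write it as a quoted string")
```

With the manual k-path from the issue and the quoted labels ['G', 'X', 'M', 'G'], the check passes silently.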

QuantumLab-ZY commented 6 months ago

The version of the MKL library I am using is intel2018u4.

flamingoXu commented 6 months ago

Thank you for your help. After replacing band_cal_parallel with band_cal, the program seems to work properly, but it has been running for an hour and is still not finished. My machine has 64 cores and 256 GB of memory.

flamingoXu commented 6 months ago

Dear Zhong Yang,

I have found a solution to the bug we discussed. After updating MKL in Anaconda to 2024.1.0 and making sure the matching library versions are installed (mkl-fft=1.3.1, mkl-random=1.2.2, and mkl-service=2.4.0), the parallel version should work.
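To confirm that an environment matches the versions reported here before retrying band_cal_parallel, something like the following could be used. It assumes the packages are registered under these distribution names; the conda-installed `mkl` library itself may have no Python metadata, in which case `conda list mkl` is the more reliable check. `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Versions reported in this thread; the distribution names are assumptions.
REPORTED_WORKING = {
    "mkl": "2024.1.0",
    "mkl-fft": "1.3.1",
    "mkl-random": "1.2.2",
    "mkl-service": "2.4.0",
}


def version_mismatches(required: Dict[str, str]) -> Dict[str, str]:
    """Return {distribution: problem} for every package not at the expected version."""
    problems: Dict[str, str] = {}
    for name, wanted in required.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not found by importlib.metadata"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for pkg, problem in version_mismatches(REPORTED_WORKING).items():
        print(f"{pkg}: {problem}")
```

An empty result means every listed package was found at the reported version.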

QuantumLab-ZY commented 6 months ago

I will try it, thank you.