Memory Insufficiency Error When Running GaAs Example in HamEPC Project

QuantumLab-ZY / HamEPC

A machine learning workflow for calculating the electron-phonon coupling (EPC)

MIT License


Memory Insufficiency Error When Running GaAs Example in HamEPC Project #6

Open witt2000 opened 3 weeks ago

witt2000 commented 3 weeks ago

I encountered an out-of-memory error while running the GaAs example from the HamEPC project using the following command:

    mpirun -np $SLURM_NPROCS HamEPC --config EPC_input.yaml

Here’s the error message:

    File "/data/home/actcc/.conda/envs/HamEPC/lib/python3.9/site-packages/HamEPC/EPC_calculator.py", line 608, in _elec_cal
        eigen_vecs = np.array(eigen_vecs) # (nk, norbs, norbs)
    numpy.core._exceptions._ArrayMemoryError: Unable to allocate 68.0 GiB for an array with shape (6750000, 26, 26) and data type complex128

Do you know why this is happening? How can I solve this problem?

Additionally, I am currently using a CPU node with 512GB of memory. Is a GPU required to successfully run HamEPC? My environment includes numpy version 1.21.2 and PyTorch version 1.11.0.

Thanks!
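For reference, the 68.0 GiB figure in the traceback follows directly from the array shape and dtype. The sketch below reproduces that arithmetic using only the numbers in the error message. The loop at the end assumes that every MPI process on a node allocates its own full copy of the array, which is a guess about HamEPC's behaviour and is not confirmed anywhere in this thread.

```python
import math

BYTES_PER_COMPLEX128 = 16  # two 64-bit floats (real and imaginary parts)

def array_gib(shape, itemsize):
    """Size in GiB of a dense array with the given shape and item size."""
    return math.prod(shape) * itemsize / 2**30

# Shape copied from the error message: (nk, norbs, norbs).
per_array = array_gib((6750000, 26, 26), BYTES_PER_COMPLEX128)
print(f"one eigenvector array: {per_array:.1f} GiB")

# Assumption (not confirmed by the thread): each MPI process on the node
# holds its own full copy of this array.
for n_procs in (1, 4, 8, 64):
    print(f"{n_procs:3d} processes -> {n_procs * per_array:8.1f} GiB")
```

If that assumption holds, a handful of processes on one node is already enough to pass 512 GB from this single array, before anything else the calculation allocates.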

QuantumLab-ZY commented 2 weeks ago

Dear @witt2000,

HamEPC supports hybrid parallelization using both MPI and OpenMP. The total number of CPU cores is equal to the product of the number of threads and the number of processes. You can adjust the configuration by reducing the number of processes and increasing the number of threads accordingly. This approach can help reduce memory usage.

Best regards, Yang Zhong
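To make the processes-versus-threads trade-off concrete, here is a rough sketch that lists every way to split a fixed number of cores into MPI processes and OpenMP threads, with a lower-bound memory estimate for each split. The 68.0 GiB per-process figure is taken from the error message above; the assumption that each process holds one such array, the 512 GiB limit, and the helper name `hybrid_splits` are all illustrative and not part of HamEPC.

```python
ARRAY_GIB = 68.0       # per-array size reported in the traceback
NODE_MEMORY_GIB = 512  # illustrative limit, loosely based on the 512GB node

def hybrid_splits(total_cores):
    """Return (processes, threads) pairs whose product is total_cores."""
    return [(p, total_cores // p)
            for p in range(1, total_cores + 1)
            if total_cores % p == 0]

def fits(processes, array_gib=ARRAY_GIB, limit_gib=NODE_MEMORY_GIB):
    """True if one array per process stays under the limit.

    Assumption: every process allocates its own copy, and nothing else
    is counted, so this is only a lower bound on real usage.
    """
    return processes * array_gib <= limit_gib

for procs, threads in hybrid_splits(64):
    status = "ok" if fits(procs) else "too large"
    print(f"{procs:2d} processes x {threads:2d} threads: "
          f">= {procs * ARRAY_GIB:7.1f} GiB ({status})")
```

With OpenMP, the thread count is commonly set through the `OMP_NUM_THREADS` environment variable, and the process count through `mpirun -np`. Whether HamEPC reads `OMP_NUM_THREADS` or uses its own setting is not stated in this thread.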

Vahid999 commented 2 weeks ago

I tried the GaAs mobility calculation on 2 nodes, each with 64 cores and 500 GB of memory. I used 1 process and 64 threads per node, and the job still ran out of memory. Do we know how much memory is needed for the GaAs mobility calculation?

Thanks, Vahid

Vahid999 commented 1 week ago

While trying to figure out why I get no output for GaAs, I changed the k/q grid size to 30x30x30 just to see whether the GaAs example would produce any output, and there is still none. I have two questions:

1. Is there a way to test the installation?

2. In Mobility_input.yaml, we have

       cal_mode: 'mobility'
       read_momentum: False

   But in utils.py, it says

       'read_momentum': False, # When calculating the carrier mobility, read_momentum should be true.

   For a mobility calculation, should read_momentum be true or false?

Thanks, Vahid