Open JTaozhang opened 1 day ago
Hi @JTaozhang
From my experience, this problem comes from your job submission scripts and the server settings, but the provided files contain no information about them. Please provide them in detail.
I've run parallel calculations with the HSE functional using OMP_NUM_THREADS=16 mpirun -np 32 abacus, and it works well.
Hi, thanks for your reply. I have attached the submission scripts here. The combination "mpirun -np 8 -env OMP_NUM_THREADS=28" with 224 CPUs in total (8 nodes, 56 CPUs per node) works. However, the combination "mpirun -np 20 -env OMP_NUM_THREADS=28" with 560 CPUs in total (10 nodes, 56 CPUs per node) fails.
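A quick way to compare the two combinations is to check the cores each job requests (MPI ranks × OpenMP threads) against the cores the scheduler allocates (nodes × cores per node). Taking the numbers exactly as reported, 8 ranks × 28 threads is 224 cores, while 8 nodes × 56 cores is 448, so the first job either ran on fewer nodes than stated or left half of each node idle. The sketch below assumes a POSIX shell; check_layout is a hypothetical helper, not part of ABACUS or any MPI distribution.

```shell
#!/bin/sh
# Hypothetical helper (not part of ABACUS): compare the cores a hybrid
# MPI/OpenMP run requests (ranks x threads) with the cores the scheduler
# allocates (nodes x cores per node).
check_layout() {
    ranks=$1 threads=$2 nodes=$3 cores_per_node=$4
    requested=$((ranks * threads))
    allocated=$((nodes * cores_per_node))
    if [ "$requested" -eq "$allocated" ]; then
        verdict="exact fit"
    elif [ "$requested" -lt "$allocated" ]; then
        verdict="$((allocated - requested)) cores idle"
    else
        verdict="oversubscribed by $((requested - allocated)) cores"
    fi
    echo "np=$ranks x OMP=$threads -> $requested of $allocated cores: $verdict"
}

check_layout 8 28 8 56    # the combination reported to work
check_layout 20 28 10 56  # the combination reported to fail
```

With the reported numbers, this prints "224 of 448 cores: 224 cores idle" for the first job and "560 of 560 cores: exact fit" for the second.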
I think different machines have different settings, so I am not sure you can reproduce my case on your machine. Maybe you could try different combinations with my atomic system to check this problem.
One more question: fewer tasks on a node means that the node's memory is shared with fewer other tasks, right? And OMP_NUM_THREADS decides how many CPUs are assigned to one task, which governs the parallel computation.
Best, Tao
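The question about tasks per node can be made concrete with a little arithmetic. Broadly, the MPI ranks on one node split that node's physical memory, while the OpenMP threads of one rank share that rank's memory, so the memory available to each rank scales with 1 / (ranks per node). The sketch below assumes ranks are packed so that ranks per node = cores per node / OMP_NUM_THREADS, and uses NODE_MEM_GB=256 purely as a hypothetical placeholder (the node memory is not stated in this thread); per_node is likewise a hypothetical helper. Real usable memory is lower because of the OS and MPI buffers.

```shell
#!/bin/sh
# Per-node view of a hybrid MPI/OpenMP job: how many MPI ranks share one
# node, and roughly how much of the node's memory each rank can use.
# NODE_MEM_GB is a hypothetical placeholder; use your cluster's real value.
NODE_MEM_GB=256

per_node() {
    cores_per_node=$1 threads=$2 mem_gb=$3
    # Assumes ranks are packed: each rank occupies "threads" cores.
    ranks_per_node=$((cores_per_node / threads))
    if [ "$ranks_per_node" -lt 1 ]; then
        echo "OMP_NUM_THREADS=$threads exceeds $cores_per_node cores per node" >&2
        return 1
    fi
    # Threads of one rank share its memory, so divide by ranks, not threads.
    echo "ranks per node: $ranks_per_node, memory per rank: ~$((mem_gb / ranks_per_node)) GB"
}

per_node 56 28 "$NODE_MEM_GB"   # prints: ranks per node: 2, memory per rank: ~128 GB
per_node 56 56 "$NODE_MEM_GB"   # prints: ranks per node: 1, memory per rank: ~256 GB
```

Whether the scheduler actually places one or two ranks per node depends on the job script and MPI defaults, so the packed layout above is an assumption to verify, not a given.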
Describe the bug
Hi there,
Currently, I am working on a WTe2 bilayer system containing about 504 atoms. I am trying to calculate the band structure with a k-point setting of 11 along the high-symmetry path. The software version is v3.8.2. With the same INPUT and KPT settings but different CPU combinations, one job works and the other runs abnormally: it reports nothing, with no error and no useful information. One uses mpirun -np 8 -env OMP_NUM_THREADS=28 with 224 CPUs in total (8 nodes, 56 CPUs per node); the other uses mpirun -np 20 -env OMP_NUM_THREADS=28 with 560 CPUs in total (10 nodes, 56 CPUs per node).
For the abnormal job, the complete output is shown below:
I don't know what causes this abnormal behavior; could you test the code? I think the parallel calculation part may still have some stability problems. I also discussed this problem in the WeChat online group, where someone suggested opening an issue here, so I did.
The related files are here: WTe2.zip
Expected behavior
The second submission setting should run faster than the first.
To Reproduce
Environment
module load cmake/cmake-3.25 gnu/12.1.0
source /share/apps/intel2022/setvars.sh
source /share/home/zhangtao/software/abacus-develop-3.8.3/toolchain/install/setup
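Since the environment loads Intel MPI through setvars.sh, one way to make the rank layout of the failing job explicit is sketched below. It assumes Intel MPI's mpirun: -ppn sets the number of ranks per node, and I_MPI_PIN_DOMAIN=omp pins each rank to a domain of OMP_NUM_THREADS cores. Intel MPI normally forwards exported environment variables to all ranks, and its documented -env form takes the name and value as separate arguments, so exporting OMP_NUM_THREADS avoids depending on how -env OMP_NUM_THREADS=28 is parsed. DRY_RUN is a hypothetical switch added so the script prints the command instead of launching it; this is a sketch for comparison, not a confirmed fix.

```shell
#!/bin/sh
# Sketch of the failing job's launch line with the rank layout made explicit.
# Assumes Intel MPI (loaded by setvars.sh): -ppn sets MPI ranks per node and
# I_MPI_PIN_DOMAIN=omp pins each rank to OMP_NUM_THREADS cores.
NP=20                      # total MPI ranks (10 nodes x 2 ranks per node)
PPN=2                      # ranks per node: 56 cores / 28 threads
export OMP_NUM_THREADS=28  # exported, so Intel MPI forwards it to every rank
export I_MPI_PIN_DOMAIN=omp
DRY_RUN=${DRY_RUN:-1}      # hypothetical switch: 1 = print only, 0 = launch

if [ "$DRY_RUN" = 1 ]; then
    echo "mpirun -np $NP -ppn $PPN abacus (OMP_NUM_THREADS=$OMP_NUM_THREADS)"
else
    mpirun -np "$NP" -ppn "$PPN" abacus
fi
```

Running the same script with NP=8 and PPN=1 or PPN=2 gives a direct comparison with the combination that works.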
Additional Context
No more information is needed.
Task list for Issue attackers (only for developers)