Closed alfonso2166 closed 2 months ago
What kind of GPUs are you using? Also, can you tell us the size of the system you are trying to run? Did you make sure to run without domain_decomposition?
Hi there,
I am using GPUs with the KEPLER37 architecture. Actually, I am only initializing my simulation; my input file looks like this:
units metal
atom_style atomic
atom_modify map yes
newton on
and the segmentation fault occurs when invoking the lmp executable. I guess it is indeed a memory issue on the cluster I used, since I tried a different cluster and everything works fine there.
All the best, Alfonso
(Sent by email in reply to Ilyes Batatia's comment of Wednesday, September 18, 2024, on ACEsuit/mace Issue #592, "Segmentation fault".)
Hi Alfonso, can we close this now? Or is there still a problem to investigate?
Hi,
Yes, we can close this now. Thank you!
All the best, Alfonso
(Sent by email in reply to wcwitt's comment of Monday, September 23, 2024, on ACEsuit/mace Issue #592, "Segmentation fault".)
Hi everyone,
I can build LAMMPS and MACE for GPU use without problems by following the instructions on the website, but when I invoke the lmp executable with the following example:
units metal
atom_style atomic
atom_modify map yes
newton on
lmp -k on g 1 -sf kk -in in.lammps
I get the following segmentation fault:
[midway3-0298:723348:0:723348] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x440000e8)
==== backtrace (tid: 723348) ====
 0 0x0000000000012b20 .annobin_sigaction.c() sigaction.c:0
 1 0x000000000006c1f7 MPI_Comm_rank() ???:0
 2 0x00000000008b116b LAMMPS_NS::Universe::Universe() ???:0
 3 0x0000000000747e17 LAMMPS_NS::LAMMPS::LAMMPS() ???:0
 4 0x000000000040438f main() ???:0
 5 0x0000000000023493 __libc_start_main() ???:0
 6 0x000000000040453e _start() ???:0
[midway3-0298:723348] Process received signal
[midway3-0298:723348] Signal: Segmentation fault (11)
[midway3-0298:723348] Signal code: (-6)
[midway3-0298:723348] Failing at address: 0x69421179000b0994
[midway3-0298:723348] [ 0] /lib64/libpthread.so.0(+0x12b20)[0x7f5e04e5eb20]
[midway3-0298:723348] [ 1] /software/openmpi-4.1.0-el8-x86_64/lib/libmpi.so.40(MPI_Comm_rank+0x37)[0x7f5e0987c1f7]
[midway3-0298:723348] [ 2] /project/gagalli/alfonso/Software/myMACE/lammps/mybuild/liblammps.so.0(_ZN9LAMMPS_NS8UniverseC1EPNS_6LAMMPSEi+0xfb)[0x7f5e05b2516b]
[midway3-0298:723348] [ 3] /project/gagalli/alfonso/Software/myMACE/lammps/mybuild/liblammps.so.0(_ZN9LAMMPS_NS6LAMMPSC2EiPPci+0xa7)[0x7f5e059bbe17]
[midway3-0298:723348] [ 4] /project/gagalli/alfonso/Software/myMACE/lammps/mybuild/lmp[0x40438f]
[midway3-0298:723348] [ 5] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f5e042bb493]
[midway3-0298:723348] [ 6] /project/gagalli/alfonso/Software/myMACE/lammps/mybuild/lmp[0x40453e]
[midway3-0298:723348] End of error message
/var/spool/slurm/d/job23524083/slurm_script: line 38: 723348 Segmentation fault (core dumped) /project/gagalli/alfonso/Software/myMACE/lammps/mybuild/lmp -k on g 1 -sf kk -in in.lammps
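Reading the backtrace above from the top, the first frame after the signal handler is MPI_Comm_rank, called from the LAMMPS_NS::Universe constructor, which suggests the crash happens while LAMMPS is being constructed rather than while the commands in in.lammps are being executed. As a rough aid for reading such traces, here is a minimal Python sketch that extracts the frames and reports the faulting call and its caller. The helper names (`parse_frames`, `faulting_call`) are hypothetical and are not part of LAMMPS or MACE, and the pattern only assumes the frame layout shown in this trace.

```python
import re

# Matches frames of the form printed above, e.g.
#   "1 0x000000000006c1f7 MPI_Comm_rank() ???:0"
# (assumption: index, address, symbol followed by "()", then a location token)
FRAME_RE = re.compile(r"\b(\d+)\s+(0x[0-9a-fA-F]+)\s+(\S+?)\(\)\s+(\S+)")


def parse_frames(text):
    """Return (index, address, symbol, location) tuples found in a backtrace."""
    return [(int(i), addr, sym, loc) for i, addr, sym, loc in FRAME_RE.findall(text)]


def faulting_call(frames):
    """Return (callee, caller) for the first frame that is not the signal handler."""
    real = [f for f in frames if "sigaction" not in f[2]]
    if len(real) < 2:
        return None
    return real[0][2], real[1][2]


if __name__ == "__main__":
    # The first three frames, copied from the backtrace in this post
    sample = """
     0 0x0000000000012b20 .annobin_sigaction.c() sigaction.c:0
     1 0x000000000006c1f7 MPI_Comm_rank() ???:0
     2 0x00000000008b116b LAMMPS_NS::Universe::Universe() ???:0
    """
    print(faulting_call(parse_frames(sample)))
    # prints ('MPI_Comm_rank', 'LAMMPS_NS::Universe::Universe')
```

A crash this early usually points at the MPI setup or the runtime environment rather than at the input script, although the thread itself does not establish the root cause.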
I have read that this type of error might be related to memory issues, but even after installing everything in my research group folder, which has plenty of space available, I get the same error. These are the modules I used:
LOAD MODULES
module load intel/19.1.1
module load mkl/2023.1
module load cuda/12.2
module load cudnn/9.4.0
module load openmpi/4.1.0
module load gcc/10.2.0
module load python/3.11.9
source ~/.bashrc
conda activate /project/gagalli/alfonso/Software/ENVS/myenvX
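With several compiler and MPI-related modules plus a conda environment active at the same time, one thing that may be worth checking (this is an assumption, not something confirmed in this thread) is whether the MPI library the lmp binary resolves at run time is the one it was built against. The following Python sketch parses the output of the standard `ldd` tool and lists the MPI-looking libraries; the function names `mpi_libs_from_ldd` and `report` are hypothetical helpers, not part of any package mentioned here.

```python
import re
import shutil
import subprocess
import sys

# Matches lines of `ldd` output such as
#   "libmpi.so.40 => /path/to/libmpi.so.40 (0x00007f...)"
# and also "libfoo.so => not found"
LDD_RE = re.compile(r"^\s*(\S+)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def mpi_libs_from_ldd(ldd_output):
    """Map each library whose name contains 'mpi' to the path ldd resolved."""
    libs = {}
    for line in ldd_output.splitlines():
        match = LDD_RE.match(line)
        if match and "mpi" in match.group(1).lower():
            libs[match.group(1)] = match.group(2)
    return libs


def report(binary):
    """Print the MPI libraries `binary` resolves to and the mpirun found on PATH."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    for name, path in mpi_libs_from_ldd(result.stdout).items():
        print(f"{name} -> {path}")
    print("mpirun on PATH:", shutil.which("mpirun"))


if __name__ == "__main__":
    # Usage: python check_mpi.py /path/to/lmp
    if len(sys.argv) > 1:
        report(sys.argv[1])
```

Running this inside the same job script, after the module loads and `conda activate`, shows what the executable actually sees at run time. If the reported library path or the `mpirun` location differs from the MPI installation used at compile time, that mismatch would be a candidate explanation for a crash inside MPI_Comm_rank.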
P.S. I did not have any architecture-related issues when compiling.
Any recommendation would be greatly appreciated.