Open ikuki-ikuki opened 2 months ago
Hi @ikuki-ikuki,
I could not find support for the e3gnn potential format in the LAMMPS documentation.
You should follow the instructions on the main page: https://github.com/MDIL-SNU/SevenNet?tab=readme-ov-file#installation-for-lammps
Is it currently possible to use the potential functions trained from SevenNet on LAMMPS in a CPU-only environment? Is it possible to use the trained potential functions to complete calculation tasks with ASE?
Both are possible. If SevenNet cannot find a GPU, it evaluates the model on the CPU only. If the systems you are interested in are usually small, CPU is a valid choice. But if you are interested in systems with thousands of atoms or more, or you have to run very long MD simulations, you need a GPU. CPU will be painfully slow.
Hi @YutackPark, I ran into a problem compiling LAMMPS on a CPU cluster:
some_path/lammps_dev/src/pair_e3gnn.cpp:36:10: fatal error: 'cuda_runtime.h' file not found
36 | #include <cuda_runtime.h>
| ^~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [some_path/lammps_dev/src/pair_e3gnn.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
some_path/lammps_dev/src/pair_e3gnn_parallel.cpp:32:10: fatal error: 'cuda_runtime.h' file not found
32 | #include <cuda_runtime.h>
| ^~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [some_path/lammps_dev/src/pair_e3gnn_parallel.cpp.o] Error 1
make[1]: *** [CMakeFiles/lammps.dir/all] Error 2
make: *** [all] Error 2
Could you provide a short guide? Thanks.
Hi @thangckt
I almost forgot to fix this case. It is safe to remove the 'cuda_runtime.h' include from pair_e3gnn.cpp. It can then run without CUDA installed on the system.
Quick fix:
In sevenn/pair_e3gnn/pair_e3gnn.cpp:
Remove line 36 (the cuda_runtime.h include).
Remove lines 217 to 229 (the print_info if statement).
You can directly modify the file at {path_to_lammps}/src/pair_e3gnn.cpp and run make in the build directory.
This will be patched together with other LAMMPS-related fixes.
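The quick fix above can be sketched as a small script. This is a minimal sketch and not part of SevenNet: the helper names `drop_lines` and `remove_cuda_include_and_block` are hypothetical, and the line numbers (36, and 217 to 229) are taken from the comment above, so they only hold for the file revision discussed here. The sketch refuses to change anything if line 36 does not mention cuda_runtime.h.

```python
def drop_lines(text, ranges):
    """Return text without the given 1-indexed, inclusive line ranges."""
    doomed = set()
    for start, end in ranges:
        doomed.update(range(start, end + 1))
    lines = text.splitlines(keepends=True)
    return "".join(
        line for number, line in enumerate(lines, start=1) if number not in doomed
    )


def remove_cuda_include_and_block(source, include_line=36, block=(217, 229)):
    """Apply the quick fix. The default line numbers come from the thread
    (an assumption about your file revision), so the include is checked first."""
    lines = source.splitlines(keepends=True)
    if include_line > len(lines) or "cuda_runtime.h" not in lines[include_line - 1]:
        raise ValueError(
            f"line {include_line} is not the cuda_runtime.h include; "
            "the file revision probably differs"
        )
    return drop_lines(source, [(include_line, include_line), block])


# Synthetic stand-in for pair_e3gnn.cpp so the sketch runs without a LAMMPS tree.
demo = [f"// line {n}\n" for n in range(1, 231)]
demo[35] = "#include <cuda_runtime.h>\n"
patched = remove_cuda_include_and_block("".join(demo))
print(len(patched.splitlines()))  # 230 lines minus 1 include minus 13 block lines
```

To use it on a real tree, read {path_to_lammps}/src/pair_e3gnn.cpp, pass its contents through `remove_cuda_include_and_block`, and write the result back after keeping a backup copy.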
Hi @YutackPark,
What about the file src/pair_e3gnn_parallel.cpp?
Hi @thangckt
Technically possible, but not worth it (it is slow, and shared-memory parallelism has higher priority on CPU). pair_e3gnn_parallel.cpp uses cuda_runtime for more than debugging, so the include cannot simply be removed.
Hi @YutackPark, our campus only has CPU clusters with InfiniBand. I think many other people have the same issue of limited access to GPUs. Small research groups normally do not have GPU clusters.
Could you make some changes? Thank you.
- Note that we have primarily developed and debugged SevenNet in a Linux environment with a CLI.
Hi,
Thanks for your reply.
I have the same issue as thangckt: pair_e3gnn_parallel.cpp produces too many errors when building on CPU.
I know it might be very slow on CPUs only, but currently I don't have a better environment to finish my work.
Looking forward to your further changes. Much appreciated.
Thanks for the opinions. However, e3gnn_parallel is not suitable for intra-node parallelism (using CPU cores within a single node). Instead, it is suitable for multi-node setup, but achieving intra-node parallelism has a higher priority, of course.
For this purpose, we may start with torch's OpenMP threading, which is a doable option: https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html
Another option is upgrading e3gnn_parallel to work even when nswap is greater than 6:
https://github.com/MDIL-SNU/SevenNet/blob/c63e79498476effc9727b59fe8aff591e47c59ee/sevenn/pair_e3gnn/comm_brick.cpp#L1074
This happens when the decomposed simulation cell is very small. Personally (as the one who wrote the code), I think fixing this could be extremely hard and time-consuming.
e3gnn_parallel is not suitable for intra-node parallelism (using CPU cores within a single node). Instead, it is suitable for multi-node setup
I prefer inter-node over intra-node parallelism.
Could you support a multi-node setup on CPU?
Thank you so much.
Perhaps it would be difficult for me to solve this problem myself. But thank you all the same, @YutackPark.
@ikuki-ikuki , @thangckt Hi guys, check this out. https://github.com/MDIL-SNU/SevenNet/tree/e3gnn_cpu
I made pair_e3gnn_parallel_cpu.* and comm_brick_cpu.*, which do not depend on cuda_runtime.h. I have tested them on my system and they seem fine.
To install:
cp {PATH_TO_SEVENNET}/sevenn/pair_e3gnn/pair_e3gnn_parallel_cpu* {PATH_TO_LAMMPS}/src
cp {PATH_TO_SEVENNET}/sevenn/pair_e3gnn/comm_brick_cpu.cpp {PATH_TO_LAMMPS}/src/comm_brick.cpp
cp {PATH_TO_SEVENNET}/sevenn/pair_e3gnn/comm_brick_cpu.h {PATH_TO_LAMMPS}/src/comm_brick.h
Then build as usual. Do not copy pair_e3gnn_parallel.* into the LAMMPS source.
If the build was successful, you should be able to see the new pair style e3gnn/parallel_cpu in the -help output:
{LAMMPS_BINARY} -help | grep e3gnn
e3gnn/parallel_cpu
For performance, the maximum number of MPI processes is determined by the system size. You should therefore fill the gap with the OMP_NUM_THREADS environment variable, to utilize as many CPU cores as possible.
For example, in example_inputs/md_parallel_example/in.lmp, the system has around 700 atoms, and 6 or 8 is the maximum number of MPI processes you can use. If the system becomes smaller, the maximum number of MPI processes also decreases.
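As a rough illustration of filling the gap with OMP_NUM_THREADS, the sketch below splits a node's cores between MPI processes and OpenMP threads. It is only a sketch under assumptions: the function names are hypothetical, the binary name `lmp` and the `mpirun -np` launcher are placeholders for whatever your cluster uses, and the 32-core node is an invented example.

```python
def threads_per_rank(total_cores, mpi_ranks):
    """OpenMP threads per MPI process so that ranks * threads <= total_cores."""
    if total_cores < 1 or mpi_ranks < 1:
        raise ValueError("total_cores and mpi_ranks must be positive")
    return max(1, total_cores // mpi_ranks)


def launch_plan(total_cores, mpi_ranks, lammps_binary="lmp", input_file="in.lmp"):
    """Return the environment variable and command line for one run.
    The binary name and launcher are assumptions; adapt them to your site."""
    threads = threads_per_rank(total_cores, mpi_ranks)
    env = {"OMP_NUM_THREADS": str(threads)}
    command = ["mpirun", "-np", str(mpi_ranks), lammps_binary, "-in", input_file]
    return env, command


# Hypothetical 32-core node with 8 MPI processes (the upper bound quoted
# above for the ~700-atom example system).
env, command = launch_plan(total_cores=32, mpi_ranks=8)
print(env, " ".join(command))
```

The integer division leaves cores idle when the core count is not a multiple of the number of MPI processes; whether fewer processes with more threads is faster has to be measured on the actual system.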
I also recommend comparing results with LAMMPS serial (I have checked it and it was fine, but this feature is new).
To use it, set pair_style to e3gnn/parallel_cpu in a LAMMPS script. If it does not bother you, please share any comments, suggestions, or bug reports!
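The comparison against serial LAMMPS recommended above could be done along these lines. This is a minimal sketch: the function names and the tolerance are assumptions, the numbers are made-up placeholders rather than real SevenNet output, and extracting the energies from the two LAMMPS logs is left to the reader.

```python
def max_abs_difference(reference, candidate):
    """Largest absolute difference between two equally long value lists."""
    if len(reference) != len(candidate):
        raise ValueError("both runs must report the same number of values")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def results_agree(reference, candidate, tolerance=1e-4):
    """True if every value matches within the (assumed) tolerance."""
    return max_abs_difference(reference, candidate) <= tolerance


# Placeholder per-step potential energies, NOT real results.
serial_energies = [-1.0, -2.0, -3.0]
parallel_cpu_energies = [-1.0, -2.00001, -3.0]
print(results_agree(serial_energies, parallel_cpu_energies))
```

A tolerance that is appropriate depends on the units and on the floating-point precision of the model, so the default here should be treated as a starting point only.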
Hi @YutackPark,
Thank you so much for your help. I will check it.
Dear Developers, Thanks for sharing. I am a beginner learning about machine learning potentials, and my current computational resources do not support setting up a CUDA environment. Additionally, I could not find support for the e3gnn potential format in the LAMMPS documentation. Is it currently possible to use the potential functions trained from SevenNet on LAMMPS in a CPU-only environment? Is it possible to use the trained potential functions to complete calculation tasks with ASE?
Thank you so much. Looking forward to your reply.