MDIL-SNU / SevenNet

SevenNet - a graph neural network interatomic potential package supporting efficient multi-GPU parallel molecular dynamics simulations.
https://pubs.acs.org/doi/10.1021/acs.jctc.4c00190
GNU General Public License v3.0

Running LAMMPS Simulations on CPUs Only #91

Open ikuki-ikuki opened 2 months ago

ikuki-ikuki commented 2 months ago

Dear Developers, Thanks for sharing. I am a beginner learning about machine learning potentials, and my current computational resources do not support setting up a CUDA environment. Additionally, I could not find support for the e3gnn potential format in the LAMMPS documentation. Is it currently possible to use the potential functions trained from SevenNet on LAMMPS in a CPU-only environment? Is it possible to use the trained potential functions to complete calculation tasks with ASE?

Thank you so much. Looking forward to your reply.

YutackPark commented 2 months ago

Hi @ikuki-ikuki ,

> I could not find support for the e3gnn potential format in the LAMMPS documentation.

You should follow the installation guide on the main page: https://github.com/MDIL-SNU/SevenNet?tab=readme-ov-file#installation-for-lammps

> Is it currently possible to use the potential functions trained from SevenNet on LAMMPS in a CPU-only environment? Is it possible to use the trained potential functions to complete calculation tasks with ASE?

Both are possible. If SevenNet cannot find a GPU, it evaluates the model on the CPU only. If the systems you're interested in are usually small, the CPU is a valid choice. But if you're interested in systems with more than a few thousand atoms, or you have to run very long MD simulations, you need a GPU. The CPU will be painfully slow.

thangckt commented 2 months ago

Hi @YutackPark, I ran into a problem compiling LAMMPS on a CPU cluster:

some_path/lammps_dev/src/pair_e3gnn.cpp:36:10: fatal error: 'cuda_runtime.h' file not found
   36 | #include <cuda_runtime.h>
      |          ^~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [some_path/lammps_dev/src/pair_e3gnn.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
some_path/lammps_dev/src/pair_e3gnn_parallel.cpp:32:10: fatal error: 'cuda_runtime.h' file not found
   32 | #include <cuda_runtime.h>
      |          ^~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [some_path/lammps_dev/src/pair_e3gnn_parallel.cpp.o] Error 1
make[1]: *** [CMakeFiles/lammps.dir/all] Error 2
make: *** [all] Error 2

Could you provide a short guide? Thanks.

YutackPark commented 2 months ago

Hi @thangckt, I almost forgot to fix this case. It is safe to remove the 'cuda_runtime.h' include from pair_e3gnn.cpp. After that, it can run without CUDA installed on the system.

Quick fix: in sevenn/pair_e3gnn/pair_e3gnn.cpp,

  • remove line 36 (the cuda_runtime.h include);
  • remove lines 217 to 229 (the print_info if statement).

You can modify the file directly in {path_to_lammps}/src/pair_e3gnn.cpp and then run make in the build directory.

This fix will be patched together with other LAMMPS-related fixes.
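The quick fix above could be scripted roughly as follows. This is a minimal sketch, not part of SevenNet: the helper name `strip_cuda_lines` is hypothetical, and the line numbers (36 and 217 to 229) only match the revision of pair_e3gnn.cpp discussed in this thread, so the helper refuses to edit unless line 36 really is the cuda_runtime.h include.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SevenNet): apply the quick fix to
# pair_e3gnn.cpp. Line numbers are taken from the comment above and are only
# valid for the file revision discussed there.
strip_cuda_lines() {
    file="$1"
    # Guard: do nothing unless line 36 is the cuda_runtime.h include.
    if ! sed -n '36p' "$file" | grep -q 'cuda_runtime.h'; then
        echo "line 36 of $file is not the cuda_runtime.h include; edit by hand" >&2
        return 1
    fi
    # sed addresses refer to input line numbers, so both deletions can be
    # done in a single pass. The original is kept as "$file.bak".
    sed -i.bak -e '36d' -e '217,229d' "$file"
}

# Usage (the path is a placeholder, as in the comment above):
# strip_cuda_lines "{path_to_lammps}/src/pair_e3gnn.cpp"
```

After running it, it is worth reading the diff against the .bak copy to confirm that only the include and the print_info block were removed.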

thangckt commented 2 months ago

Hi @YutackPark, how about the file src/pair_e3gnn_parallel.cpp?

YutackPark commented 2 months ago

Hi @thangckt, it is technically possible but not worth it (it is slow, and shared-memory parallelism has a higher priority for CPUs). pair_e3gnn_parallel.cpp uses cuda_runtime for more than debugging, so it cannot simply be removed.

thangckt commented 2 months ago

Hi @YutackPark, our campus only has CPU clusters with InfiniBand. I think many other people have the same problem of limited access to GPUs. Small research groups normally do not have GPU clusters.

Could you make this change? Thank you.

ikuki-ikuki commented 2 months ago

  • Note that SevenNet is primarily developed and debugged in a Linux environment with CLI enabled.

Hi, thanks for your reply. I have the same issue as thangckt: pair_e3gnn_parallel.cpp produces too many errors when building on a CPU-only system. I know it might be very slow on CPUs only, but currently I don't have a better environment to finish my work. Looking forward to your further changes. Much appreciated.

YutackPark commented 2 months ago

Thanks for the opinions. However, e3gnn_parallel is not suitable for intra-node parallelism (using CPU cores within a single node). It is suited to a multi-node setup instead, but achieving intra-node parallelism has a higher priority, of course.

For this purpose, we may start with PyTorch's OpenMP threading, which is a doable option: https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html

Another option is upgrading e3gnn_parallel to work even when nswap is greater than 6: https://github.com/MDIL-SNU/SevenNet/blob/c63e79498476effc9727b59fe8aff591e47c59ee/sevenn/pair_e3gnn/comm_brick.cpp#L1074 This happens when the decomposed simulation cell is very small. Personally (as the one who wrote the code), I think fixing this could be extremely hard and time-consuming.

thangckt commented 2 months ago

> e3gnn_parallel is not suitable for intra-node parallelism (using CPU cores within a single node). Instead, it is suitable for multi-node setup

I prefer inter-node to intra-node parallelism.

Can you support a multi-node setup on CPUs?

Thank you so much

ikuki-ikuki commented 2 months ago

@YutackPark Perhaps it would be difficult for me to solve this problem myself, but thank you all the same.

YutackPark commented 2 months ago

@ikuki-ikuki , @thangckt Hi guys, check this out. https://github.com/MDIL-SNU/SevenNet/tree/e3gnn_cpu

I made pair_e3gnn_parallel_cpu.* and comm_brick_cpu.*, which do not depend on cuda_runtime.h. I have tested them on my system and they seem fine.

To install,

cp {PATH_TO_SEVENNET}/sevenn/pair_e3gnn/pair_e3gnn_parallel_cpu* {PATH_TO_LAMMPS}/src
cp {PATH_TO_SEVENNET}/sevenn/pair_e3gnn/comm_brick_cpu.cpp {PATH_TO_LAMMPS}/src/comm_brick.cpp
cp {PATH_TO_SEVENNET}/sevenn/pair_e3gnn/comm_brick_cpu.h {PATH_TO_LAMMPS}/src/comm_brick.h

and build as usual. Do not copy pair_e3gnn_parallel.* into the LAMMPS source.
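The three copy commands above overwrite LAMMPS's own comm_brick.cpp and comm_brick.h, so it can help to wrap them with two safety checks. The following is only a sketch: the function name `install_e3gnn_cpu`, the `.orig` backup suffix, and both checks are additions for illustration, not something the e3gnn_cpu branch provides.

```shell
#!/bin/sh
# Hypothetical wrapper around the install steps above.
# Usage: install_e3gnn_cpu PATH_TO_SEVENNET PATH_TO_LAMMPS
install_e3gnn_cpu() {
    sevennet="$1"   # SevenNet checkout (e3gnn_cpu branch)
    lammps="$2"     # LAMMPS source tree
    src="$sevennet/sevenn/pair_e3gnn"

    # Added check: the CUDA-dependent variant must not be in the LAMMPS source.
    for f in "$lammps/src/pair_e3gnn_parallel.cpp" "$lammps/src/pair_e3gnn_parallel.h"; do
        if [ -e "$f" ]; then
            echo "remove $f before building" >&2
            return 1
        fi
    done

    # Added step: keep the stock comm_brick files once, because the copies
    # below replace them. The ".orig" suffix is an arbitrary choice.
    for f in comm_brick.cpp comm_brick.h; do
        if [ -e "$lammps/src/$f" ] && [ ! -e "$lammps/src/$f.orig" ]; then
            cp "$lammps/src/$f" "$lammps/src/$f.orig"
        fi
    done

    # The same three copies as in the instructions above.
    cp "$src"/pair_e3gnn_parallel_cpu* "$lammps/src" &&
    cp "$src/comm_brick_cpu.cpp" "$lammps/src/comm_brick.cpp" &&
    cp "$src/comm_brick_cpu.h" "$lammps/src/comm_brick.h"
}
```

Restoring the `.orig` files returns the LAMMPS tree to its stock communication code if the CPU variant needs to be backed out.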

If the build was successful, you should be able to see the new pair style e3gnn/parallel_cpu in the -help output:

{LAMMPS_BINARY} -help | grep e3gnn
e3gnn/parallel_cpu

For performance: the maximum number of MPI processes is determined by the system size, so you should fill the gap with the OMP_NUM_THREADS environment variable to use as many CPU cores as possible. For example, in example_inputs/md_parallel_example/in.lmp, the system has around 700 atoms, and 6 or 8 is the maximum number of MPI processes you can use. If the system becomes smaller, the maximum number of MPI processes also decreases.
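As a rough illustration of filling the gap, the cores on a node can be divided between MPI processes and OpenMP threads. The sketch below assumes a 32-core node and 8 MPI processes (the upper bound mentioned for the ~700-atom example); the helper name `threads_per_rank` is hypothetical, and the launch line is only printed, not executed. Whether an exported OMP_NUM_THREADS reaches every rank depends on the MPI launcher and job scheduler in use.

```shell
#!/bin/sh
# Sketch: split the cores of one node between MPI ranks and OpenMP threads.
# threads_per_rank TOTAL_CORES MPI_RANKS -> prints threads per rank (minimum 1)
threads_per_rank() {
    cores="$1"
    ranks="$2"
    threads=$((cores / ranks))   # integer division; leftover cores stay idle
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

# Assumed values for illustration only.
CORES=32
RANKS=8
OMP_NUM_THREADS=$(threads_per_rank "$CORES" "$RANKS")
export OMP_NUM_THREADS

# Print the launch line instead of running it. {LAMMPS_BINARY} is the
# placeholder used earlier in this thread; in.lmp is the example input.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS mpirun -np $RANKS {LAMMPS_BINARY} -in in.lmp"
```

With these assumed numbers each MPI process gets 4 threads; a smaller system that allows fewer MPI processes would get more threads per process on the same node.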

I also recommend comparing results with LAMMPS serial (I have checked it and it was fine, but this feature is new).
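One simple way to do such a comparison is to dump the same quantity (for example, the potential energy at each thermo step) from the serial run and from the e3gnn/parallel_cpu run into two text files and look at the largest difference. The sketch below assumes each file holds one number per line with the same number of lines; the helper name `max_abs_diff` and the file names are hypothetical, and extracting the values from the LAMMPS logs is left out.

```shell
#!/bin/sh
# Hypothetical helper: largest absolute difference between two files that
# each contain one number per line (same number of lines assumed).
max_abs_diff() {
    paste "$1" "$2" | awk '
        {
            d = $1 - $2
            if (d < 0) d = -d
            if (d > max) max = d
        }
        END { printf "%.6g\n", max + 0 }'
}

# Usage (file names are placeholders):
# max_abs_diff energy_serial.txt energy_parallel_cpu.txt
```

What counts as an acceptable difference depends on the model, the precision, and how long the trajectories have had to diverge, so comparing the first few steps from identical initial conditions is the most informative check.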

If it is not too much trouble, please share any comments, suggestions, or bug reports!

thangckt commented 2 months ago

Hi @YutackPark,

Thank you so much for your help. I will check it out.