brucefan1983 / NEP_CPU

CPU version of NEP
GNU General Public License v3.0
49 stars 15 forks source link

Feature request: OpenMP support. #17

Closed initqp closed 1 year ago

initqp commented 1 year ago

Is it possible to enable OpenMP parallel computing in NEP_CPU? This may speed up the computation in some use cases. Thanks in advance!

brucefan1983 commented 1 year ago

It might be possible, will give a try. You are also welcome to try it out if you know how it should be done. I have very limited knowledge about openMP.

initqp commented 1 year ago

Thanks! However, since I'm not good at parallel programming, maybe I should wait for you or other experts to do it.

brucefan1983 commented 1 year ago

OK, I have played with openMP a little before and I think it is not trivial to speed up whenever there are scatter operations, such as accumulating force/virial for neighbor atoms $j$ in the loop of the central atom $i$.

Could you try the version in the branch of this PR: https://github.com/brucefan1983/NEP_CPU/pull/18 It should speed up by about 2X.

brucefan1983 commented 1 year ago

You are also welcome to comment on the PR.

initqp commented 1 year ago

Ok, I will give the new branch a try.

Results: can confirm a solid 1.75x speedup under the 250-atoms PbTe system. Huge improvement!

brucefan1983 commented 1 year ago

Ok, I will give the new branch a try.

Results: can confirm a solid 1.75x speedup under the 250-atoms PbTe system. Huge improvement!

Thanks for the quick tests. Note that the current interface for the non-LAMMPS part (should be the one you used) does not accept a neighbor list. It calculates the neighbor list every time. Therefore, this interface is not suitable for large systems. Also this interface assumed that the box is periodic in all directions, which you may have noticed. Do you think these should be improved? I think the neighbor list is now the bottleneck!

brucefan1983 commented 1 year ago

For the 250-atom PbTe case:

initqp commented 1 year ago

Thanks for the detailed testing. In general, I have the following viewpoints, although some of them may be unfair:

brucefan1983 commented 1 year ago

I fully agree with you on the three points. Currently only calorine, pynep, and somd have used this interface and I think you all noticed this assumption about the periodic boundary conditions (as there is not input for this).

I will try openMP at some point and keep you informed if there is any progress.

brucefan1983 commented 1 year ago

I have done the easy parts:

image

initqp commented 1 year ago

That's amazing, it seems that the main branch is much faster now! Since the following implementations of OpenMP support would require relatively large refactoring (or you may want to invoke critical sections, which might be slow), I think this could be a less urgent and long-term task. So feel free to close this issue if it hangs for too long.

brucefan1983 commented 1 year ago

Ok I think it can be closed. The refactoring is worth to do but I leave it for the future.