flatironinstitute / FMM3D

Flatiron Institute Fast Multipole Libraries --- This codebase is a set of libraries to compute N-body interactions governed by the Laplace and Helmholtz equations, to a specified precision, in three dimensions, on a multi-core shared-memory machine.
https://fmm3d.readthedocs.io

Calling the vectorized versions from Julia #42

Open mipals opened 1 year ago

mipals commented 1 year ago

Hey there,

I am trying to solve a problem with multiple right-hand sides. In short, this means that I need to evaluate hfmm3d for many different sets of charge strengths (or dipole strengths). I see that there is a vectorized version, but it seems there is no Julia interface for it. Before I go and implement my own, I would like to know whether there is a specific reason the interface is missing (other than the time it takes to implement), and whether there is any computational speed to be gained from the vectorized version (I currently have around 100 right-hand sides, but this could grow in the future).

Cheers, Mikkel

askhamwhat commented 1 year ago

Hi Mikkel, I think this may be available with the current wrapper. It is possible to pass a keyword argument to hfmm3d that tells the routine you are making a vectorized call. So something like:

  vals = hfmm3d(eps, zk, sources; charges=charges, nd=100)

If sources is 3 x n, then the routine expects charges to be an nd x n array (and throws an error otherwise).
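
For concreteness, a minimal sketch of such a vectorized call might look like the following (the pg output flag and the vals.pot field are assumptions based on the other FMM3D interfaces, not guarantees about the Julia wrapper):

  using FMM3D

  eps = 1e-6                         # requested precision
  zk  = 1.0 + 0.0im                  # Helmholtz wavenumber
  n   = 1000                         # number of sources
  nd  = 100                          # number of densities / right-hand sides

  sources = rand(3, n)               # 3 x n source locations
  charges = rand(ComplexF64, nd, n)  # nd x n charge strengths, one row per density

  # pg=1 requests the potential at the sources (assumed to mirror the Python interface)
  vals = hfmm3d(eps, zk, sources; charges=charges, nd=nd, pg=1)
  # vals.pot is then expected to be nd x n: one row of potentials per density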

Let me know if that works! The vectorized call is untested.

I am not sure what performance gains to expect; they may be platform dependent. I would be interested to know how it goes.

Cheers, Travis

mrachh commented 1 year ago

Regarding the performance gain: we see a factor of 2+ for Helmholtz and around a factor of 4 for Laplace. However, there is no benefit to cranking nd up indefinitely; the gain saturates after roughly 16-32 densities, and the larger you make nd, the more memory the code needs. So I would recommend calling it in batches of 16-32, depending on what the memory on your system allows.
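
A hedged sketch of that batching pattern, reusing the call conventions assumed in the example above (split the right-hand sides into blocks of at most 32 and loop):

  using FMM3D

  eps, zk = 1e-6, 1.0 + 0.0im
  n, nrhs, batch = 1000, 100, 32
  sources = rand(3, n)
  charges = rand(ComplexF64, nrhs, n)      # all right-hand sides, one per row

  pot = zeros(ComplexF64, nrhs, n)
  for lo in 1:batch:nrhs
      hi = min(lo + batch - 1, nrhs)
      vals = hfmm3d(eps, zk, sources;
                    charges=charges[lo:hi, :], nd=hi - lo + 1, pg=1)
      pot[lo:hi, :] .= vals.pot            # assumed nd x n output for nd > 1
  end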

Regards, Manas

mipals commented 1 year ago

Hi again,

Thanks for the quick answers!

@askhamwhat Ah, perfect. I tried looking into the code but did not realize that calling hfmm3d_ also covered the vectorized version (the documentation mentions a _vec variant that I could not find anywhere). From the documentation it looks as though the charges need to be interleaved in one long vector, but given Julia's column-major layout I guess that is the same as an nd x n matrix. Additionally, the dipoles need to be interleaved component-wise rather than dipole-wise (something I missed the first time around). After trying it out, there is one thing that annoys me slightly: with a single set of charges the function returns a vector (n x 1), but with multiple sets it returns an nd x n matrix. I guess this has to do with the underlying Fortran implementation. Thankfully, that is something a simple transpose can fix 😄
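
In case it is useful, here is a tiny hypothetical helper (not part of the wrapper) that normalizes both return shapes to an n x nd matrix, one column of potentials per density, based on the behavior described above:

  # Hypothetical helper: always return potentials as n x nd (one column per density)
  aspotmatrix(pot::AbstractVector) = reshape(pot, :, 1)  # nd == 1: vector -> n x 1
  aspotmatrix(pot::AbstractMatrix) = permutedims(pot)    # nd > 1: nd x n -> n x nd

  # usage: P = aspotmatrix(vals.pot)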

@mrachh That sounds about right. With a very crude implementation I got a 3-8x speedup (on an M1 Mac) for my specific application. However, not all of that can be attributed directly to the vectorized call, since the application includes additional computations that can be stacked/improved when calling a vectorized hfmm3d.

Cheers, Mikkel