flatironinstitute / FMM3D

Flatiron Institute Fast Multipole Libraries --- This codebase is a set of libraries to compute N-body interactions governed by the Laplace and Helmholtz equations, to a specified precision, in three dimensions, on a multi-core shared-memory machine.
https://fmm3d.readthedocs.io

Calling the vectorized versions from Julia #42

Open mipals opened 11 months ago

mipals commented 11 months ago

Hey there,

I am trying to solve a problem with multiple right-hand sides. In short, this means that I need to evaluate hfmm3d for many different sets of charge strengths (or dipole strengths). I see that there is a vectorized version, but it seems there is no interface for it in Julia. Before I go and implement my own, I would like to know whether there is a specific reason why there is no interface (other than the time it takes to implement), and whether there is any computational speed to be gained from the vectorized version (currently I have around 100 right-hand sides, but this could increase in the future).

Cheers, Mikkel

askhamwhat commented 11 months ago

Hi Mikkel, I think this may be available with the current wrapper. It is possible to pass a keyword argument to hfmm3d which lets the routine know that you are doing a vectorized call. So something like:

  vals = hfmm3d(eps,zk,sources;charges=charges,nd=100)

If sources were 3 x n, then the routine would expect charges to be an nd x n array (and would throw an error otherwise).
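
Expanding that into a rough, self-contained sketch (the pg keyword and the vals.pot field are carried over from the scalar interface here, so treat them as assumptions for the vectorized case):

```julia
using FMM3D

eps = 1e-6                         # requested precision
zk  = 1.0 + 0.0im                  # Helmholtz wavenumber
n   = 2000                         # number of sources
nd  = 100                          # number of right-hand sides

sources = rand(3, n)               # 3 x n real source locations
charges = rand(ComplexF64, nd, n)  # nd x n charge strengths, one row per rhs

# pg=1 requests potentials at the sources in the scalar interface;
# assumed here to carry over to the vectorized call.
vals = hfmm3d(eps, zk, sources; charges=charges, nd=nd, pg=1)

# Assumed: vals.pot comes back as an nd x n array, with row i holding the
# potentials generated by charges[i, :].
pot = vals.pot
```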

Let me know if that works! The vectorized call is untested.

I am not sure what to expect for the performance gains. These may be platform dependent. Would be interested to know how it goes.

Cheers, Travis

mrachh commented 11 months ago

With regard to the performance gain, we see a factor of 2+ for Helmholtz and around a factor of 4 for Laplace. However, there is no benefit in cranking up nd indefinitely: the gain saturates at around 16-32 densities, and the larger you make nd, the more memory the code needs. So I would recommend calling it in batches of 16-32, or whatever the memory on your system allows.
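
For instance, a rough batching sketch (assuming the keyword-argument call discussed above, and that the potentials come back laid out as nd x n; the reshape covers the single-density case):

```julia
using FMM3D

# Hypothetical helper: evaluate many right-hand sides in chunks of `batch`
# densities per vectorized hfmm3d call, accumulating the potentials.
function hfmm3d_batched(eps, zk, sources, charges; batch=16)
    nd_total, n = size(charges)            # charges is nd_total x n
    pot = zeros(ComplexF64, nd_total, n)
    for lo in 1:batch:nd_total
        hi  = min(lo + batch - 1, nd_total)
        ndb = hi - lo + 1
        vals = hfmm3d(eps, zk, sources;
                      charges=charges[lo:hi, :], nd=ndb, pg=1)
        # reshape handles both the nd x n output and the single-density vector
        pot[lo:hi, :] .= reshape(vals.pot, ndb, n)
    end
    return pot
end

# Usage, e.g. pot = hfmm3d_batched(1e-6, 1.0 + 0.0im, sources, charges)
```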

Regards, Manas

mipals commented 11 months ago

Hi again,

Thanks for the quick answers!

@askhamwhat Ahh, perfect. I tried looking into the code, but I did not realize that calling hfmm3d_ also covered the vectorized version (the documentation mentions a _vec variant that I could not find anywhere). From the documentation it looks as if the charges need to be interleaved in one long vector, but given Julia's column-major storage that is the same as an nd x n matrix. Additionally, the dipoles need to be interleaved component-wise rather than dipole-wise (something I missed the first time around). After trying it out, there is one thing that annoys me slightly: if you pass only one set of charges the function returns a vector (n x 1), but with multiple sets it returns an nd x n matrix. I guess this has to do with the underlying Fortran implementation. Thankfully this is something a simple transpose can fix 😄
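
For reference, here is roughly what I ended up with (the nd x 3 x n dipvecs shape is my reading of the Fortran (nd,3,n) convention combined with Julia's column-major storage, so take it with a grain of salt):

```julia
using FMM3D

n, nd = 1000, 32
sources = rand(3, n)

# Layouts with the nd index varying fastest (matching the Fortran side):
charges = rand(ComplexF64, nd, n)     # nd x n, one row per right-hand side
dipvecs = rand(ComplexF64, nd, 3, n)  # nd x 3 x n, interleaved component-wise

vals = hfmm3d(1e-6, 1.0 + 0.0im, sources;
              charges=charges, dipvecs=dipvecs, nd=nd, pg=1)

# The wrapper returns a length-n vector when nd == 1 and an nd x n matrix
# otherwise; reshaping (or transposing in the nd == 1 case) normalizes this.
pot = reshape(vals.pot, nd, n)
```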

@mrachh Sounds about right. For a very crude implementation I got a 3-8x speedup (on an M1 Mac) for my specific application. However, not all of that can be attributed directly to the vectorized call, as the application includes additional computations that can be stacked/improved when calling a vectorized hfmm3d.

Cheers, Mikkel