dmalhotra / pvfmm

A parallel kernel-independent FMM library for particle and volume potentials
http://pvfmm.org
GNU Lesser General Public License v3.0

Odd behaviour of example2 with -q 0 #9

Closed: G-071 closed this issue 5 years ago

G-071 commented 5 years ago

When running example2 with the parameters

./examples/bin/example2 -m 2 -q 0 -N 669

the example does not terminate. Instead there is just a single thread constantly allocating memory until the compute node runs out of memory (which in my case would be about 192 GB) and crashes.

When I am running the example like

./examples/bin/example2 -m 2 -q 0 -N 668

this behaviour does not occur. Instead the example terminates almost instantly and needs about 400 MB of memory.

Is this a bug, or am I missing something obvious here? The difference in the memory requirements seems extreme given the two scenarios. I have encountered this behaviour on two separate systems (as I wanted to see whether more memory would fix it).

If it is of any help, here is the configuration I used: I am on the latest commit (6cd67bdc77a870e75f879fc1f3a266b5b97b38fd) and have built pvfmm with CUDA support:

./configure --with-cuda=/usr/local/cuda

Output of the configuration: pvfmm-lib-configuration.txt

dmalhotra commented 5 years ago

You are running the volume FMM example. In this example, the parameter 'q' is the polynomial order of each volume element, which must be a positive integer (i.e. degree >= 0, since degree = order - 1). The example uses a Gaussian function for the density, and the program tries to resolve this function to the specified tolerance (the default tolerance is 1e-5). For low-order polynomials this requires a lot of refinement, and with 'q' = 0 (polynomial degree = -1) it never stops refining, which is what you observe. So try a larger 'q' (between 6 and 12 should be good) and maybe also a larger error tolerance if you don't really need 5-digit accuracy.
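To illustrate why the order matters so much, here is a minimal 1D sketch of adaptive refinement. It is not PVFMM's actual algorithm or API: the function names, the Gaussian width, the uniform check points used as an error estimate, and the `max_depth` safety cap are all assumptions made for this toy. Each interval is approximated by a polynomial through `q` Chebyshev nodes (degree `q - 1`), so `q = 0` leaves no basis functions and the approximation is identically zero.

```python
import math

def cheb_nodes(q, a, b):
    """Return q Chebyshev nodes on [a, b]; an empty list when q == 0."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return [mid + half * math.cos(math.pi * (2 * k + 1) / (2 * q)) for k in range(q)]

def interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolant through (xs, ys) at x.
    With no nodes (q == 0) the sum is empty and the result is 0."""
    total = 0.0
    for i, xi in enumerate(xs):
        term = ys[i]
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def refine(f, a, b, q, tol, depth=0, max_depth=14):
    """Bisect [a, b] until the order-q interpolant matches f to within tol.
    Returns (leaf_count, hit_cap). max_depth is a safety cap added for this
    sketch only; without it, q == 0 would recurse forever near the peak."""
    xs = cheb_nodes(q, a, b)
    ys = [f(x) for x in xs]
    checks = [a + (b - a) * i / 15 for i in range(16)]  # crude error estimate
    err = max(abs(f(t) - interpolate(xs, ys, t)) for t in checks)
    if err <= tol:
        return 1, False
    if depth >= max_depth:
        return 1, True
    mid = 0.5 * (a + b)
    n_left, cap_left = refine(f, a, mid, q, tol, depth + 1, max_depth)
    n_right, cap_right = refine(f, mid, b, q, tol, depth + 1, max_depth)
    return n_left + n_right, cap_left or cap_right

def gaussian(x):
    """Hypothetical density: a Gaussian bump centred at 0.5."""
    return math.exp(-((x - 0.5) / 0.05) ** 2)

if __name__ == "__main__":
    for q in (0, 1, 2, 4, 8):
        leaves, hit_cap = refine(gaussian, 0.0, 1.0, q, 1e-5)
        print(f"q={q}: leaves={leaves}, hit depth cap={hit_cap}")
```

In this toy, `q = 0` only stops because of the depth cap, and each extra level of allowed depth roughly doubles the number of leaves around the peak. That mirrors the unbounded memory growth in the report, while a higher order such as `q = 8` converges after a few levels.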

Now the reason you don't see this behavior when 'N' is small is the following. The parameter 'N' is used to construct a point cloud (with N points) which guides the initial tree refinement; after this, the adaptive tree refinement takes over. If 'N' is too small, the initial tree will be very coarse. The function samples taken from this coarse tree (for the adaptive refinement) will completely miss the localized Gaussian function. So the library determines that the function is already resolved and returns from the adaptive refinement step.
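The "coarse tree misses the Gaussian" effect can be reproduced with a small 1D sketch. Again, this is only an illustration under assumptions: the uniform cells, the sample placement, the Gaussian's centre and width, and the function names are all hypothetical and are not how PVFMM builds its tree from the 'N' points.

```python
import math

def narrow_gaussian(x, center=0.25, width=0.01):
    """Hypothetical localized density: a narrow bump on [0, 1]."""
    return math.exp(-((x - center) / width) ** 2)

def max_sample(f, n_cells, samples_per_cell=4):
    """Largest |f| seen at interior sample points of n_cells uniform cells on [0, 1]."""
    h = 1.0 / n_cells
    largest = 0.0
    for c in range(n_cells):
        for s in range(samples_per_cell):
            x = (c + (s + 0.5) / samples_per_cell) * h
            largest = max(largest, abs(f(x)))
    return largest

def looks_resolved(f, n_cells, tol=1e-5):
    """True when every sample is below tol, so the function looks like zero
    and a refinement test based on these samples would stop immediately."""
    return max_sample(f, n_cells) <= tol

if __name__ == "__main__":
    for n_cells in (2, 64):
        print(f"{n_cells} cells: looks resolved = {looks_resolved(narrow_gaussian, n_cells)}")
```

With only 2 cells, every sample lands far from the bump, so the function is indistinguishable from zero and refinement would return at once. With 64 cells, some samples hit the bump and refinement would have to continue.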

Let me know if you still have issues with the code or have any other questions.