andyljones / pybbfmm

Black-box fast multipole method package for Python
MIT License
47 stars 8 forks source link

question about the kernel and the charges #1

Closed ricsonc closed 4 years ago

ricsonc commented 4 years ago

Hi, i'm very interested in playing around with this, but I had two small questions:

  1. It appears that a kernel takes a of shape Nx1xD and b of 1xMxD and returns a NxM tensor of interactions / forces. Is it possible to have a multi-dimensional output? Say, for a n-body particle simulation, one would want to compute NxMxD force vectors. (Of course, I could always run D kernels, one for each dimension, but this seems inefficient). If this isn't possible, it would be valid to reuse the orthantree from presolve for each kernel, right?

  2. The charges simply scale the interaction between two objects by the product of their charges, right? Wanted to make sure I was understanding this correctly.

Thanks!

andyljones commented 4 years ago

Hi!

  1. I am not 100% sure, but I don't think that having the kernel return D fields would work. I didn't have it in mind when I wrote pybbfmm, so unless the planets have aligned I think it'll error. Thankfully yep, you should be able to reuse the orthantree preseolve. You should be able to use CUDA streams to parallelise the solves for each dimension, but I haven't tested this. You might also run into memory problems.

  2. Yep, the charges just scale the interactions.

Just realised that while I cite some useful resources for this in the code's comments, I don't do so in the readme.

And, uh, now that someone's going to read the code I'll try to find some time today to write some more documentation.

ricsonc commented 4 years ago

Thanks! Actually I did come here from your fast multipole tutorial.

By the way, one question about recomputing / partially recomputing trees when particles move -- do you know of any principled approach for how to do this? I have the intuition that it's ok to keep using the same tree, as long as particles do not move too far outside of their box. I don't think I found any mention of this on a quick skim of articles you linked either.

Anyway it's probably not that important, as I timed it and constructing the tree doesn't take too long compared to solving, so there's not too much time savings to be had either way. I also found that I can trade off less time in the tree building for more time in the solving phase by adjusting the capacity of the leaf nodes -- this also reduces memory requirements -- I can fit ~18M instead of ~14M by using capacity=16 (versus the default of 8) -- at the cost of about 20% slower runtime for the same number of particles.

andyljones commented 4 years ago

I definitely remember seeing approaches that came down to 'check which points have left their box and regenerate only those bits of the tree', but I'm afraid I didn't keep notes.

If you do want to let points wander out of their boxes, you'll probably want to replace the boundary assertions in the Chebyshev class with clamps.

As far as optimizing goes, CUDA_LAUNCH_BLOCKING=1 plus snakeviz plus pytorch_memlab were invaluable when I was trying to profile things. There are definitely lots of places you can speed things up still, especially if you've got a different use-case to the one I was after.

I got most of the new docs done today, should be able to finish them off tonight or tomorrow.