Closed ricsonc closed 4 years ago
Hi!
I am not 100% sure, but I don't think that having the kernel return D
fields would work. I didn't have it in mind when I wrote pybbfmm, so unless the planets have aligned I think it'll error. Thankfully yep, you should be able to reuse the orthantree preseolve.
You should be able to use CUDA streams to parallelise the solves for each dimension, but I haven't tested this. You might also run into memory problems.
Yep, the charges just scale the interactions.
Just realised that while I cite some useful resources for this in the code's comments, I don't do so in the readme.
- William & Fong's original paper on the black-box fast multipole method is the best place to start.
- Carrier, Greengard & Rokhlin's original paper on the fast multipole method is still the best resource on implementing dynamic trees. Their notation is used extensively in the
orthantree
module.- I've made an animated explanation of the fast multipole method to go with this library.
And, uh, now that someone's going to read the code I'll try to find some time today to write some more documentation.
Thanks! Actually I did come here from your fast multipole tutorial.
By the way, one question about recomputing / partially recomputing trees when particles move -- do you know of any principled approach for how to do this? I have the intuition that it's ok to keep using the same tree, as long as particles do not move too far outside of their box. I don't think I found any mention of this on a quick skim of articles you linked either.
Anyway it's probably not that important, as I timed it and constructing the tree doesn't take too long compared to solving, so there's not too much time savings to be had either way. I also found that I can trade off less time in the tree building for more time in the solving phase by adjusting the capacity of the leaf nodes -- this also reduces memory requirements -- I can fit ~18M instead of ~14M by using capacity=16 (versus the default of 8) -- at the cost of about 20% slower runtime for the same number of particles.
I definitely remember seeing approaches that came down to 'check which points have left their box and regenerate only those bits of the tree', but I'm afraid I didn't keep notes.
If you do want to let points wander out of their boxes, you'll probably want to replace the boundary assertions in the Chebyshev
class with clamps.
As far as optimizing goes, CUDA_LAUNCH_BLOCKING=1
plus snakeviz plus pytorch_memlab were invaluable when I was trying to profile things. There are definitely lots of places you can speed things up still, especially if you've got a different use-case to the one I was after.
I got most of the new docs done today, should be able to finish them off tonight or tomorrow.
Hi, i'm very interested in playing around with this, but I had two small questions:
It appears that a kernel takes
a
of shapeNx1xD
andb
of1xMxD
and returns aNxM
tensor of interactions / forces. Is it possible to have a multi-dimensional output? Say, for a n-body particle simulation, one would want to computeNxMxD
force vectors. (Of course, I could always runD
kernels, one for each dimension, but this seems inefficient). If this isn't possible, it would be valid to reuse the orthantree from presolve for each kernel, right?The charges simply scale the interaction between two objects by the product of their charges, right? Wanted to make sure I was understanding this correctly.
Thanks!