N-BodyShop / changa

UIUC/PPL version of ChaNGa
http://hpcc.astro.washington.edu/tools/changa.html
GNU General Public License v2.0
44 stars 28 forks source link

Flexible GPU and CPU Usage for Gravity #170

Closed spencerw closed 4 months ago

spencerw commented 5 months ago

Although calculating gravity on the GPU is generally much faster, doing so introduces a significant amount of extra overhead. When using multistepping, it's often the case that using the GPU for gravity on higher rungs actually slows things down.

This PR introduces a new parameter 'nGpuMinParts', which triggers a tree walk on the CPU (instead of the GPU) whenever the number of active particles falls below the threshold.

spencerw commented 5 months ago

Note that there are still a couple of 'todo' items before this can be merged. Most importantly, enabling the periodic force calculation currently causes a hang whenever the gravity tree walk happens on the CPU. @trquinn I was hoping you might have some ideas as to why that could be happening.

spencerw commented 5 months ago

The hang during the gravity calculation was caused by a misplaced #ifdef inside of calculateEwald and is now fixed.

Some of the 'useckloop' code still needs to be incorporated and there are a couple of places where indentation needs to be updated, but a code review would definitely be helpful at this point. In particular, this PR involved creating a considerable amount of duplicated code, to ensure that the CPU tree walk can be done with or without the CUDA compile flag. Moving things into helper functions cuts down on this, but it's definitely made the code messier and harder to read.

spencerw commented 5 months ago

@trquinn Are we sure that removing 'bUseCpu' as a member from the TreePiece and ListCompute classes is what we want to do? I count at least 15 function headers that will need to be changed if we instead pass it down as an argument from Main::startGravity to all of the places it's needed.

trquinn commented 5 months ago

I think it's fine to keep the bUseCpu as Treepiece and ListCompute attributes. I recommend NOT putting them under an #ifdef CUDA

spencerw commented 4 months ago

All of the requested changes have been made. Note that the review was started before my previous commit (5f7275e), so some of the change requests are marked as 'outdated'.

It sounds like you noticed the new commit (implementing the ifdef trick to get rid of duplicated code) partway through, however.