cholla-hydro / cholla

A GPU-based hydro code
https://github.com/cholla-hydro/cholla/wiki
MIT License
65 stars 32 forks source link

Revert minor changes so 1-D examples can run and add debug functions #315

Closed alwinm closed 1 year ago

alwinm commented 1 year ago

This PR contains a number of reverts as a hotfix to get 1-D examples (#313) working with minimal effort but will benefit from additional work and thought down the line. Most pressing would be bringing ppmc and ppmc tests back into sync--the thread guards in this PR allow non-zero values in ghost cells but the test expects zero values in ghost cells.

Old note:

Just a draft for now. Related to issue #313.

After some feverish hacking around to find the bug, I think this revert might be promising but it also might not be. I'll take another look later.

There is probably a clean way to refactor this for compatibility with 1-D 2-D and 3-D without a full revert.

bcaddy commented 1 year ago

Is the issue that the thread guard in PPMC_CTU is too restrictive? If so I'm curious about why it works at all in 3D since that's basically just the 1D case 3 times.

alwinm commented 1 year ago

Is the issue that the thread guard in PPMC_CTU is too restrictive? If so I'm curious about why it works at all in 3D since that's basically just the 1D case 3 times.

I think so... in the 3-D case your version of the thread guard correctly isolates the Real cells. In the old version, for simplicity between 1-D, 2-D, and 3-D, it does some work on ghost cells (which is extra work in the 2-D and 3-D cases). I'm guessing this extra work is negligible, simply adding a number of threads, and avoids some if-statements for each thread.

Basically, for 1-D, the "real cells" do go from 0 to nx in the perpendicular directions, not 3 to nx-3.

bcaddy commented 1 year ago

Got it, that makes sense. I'm with you that I think there's a better way to compute the limits but I'm at a conference at the moment and don't have time to think about it, I'll consider it more when I get back. We might want to add the dimensionality as a template argument, per issue #308 making the arguments that are known at compile time into template arguments saves a surprising amount of time.

Also, the time saved in 3D isn't negligible. At 256^3 cells the ghost cells make up about 9% of the entire grid on the GPU (PPM with hydro) and the more aggressive thread guard saves about 6% of the runtime per kernel launch.

bcaddy commented 1 year ago

Per the offline discussion. I fixed the issues I noted. I also fixed this issue in other reconstruction+integrator combinations. To avoid code duplication I wrote a function for the header guard and added a test for it.

evaneschneider commented 1 year ago

Thank you for making those changes. I find the new thread guard function quite hard to read, could you add some more comments explaining to the user what is going on there? I.e. explain what "order" is, and what checks are actually happening?

bcaddy commented 1 year ago

Done.

I also added truly 1D and 2D tests so that hopefully this doesn't happen again.

evaneschneider commented 1 year ago

Good call!