cholla-hydro / cholla

A GPU-based hydro code
https://github.com/cholla-hydro/cholla/wiki
MIT License

Try out occupancy API for all kernels and see if it improves performance #232

Closed alwinm closed 7 months ago

alwinm commented 1 year ago

I converted this Project card by @bcaddy into an issue because I am also interested in whether we can make gains in occupancy.

bcaddy commented 1 year ago

This should be fairly easy, if time-consuming, so I think it will be a good first project. There's an existing struct (AutomaticLaunchParams in cuda_utilities.h) that handles all the setup; after that it's just a matter of following an example of how to use it, such as Calc_dt_GPU or checkMagneticDivergence.

This is something where C++20 will come in handy in the future. In C++17 a structured binding can't be declared static, so I've resorted to implementing this as a struct. IMO it would be a lot simpler and less verbose with structured bindings.
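For readers unfamiliar with the pattern, here is a minimal host-only sketch of caching launch parameters in a function-local static struct. It is not the Cholla implementation: `QueryLaunchParams`, `CachedLaunchParams`, `LaunchKernelWrapper`, and the returned values are all hypothetical, and the query function is only a stand-in for a real occupancy call such as `cudaOccupancyMaxPotentialBlockSize`.

```cpp
#include <utility>

// Counts how often the (hypothetical) occupancy query runs, so the caching is visible.
int query_calls = 0;

// Hypothetical stand-in for an occupancy query such as
// cudaOccupancyMaxPotentialBlockSize. The returned values are made up.
std::pair<int, int> QueryLaunchParams()
{
  ++query_calls;
  return {64, 256};  // {blocks per grid, threads per block}
}

// Struct-based approach: the constructor runs the query and stores the result.
// This is an illustrative struct, not Cholla's AutomaticLaunchParams.
struct CachedLaunchParams {
  int numBlocks;
  int threadsPerBlock;

  CachedLaunchParams()
  {
    std::pair<int, int> const result = QueryLaunchParams();
    numBlocks                        = result.first;
    threadsPerBlock                  = result.second;
  }
};

// Hypothetical wrapper around a kernel launch.
int LaunchKernelWrapper()
{
  // Function-local static: constructed on the first call only, so the
  // query runs once no matter how many times the wrapper is called.
  static CachedLaunchParams const params;

  // With C++20 the struct could be replaced by a static structured binding:
  //   static auto const [numBlocks, threadsPerBlock] = QueryLaunchParams();
  // C++17 does not allow the `static` specifier on that declaration.

  // A real wrapper would launch the kernel here; this sketch just returns
  // the total number of threads that would be launched.
  return params.numBlocks * params.threadsPerBlock;
}
```

Calling `LaunchKernelWrapper()` repeatedly leaves `query_calls` at 1, which is the point of making the parameters static: the occupancy query is paid for once per kernel wrapper rather than on every launch.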

alwinm commented 1 year ago

I'm attaching this here as well.

https://www.olcf.ornl.gov/wp-content/uploads/Intro_Register_pressure_ORNL_20220812_2083.pdf

Bob, let me know if AMD occupancy and reducing register pressure are outside the scope of what you intended here. I'm broadly interested in a larger-scale investigation, but that may fall outside the scope of the task you outlined.

bcaddy commented 1 year ago

IMO it definitely puts it outside the range of a good first issue, since it likely involves considerable rewriting/rethinking of the algorithms and kernel implementations. I would argue that they are two separate, but related, issues.

bcaddy commented 7 months ago

Closed in PR #359