Closed alwinm closed 7 months ago
This should be fairly easy, if time consuming, to do, so I think it will be a good first project. There's an already built struct (`AutomaticLaunchParams` in `cuda_utilities.h`) that handles all the setup; then just follow an example of how to use it, such as in `Calc_dt_GPU` or `checkMagneticDivergence`.
This is something where C++20 will come in handy in the future. With C++17 you can't use structured binding to set a `static` variable, and so I've resorted to implementing this as a struct. IMO it would be a lot simpler and less verbose with structured binding.
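To illustrate the C++17 limitation described above, here is a minimal sketch of the struct-based workaround. It is not the actual `AutomaticLaunchParams` from `cuda_utilities.h`, whose interface isn't shown in this thread. The names `QueryLaunchParams`, `LaunchParams`, `TotalThreads`, and `query_count` are hypothetical, and the returned numbers are made up; in real CUDA code the query would presumably be a runtime occupancy call rather than this plain C++ stand-in.

```cpp
#include <utility>

// Hypothetical stand-in for an expensive occupancy query. The values are
// illustrative only; the counter just records how often the query runs.
static int query_count = 0;

std::pair<int, int> QueryLaunchParams()
{
  ++query_count;
  return {128, 256};  // {blocks per grid, threads per block}, made-up values
}

// C++17 workaround: bundle both values in a struct so that one static
// object can cache them. Illustrative only, not the struct in the codebase.
struct LaunchParams {
  int numBlocks;
  int threadsPerBlock;

  LaunchParams()
  {
    auto const result = QueryLaunchParams();
    numBlocks         = result.first;
    threadsPerBlock   = result.second;
  }
};

int TotalThreads()
{
  // Constructed once, on the first call; later calls reuse the cached values.
  static LaunchParams const params;

  // In C++20 the struct could be dropped in favor of a static structured
  // binding, which C++17 does not allow:
  //   static auto const [numBlocks, threadsPerBlock] = QueryLaunchParams();
  return params.numBlocks * params.threadsPerBlock;
}
```

Calling `TotalThreads()` twice returns 32768 both times while `query_count` stays at 1, which is the caching behavior the `static` is there to provide.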
I'm attaching this here as well.
https://www.olcf.ornl.gov/wp-content/uploads/Intro_Register_pressure_ORNL_20220812_2083.pdf
Bob, let me know if AMD occupancy and reducing register pressure are outside the scope of what you intended here. I'm broadly interested in a larger-scale investigation, but that may fall outside the scope of the task you outlined.
IMO that definitely puts it out of the range of a good first issue, since it likely involves some considerable rewriting/rethinking of the algorithms and implementations of kernels. I would argue that they are two separate, but related, issues.
Closed in PR #359
I converted this Project card by @bcaddy into an issue because I am also interested in whether we can make gains in occupancy.