Optimize the GPU code of gint_rho using the CUDA math library

Details

The current gint_rho GPU implementation uses atomicAdd at the innermost level of the computation, which is inefficient. Can CUDA math library functions replace atomicAdd and speed up gint_rho?

[Screenshot: 微信截图_20240103213540]
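To make the question concrete, here is a minimal CPU-side C++ sketch of one way such a replacement could work. It rests on assumptions that are not stated in this issue: that gint_rho evaluates a density of the form rho(r) = sum over (mu, nu) of psi_mu(r) * DM(mu, nu) * psi_nu(r), and that "CUDA math library" means a BLAS-style library such as cuBLAS. The function names, array layouts, and sizes are hypothetical and are not taken from the ABACUS code. The first function accumulates every orbital pair directly into rho[r], which is the pattern that needs atomicAdd when pairs are spread across GPU threads. The second computes T = psi * DM as one matrix product, which on a GPU could be handed to a GEMM routine such as `cublasDgemm`, and then takes a row-wise dot product, so each rho[r] is written by exactly one owner.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: psi is row-major [n_grid x n_orb] (orbital values on
// grid points), dm is row-major [n_orb x n_orb] (density matrix).

// Scatter-style accumulation: every (mu, nu) pair adds into rho[r].
// With one GPU thread per pair, each "+=" below would have to be an atomicAdd.
std::vector<double> rho_scatter(const std::vector<double>& psi,
                                const std::vector<double>& dm,
                                std::size_t n_grid, std::size_t n_orb)
{
    assert(psi.size() == n_grid * n_orb && dm.size() == n_orb * n_orb);
    std::vector<double> rho(n_grid, 0.0);
    for (std::size_t r = 0; r < n_grid; ++r)
        for (std::size_t mu = 0; mu < n_orb; ++mu)
            for (std::size_t nu = 0; nu < n_orb; ++nu)
                rho[r] += psi[r * n_orb + mu] * dm[mu * n_orb + nu]
                          * psi[r * n_orb + nu];
    return rho;
}

// Reformulation: T = psi * dm (a plain matrix product), then
// rho[r] = dot(T[r, :], psi[r, :]). No two writers share an output element.
std::vector<double> rho_gemm(const std::vector<double>& psi,
                             const std::vector<double>& dm,
                             std::size_t n_grid, std::size_t n_orb)
{
    assert(psi.size() == n_grid * n_orb && dm.size() == n_orb * n_orb);
    std::vector<double> t(n_grid * n_orb, 0.0);
    // Matrix product step: the part a GEMM library call could take over.
    for (std::size_t r = 0; r < n_grid; ++r)
        for (std::size_t mu = 0; mu < n_orb; ++mu)
            for (std::size_t nu = 0; nu < n_orb; ++nu)
                t[r * n_orb + nu] += psi[r * n_orb + mu] * dm[mu * n_orb + nu];
    // Row-wise dot product: one independent reduction per grid point.
    std::vector<double> rho(n_grid, 0.0);
    for (std::size_t r = 0; r < n_grid; ++r)
        for (std::size_t nu = 0; nu < n_orb; ++nu)
            rho[r] += t[r * n_orb + nu] * psi[r * n_orb + nu];
    return rho;
}

// Largest element-wise difference, for comparing the two variants.
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d = std::fmax(d, std::fabs(a[i] - b[i]));
    return d;
}
```

Calling `max_abs_diff(rho_scatter(psi, dm, n_grid, n_orb), rho_gemm(psi, dm, n_grid, n_orb))` on the same inputs should return a value at rounding-error level. Whether this restructuring actually pays off on the GPU depends on the real block sizes and memory layout in gint_rho, which would need to be profiled.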

Task list for Issue attackers (only for developers)

[ ] Reproduce the performance issue on a similar system or environment.
[ ] Identify the specific section of the code causing the performance issue.
[ ] Investigate the issue and determine the root cause.
[ ] Research best practices and potential solutions for the identified performance issue.
[ ] Implement the chosen solution to address the performance issue.
[ ] Test the implemented solution to ensure it improves performance without introducing new issues.
[ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
[ ] Review and incorporate any relevant feedback from users or developers.
[ ] Merge the improved solution into the main codebase and notify the issue reporter.

abacusmodeling / abacus-develop

Optimize the GPU code of gint_rho using the CUDA math library #295
