The current gint_rho algorithm uses atomicAdd at the bottom level, which is inefficient. Can we use CUDA math library functions to replace atomicAdd and speed up the gint_rho program?
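One possible direction, sketched under stated assumptions rather than taken from the actual gint_rho source: if each density value is a sum of products, the partial sums can be combined in registers with warp shuffles (`__shfl_down_sync`) and in shared memory, so that each thread block performs a single store instead of one atomicAdd per product. The kernel name `rho_dot_block_reduce`, the host helper `run_rho_sketch`, the one-block-per-grid-point mapping, and the contiguous array layout are all hypothetical and chosen only for illustration.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: block `blockIdx.x` owns one grid point and sums
// a[point*n + i] * b[point*n + i] over i. Assumes blockDim.x is a
// multiple of 32 and at most 1024.
__global__ void rho_dot_block_reduce(const double* a, const double* b,
                                     double* rho, int n)
{
    const int point = blockIdx.x;
    const double* pa = a + (size_t)point * n;
    const double* pb = b + (size_t)point * n;

    // Each thread accumulates a private partial sum in a register.
    double partial = 0.0;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        partial += pa[i] * pb[i];

    // Reduce within each warp using shuffles (no memory traffic, no atomics).
    for (int offset = warpSize / 2; offset > 0; offset >>= 1)
        partial += __shfl_down_sync(0xffffffffu, partial, offset);

    // Lane 0 of every warp publishes its warp sum to shared memory.
    __shared__ double warp_sums[32];
    const int lane = threadIdx.x % warpSize;
    const int warp = threadIdx.x / warpSize;
    if (lane == 0) warp_sums[warp] = partial;
    __syncthreads();

    // The first warp reduces the per-warp sums and writes the result once.
    if (warp == 0) {
        const int n_warps = blockDim.x / warpSize;
        partial = (lane < n_warps) ? warp_sums[lane] : 0.0;
        for (int offset = warpSize / 2; offset > 0; offset >>= 1)
            partial += __shfl_down_sync(0xffffffffu, partial, offset);
        if (lane == 0) rho[point] = partial;  // plain store, one per block
    }
}

// Hypothetical host helper: fills a with 1.0 and b with 2.0, runs the kernel,
// and returns rho (each entry should equal 2 * n_terms).
std::vector<double> run_rho_sketch(int n_points, int n_terms)
{
    const size_t total = (size_t)n_points * n_terms;
    std::vector<double> h_a(total, 1.0), h_b(total, 2.0), h_rho(n_points, 0.0);

    double *d_a, *d_b, *d_rho;
    cudaMalloc(&d_a, total * sizeof(double));
    cudaMalloc(&d_b, total * sizeof(double));
    cudaMalloc(&d_rho, n_points * sizeof(double));
    cudaMemcpy(d_a, h_a.data(), total * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), total * sizeof(double), cudaMemcpyHostToDevice);

    rho_dot_block_reduce<<<n_points, 128>>>(d_a, d_b, d_rho, n_terms);
    cudaMemcpy(h_rho.data(), d_rho, n_points * sizeof(double),
               cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_rho);
    return h_rho;
}

int main()
{
    const std::vector<double> rho = run_rho_sketch(4, 100);
    for (size_t p = 0; p < rho.size(); ++p)
        std::printf("rho[%zu] = %f\n", p, rho[p]);
    return 0;
}
```

If several blocks contribute to the same density element in the real kernel, the final store would still need to be an atomicAdd, but it would be issued once per block rather than once per term. Whether this, or a library routine such as a cuBLAS dot product or batched GEMM, is faster for gint_rho would have to be measured by profiling.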
Task list for Issue attackers (only for developers)
[ ] Reproduce the performance issue on a similar system or environment.
[ ] Identify the specific section of the code causing the performance issue.
[ ] Investigate the issue and determine the root cause.
[ ] Research best practices and potential solutions for the identified performance issue.
[ ] Implement the chosen solution to address the performance issue.
[ ] Test the implemented solution to ensure it improves performance without introducing new issues.
[ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
[ ] Review and incorporate any relevant feedback from users or developers.
[ ] Merge the improved solution into the main codebase and notify the issue reporter.