AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR
https://amrex-codes.github.io/amrex

Tune PermutationForDeposition for MI250X #3925

Closed: AlexanderSinn closed this 4 months ago

AlexanderSinn commented 4 months ago

Summary

PermutationForDeposition was originally developed for the A100. A few tweaks improve its performance on the MI250X, which has a smaller cache but is much less sensitive to atomic-add congestion.
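For readers unfamiliar with the idea, a deposition permutation reorders the particles so that those depositing into the same or nearby cells are processed close together. This improves cache reuse, but it can also make many threads issue atomic adds to the same cell at the same time. The summary above describes how the MI250X and A100 weigh these two effects differently. The sketch below is only a generic, hypothetical illustration: it builds such a permutation with a stable counting sort over precomputed cell indices. The function name `permutation_by_cell` and its `cell_of` input are assumptions. This is not AMReX's `PermutationForDeposition` implementation, its API, or the architecture-specific tuning from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not AMReX code): return a permutation that visits
// particles grouped by the cell they deposit into, via a stable counting sort.
// cell_of[i] is the assumed precomputed, flattened cell index of particle i.
std::vector<std::size_t> permutation_by_cell (const std::vector<std::size_t>& cell_of,
                                              std::size_t num_cells)
{
    // 1. Histogram: count particles per cell (stored shifted by one slot).
    std::vector<std::size_t> offset(num_cells + 1, 0);
    for (std::size_t c : cell_of) {
        assert(c < num_cells);  // every cell index must be in range
        ++offset[c + 1];
    }

    // 2. Prefix sum: offset[c] becomes the first output slot for cell c.
    for (std::size_t c = 0; c < num_cells; ++c) {
        offset[c + 1] += offset[c];
    }

    // 3. Scatter particle indices into their cell's slots, keeping input order.
    std::vector<std::size_t> perm(cell_of.size());
    for (std::size_t i = 0; i < cell_of.size(); ++i) {
        perm[offset[cell_of[i]]++] = i;
    }
    return perm;
}
```

With `cell_of = {2, 0, 1, 0, 2}` and three cells, the result is `{1, 3, 2, 0, 4}`: both cell-0 particles come first, then cell 1, then cell 2. A counting sort runs in O(N + num_cells). A real GPU tune would additionally decide how to spread same-cell particles across threads to balance cache locality against atomic contention, and this sketch does not attempt that.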

Additional background

Test on MI250X: [image]

I also ran the same test on the A100, forcing it to use the AMD tune: [image]

Checklist

The proposed changes: