AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR
https://amrex-codes.github.io/amrex

use HIP stream-ordered allocator #3980

Closed BenWibking closed 3 weeks ago

BenWibking commented 3 weeks ago

Summary

When available, this PR uses the HIP stream-ordered memory allocator (hipMallocAsync/hipFreeAsync) for device arena allocations. This is expected to improve performance, but that has not been measured yet (performance testing TBD).

AMReX tests pass on MI210 with ROCm 6.0.

Closes https://github.com/AMReX-Codes/amrex/issues/3979.

Additional background

ROCm 5.2.0 added these APIs: https://rocm.docs.amd.com/en/latest/about/changelog.html#id665

HIP documentation: https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/group___stream_o.html

The CUDA stream-ordered allocator is already used by AMReX (with CUDA >= 11.2).
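
For illustration, here is a minimal sketch of what version-gated use of hipMallocAsync/hipFreeAsync could look like. It is not the code from this PR or from the AMReX arena: the `sketch_*` names and the `SKETCH_HIP_STREAM_ORDERED` macro are hypothetical, and the host-only branch (plain malloc/free) is an assumption added so the snippet also builds without a HIP toolchain. The version test relies on the HIP_VERSION_MAJOR/HIP_VERSION_MINOR macros that come with the HIP runtime headers, and on the ROCm 5.2.0 threshold mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if defined(__HIPCC__)
#  include <hip/hip_runtime.h>  // also defines HIP_VERSION_MAJOR / HIP_VERSION_MINOR
using sketch_stream_t = hipStream_t;
// hipMallocAsync / hipFreeAsync are available from ROCm 5.2.0 onward.
#  if (HIP_VERSION_MAJOR > 5) || (HIP_VERSION_MAJOR == 5 && HIP_VERSION_MINOR >= 2)
#    define SKETCH_HIP_STREAM_ORDERED 1  // hypothetical feature macro
#  endif
#else
// Assumption: host-only stand-in so the sketch compiles without HIP.
using sketch_stream_t = void*;
#endif

// Hypothetical helper: allocate nbytes, ordered on `stream` when supported.
// Returns nullptr on failure.
inline void* sketch_device_alloc(std::size_t nbytes, sketch_stream_t stream)
{
    void* p = nullptr;
#if defined(SKETCH_HIP_STREAM_ORDERED)
    if (hipMallocAsync(&p, nbytes, stream) != hipSuccess) { p = nullptr; }
#elif defined(__HIPCC__)
    (void)stream;  // older HIP: fall back to the synchronous allocator
    if (hipMalloc(&p, nbytes) != hipSuccess) { p = nullptr; }
#else
    (void)stream;  // no HIP at all: plain host allocation
    p = std::malloc(nbytes);
#endif
    return p;
}

// Hypothetical helper: release memory obtained from sketch_device_alloc.
inline void sketch_device_free(void* p, sketch_stream_t stream)
{
    if (p == nullptr) { return; }
#if defined(SKETCH_HIP_STREAM_ORDERED)
    hipError_t err = hipFreeAsync(p, stream);  // free is ordered on `stream`
    assert(err == hipSuccess);
    (void)err;
#elif defined(__HIPCC__)
    (void)stream;
    hipError_t err = hipFree(p);
    assert(err == hipSuccess);
    (void)err;
#else
    (void)stream;
    std::free(p);
#endif
}

// Allocate and free once on the given stream; returns true on success.
inline bool sketch_roundtrip(std::size_t nbytes, sketch_stream_t stream)
{
    void* p = sketch_device_alloc(nbytes, stream);
    if (p == nullptr) { return false; }
    sketch_device_free(p, stream);
#if defined(__HIPCC__)
    // Wait for the stream so the deferred free has completed before returning.
    if (hipStreamSynchronize(stream) != hipSuccess) { return false; }
#endif
    return true;
}
```

Calling `sketch_roundtrip(1024, stream)` with a valid stream exercises one allocate/free pair. The point of the stream-ordered calls is that allocation and release are enqueued on the stream rather than forcing a device-wide synchronization, which is where any performance benefit would come from.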

BenWibking commented 3 weeks ago

Hmm, these tests fail here and also fail on the latest development commit:

92% tests passed, 3 tests failed out of 38

Total Test time (real) = 657.96 sec

The following tests FAILED:
         24 - Particles_ParallelContext_3d (Failed)
         30 - Particles_Redistribute_3d (Failed)
         31 - Particles_RedistributeSOA_3d (Failed)
Errors while running CTest

See https://github.com/AMReX-Codes/amrex/issues/3981.