ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Use hipMemcpyAsync for validation #1891

Closed. nakajee closed this 7 months ago

nakajee commented 8 months ago

This is a trial fix for the random validation mismatch in which the device result is 0. I will run the CI tests multiple times with this change.

nakajee commented 8 months ago

I will fix the build error...

nakajee commented 8 months ago

Reference rocBLAS commit for hipMemcpyAsync: https://github.com/ROCm/rocBLAS/commit/c070df9bdc31cce30e0d4d732ad8a60f3b2ee332

nakajee commented 7 months ago

I ran the precheckin tests 11 times and the extended tests 7 times. So far I have not seen the mismatch with device result=0, so I will merge this change.