ROCm / clr

MIT License
85 stars 35 forks source link

hipMemcpy2D fails with invalid arg if `hipMemcpyDefault` used #40

Closed hdelan closed 6 months ago

hdelan commented 6 months ago

This simple repro fails with ROCm 5.7

#include <hip/hip_runtime.h>

constexpr int width = 64;
constexpr int height = 64;
constexpr int src_sz = width * height;
constexpr int dst_offset = src_sz; // This larger offset fails
// constexpr int dst_offset = 1; // This small offset passes
constexpr int dst_sz = src_sz + dst_offset;

int main() {
  float *src, *dst;

  // Allocate memory on the device
  hipMalloc(&src, src_sz * sizeof(float));
  hipMallocManaged(&dst, dst_sz * sizeof(float));

  size_t pitch = width * sizeof(float); // no padding

  auto err = hipMemcpy2D(dst + dst_offset, pitch, src, pitch,
                         width * sizeof(float), height, hipMemcpyDefault);
  if (err != hipSuccess)
    printf("hipMemcpy2DAsync failed: %s\n", hipGetErrorString(err));

  hipFree(src);
  hipFree(dst);
}

Things that fix the invalid argument return code:

  1. Making the offset smaller
  2. Changing policy from hipMemcpyDefault to hipMemcpyDeviceToHost
  3. Changing hipMallocManaged to hipMalloc
$ hipcc --version
HIP version: 5.7.31921-d1770ee1b
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.7.0 23352 d1e13c532a947d0cbfc94759c00dcf152294aa13)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.7.0/llvm/bin
hdelan commented 6 months ago

It seems that this bug was introduced in rocm 5.6.0 as rocm 5.5.1 works.

iassiour commented 6 months ago

Thank you for reporting @hdelan . I can confirm this is a bug. Creating a PR internally to fix it. Will update this issue as soon as the fix appears in clr repo here.

hdelan commented 6 months ago

Thanks @iassiour . Let me know if there is any progress on this!

iassiour commented 6 months ago

Hi @hdelan. The fix has been merged to develop branch. https://github.com/ROCm/clr/commit/d3bfb55d7a934355257a72fab538a0a634b43cad Please let me know if this issue can be closed.

hdelan commented 6 months ago

Thanks @iassiour ! Yes this ticket can be closed. Can I just ask, which versions of ROCm will have this fix for HIP runtime?

iassiour commented 6 months ago

Hi @hdelan I do not know the specific version but the fix has already been merged into the develop branch so it should appear in one of the next releases.