-
Compiling Grid with OpenMP target offload to AMD GPUs throws errors:
```
error: stack frame size (149840) exceeds limit (131056) in function '__omp_offloading_72_1e118ab9__ZN4Grid7LatticeINS_7iSc…
```
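To gauge how far the offloaded kernel is from fitting, the two numbers in the error can be compared directly. This is plain arithmetic on the reported values. It does not say which locals in the `Lattice` kernel account for the stack usage.

```python
# Figures copied from the compiler error above, in bytes.
FRAME_SIZE = 149_840
FRAME_LIMIT = 131_056

overshoot = FRAME_SIZE - FRAME_LIMIT       # bytes over the limit
pct_over = 100 * overshoot / FRAME_LIMIT   # overshoot relative to the limit
gap_to_128k = 128 * 1024 - FRAME_LIMIT     # how far the limit sits below 128 KiB

print(f"over by {overshoot} bytes ({pct_over:.1f}%)")  # over by 18784 bytes (14.3%)
print(f"limit is {gap_to_128k} bytes below 128 KiB")   # limit is 16 bytes below 128 KiB
```

The frame therefore has to shrink by roughly 18.8 KB, about 14%, before it fits under this limit.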
-
### Your current environment
```text
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC ve…
```
-
The initial compile attempt failed with this issue: https://github.com/ROCm/rocm_smi_lib/issues/170
Applying the following patch fixed that initial issue:
```diff
--- a/include/r…
```
-
The upstream repo now supports the RX 7000 series, but some tests fail:
```
FAILED python/test/unit/language/test_core_amd.py::test_reduce1d[min-int16-128]
FAILED python/test/unit/languag…
```
-
This continues the discussion that started in #3934. On AMD GPUs, the OpenCL platform is sometimes several times slower than the HIP platform, and we're trying to figure out why. Much of the slown…
-
It would be nice to eventually have OpenCL support for those of us with GPUs that don't support CUDA.
-
`/opt/rocm/include/hip/amd_detail/amd_device_functions.h:995` declares `clock()` with return type `long long`, whereas `ctime.h` declares it with return type `clock_t`, which is `long`. Please fix.
-
### 🐛 Describe the bug
Hello, this is a follow-up to the previous issue https://github.com/pytorch/pytorch/issues/120478. The original issue was fixed in PR https://github.com/pytorch/pytorch/pull/12…
-
I am trying to use this script to compile a kernel on Linux and load the binary in my Windows program with `clCreateProgramWithBinary`. Will this work?
I already have the ROCm compiler, and this clang…
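This sketch does not settle whether a binary built on Linux will load through `clCreateProgramWithBinary` on Windows. It offers one cheap check to run first: confirm the file is a 64-bit little-endian ELF whose `e_machine` field is `EM_AMDGPU` (224), the container format LLVM uses for AMDGPU code objects. The helper name `looks_like_amdgpu_code_object` and the file names in the usage comment are hypothetical. Passing this check does not mean a given driver will accept the binary, because the binary is also compiled for one specific gfx target.

```python
import struct
import sys

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2    # e_ident[EI_CLASS] for 64-bit objects
ELFDATA2LSB = 1   # e_ident[EI_DATA] for little-endian objects
EM_AMDGPU = 224   # e_machine value assigned to AMD GPUs


def looks_like_amdgpu_code_object(data: bytes) -> bool:
    """Return True if `data` starts with a 64-bit little-endian ELF header
    whose e_machine is EM_AMDGPU. Only the header is inspected; the gfx
    target (encoded in e_flags) and driver compatibility are not checked."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        return False
    if data[4] != ELFCLASS64 or data[5] != ELFDATA2LSB:
        return False
    # e_type occupies bytes 16-17; e_machine occupies bytes 18-19.
    (e_machine,) = struct.unpack_from("<H", data, 18)
    return e_machine == EM_AMDGPU


if __name__ == "__main__":
    # Usage (placeholder names): python check_code_object.py kernel.bin
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, looks_like_amdgpu_code_object(f.read(64)))
```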
-
Hi all,
I am working on a kernel that hits an assertion in the `RemoveLayoutConversions` pass during the IR rewrite (on the latest `main` branch). The bug affects both the `cuda` and `hip` backends.
…