-
# Summary
Introduce a Level-Zero Core or Tools API that enables setting up an additional timestamp-enabled event (on top of the one already set) for a GPU task being submitted into the command list.
# Details …
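For context, here is a minimal sketch of how a single timestamp-capable signal event is attached to a kernel submission today; all handles (`ctx`, `dev`, `cmdList`, `kernel`, `groupCount`) are assumed to have been created elsewhere:

```cpp
#include <level_zero/ze_api.h>

// Sketch: attach ONE timestamp-capable signal event to a kernel append.
ze_event_handle_t appendWithTimestamp(ze_context_handle_t ctx,
                                      ze_device_handle_t dev,
                                      ze_command_list_handle_t cmdList,
                                      ze_kernel_handle_t kernel,
                                      ze_group_count_t groupCount) {
    ze_event_pool_desc_t poolDesc = {};
    poolDesc.stype = ZE_STRUCTURE_TYPE_EVENT_POOL_DESC;
    poolDesc.flags = ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP; // timestamp events
    poolDesc.count = 1;
    ze_event_pool_handle_t pool = nullptr;
    zeEventPoolCreate(ctx, &poolDesc, 1, &dev, &pool);

    ze_event_desc_t eventDesc = {};
    eventDesc.stype = ZE_STRUCTURE_TYPE_EVENT_DESC;
    eventDesc.index = 0;
    eventDesc.signal = ZE_EVENT_SCOPE_FLAG_HOST;
    ze_event_handle_t tsEvent = nullptr;
    zeEventCreate(pool, &eventDesc, &tsEvent);

    // Only this single signal-event slot exists per append today; the
    // proposal above is to allow an additional timestamp event alongside it.
    zeCommandListAppendLaunchKernel(cmdList, kernel, &groupCount,
                                    tsEvent, 0, nullptr);
    return tsEvent;
}

// After the command list is closed, submitted, and the event has signaled,
// the timestamps can be read back:
//   ze_kernel_timestamp_result_t ts = {};
//   zeEventQueryKernelTimestamp(tsEvent, &ts);
```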
-
I have some code that launches multiple kernels and distributes them across multiple queues belonging to different CUDA devices. When only one GPU is used, we get the following dependency graph:
![dep_gra…
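For reference, a minimal sketch of the launch pattern described above, assuming one stream ("queue") per device and `cudaMemsetAsync` standing in for the real kernel launches (buffer size and task count are placeholders, not the actual code from this report):

```cpp
#include <cuda_runtime.h>
#include <vector>

int main() {
    int numDevices = 0;
    cudaGetDeviceCount(&numDevices);

    // One stream per device.
    std::vector<cudaStream_t> streams(numDevices);
    std::vector<float*> buffers(numDevices);
    for (int d = 0; d < numDevices; ++d) {
        cudaSetDevice(d);
        cudaStreamCreate(&streams[d]);
        cudaMalloc(&buffers[d], 1 << 20);
    }

    // Distribute the work round-robin across the devices' streams;
    // cudaMemsetAsync is a placeholder for the real kernel launches.
    const int numTasks = 8;
    for (int t = 0; t < numTasks; ++t) {
        int d = t % numDevices;
        cudaSetDevice(d);
        cudaMemsetAsync(buffers[d], 0, 1 << 20, streams[d]);
    }

    // Wait for every device to drain its stream.
    for (int d = 0; d < numDevices; ++d) {
        cudaSetDevice(d);
        cudaStreamSynchronize(streams[d]);
    }
    return 0;
}
```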
-
### 🚀 The feature, motivation and pitch
The official implementation of FlashAttention is in CUDA, so on AMD GPUs, users cannot easily use FlashAttention with transformers to train LLMs. With the …
-
**Describe the proposal**
We should automatically add `-mllvm -amdgpu-function-calls=true` to the compiler flags when `-DAMREX_GPU_BACKEND=HIP` is set (AMD GPUs). This works around compiler bugs for large G…
-
Tested with driver: 14616
$ rbuild package -d deps -DGPU_TARGETS=gfx1201
...
$ make check -j$(nproc)
[ 1%] Built target embed_lib_migraphx_kernels
349/380 Test #378: test_py_3.…
-
### Problem Description
Looks like another problem with `amdgpu-dkms` and new kernels. I was running ROCm 6.2 just fine with kernel 6.8.0-41. Ubuntu just pushed kernel 6.8.0-44; however, `amdgpu-dkm…
-
Hello! Having studied the documentation provided, I still could not tell whether GGUF-quantized models are supported on AMD GPUs. I would like to use the Q8 or even Q4 model based on Mistr…
-
### System Info
### Full log:
Installing text-embeddings-router v1.5.0 (/data_train/search/InternData/jiejuntan/python/text-embeddings-inference/router)
Updating git repository `https://githu…
-
Do we have symbol debugging for GPU kernels now?
I am using the ROCgdb shipped with rocm-4.5.2. When checking local variables or args in rocgdb, it always shows "Optimized".
Wondering if it was optimized …
-
Adjust the GPU class so that multiple kernels can be loaded and the arguments for the multiple kernels are also set correctly.
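No code is attached, so here is a minimal sketch of one way this could look, assuming an OpenCL-based `Gpu` class; the class name, the choice of OpenCL, and all member names are assumptions, not the project's actual code:

```cpp
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <map>
#include <string>

// Hypothetical Gpu class extended to hold several kernels at once,
// keyed by name, so that arguments can be set per kernel.
class Gpu {
public:
    explicit Gpu(cl_program program) : program_(program) {}

    ~Gpu() {
        for (auto& [name, k] : kernels_) clReleaseKernel(k);
    }

    // Load (or reuse) a kernel by name from the compiled program.
    cl_kernel kernel(const std::string& name) {
        auto it = kernels_.find(name);
        if (it != kernels_.end()) return it->second;
        cl_int err = CL_SUCCESS;
        cl_kernel k = clCreateKernel(program_, name.c_str(), &err);
        if (err == CL_SUCCESS) kernels_.emplace(name, k);
        return k;
    }

    // Set one argument on the named kernel.
    cl_int setArg(const std::string& name, cl_uint index,
                  size_t size, const void* value) {
        return clSetKernelArg(kernel(name), index, size, value);
    }

private:
    cl_program program_;
    std::map<std::string, cl_kernel> kernels_;
};
```

Keying the kernels by name keeps argument setup per kernel independent, which is the failure mode described above when only a single kernel handle is stored.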