-
## Feature request
I would like a nopython `@cfunc` to be able to launch a `@cuda.jit` kernel.
```python3
import numba.cuda  # needed for the decorator below
import numba.types as nt

@numba.cuda.jit(nt.void(nt.CPointer(nt.float32), nt…
```
-
When running the pixart sigma example on CUDA arch >= 80 with `int4` weights, the following error is raised:
```shell
File "/home/ubuntu/dev/quanto/optimum/quanto/tensor/qtensor_func.py", line 152…
```
-
A user reported being unable to build MPICH when using `clang++` as the CUDA compiler for Yaksa (pmodels/mpich#6954). Noting down the issues encountered when trying to build this way:
1. `NVCC_FLAGS` …
-
### 🚀 The feature, motivation and pitch
The official implementation of flash attention is in CUDA, so on AMD GPUs users cannot easily use flash attention with transformers to train LLMs. With the …
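For context, the computation that flash attention fuses is ordinary scaled dot-product attention; a minimal NumPy sketch of that reference computation (function name and shapes are illustrative, not from any library):

```python
import numpy as np

def attention_reference(q, k, v):
    """Naive scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    Flash attention computes the same result in a fused, tiled kernel,
    without materializing the full (seq x seq) attention matrix."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows now sum to 1
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention_reference(q, k, v)
print(out.shape)  # (4, 8)
```

A ROCm-capable fused implementation would produce the same values as this reference; the issue is about getting such a kernel on AMD hardware.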
-
I'm trying to use Flash Attention in an environment with CUDA 12.1, but it fails to compile. Is this expected?
Reproducing:
1. Start a Docker container with CUDA version 12.1.1 `docker run -it --gp…
-
I am working with the following:
- Ubuntu 22.04.4 LTS
- Nvidia driver: 555.42.06
- Nvidia RTX 3080 Ti

I have multiple CUDA installations in `/usr/local`, but I have added the cuda-12.1 bin t…
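When several toolkits coexist under `/usr/local`, selecting one usually comes down to the environment; a minimal sketch (the `/usr/local/cuda-12.1` prefix is the default install location and is an assumption here):

```shell
# Point the shell at one specific toolkit.
# /usr/local/cuda-12.1 is the default install prefix (assumed here).
export CUDA_HOME=/usr/local/cuda-12.1
export PATH="${CUDA_HOME}/bin:${PATH}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH:-}"
# After this, `nvcc --version` should report release 12.1,
# and `which nvcc` should resolve into /usr/local/cuda-12.1/bin.
```

Exporting these per shell (or via a profile script) helps avoid accidentally mixing `nvcc` from one version with libraries from another.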
-
### 🐛 Describe the bug
After #134373 I started getting the error "RuntimeError: CUDA error: operation not supported" when trying to run PyTorch.
Fresh build from source succeeds before #134373 and f…
-
```shell
${PYTHON_BIN_PATH} configure.py --backend CUDA \
    --os LINUX \
    --host_compiler GCC \
    --cuda_…
```
-
### System Info
As in the title.
### Information
- [ ] Docker
- [X] The CLI directly
### Tasks
- [ ] An officially supported command
- [ ] My own modifications
### Reproduction
I tried to ins…
-
Realm provides [a profiling response for measuring the upper and lower bound on when kernels launched by a GPU task are executed](https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/realm/p…