-
### Problem Description
PyTorch fails to compile locally with aotriton and throws the following error:
```
make -j 6 -f Makefile.shim HIPCC=hipcc AR=/usr/bin/ar EXTRA_COMPILER_OPTIONS=-I/opt/rocm…
-
Structured Kernels currently only support CPU/CUDA. Currently, this means we'll see entries like this in native_functions.yaml: An op is marked as "structured_delegate", but still has dispatch entries…
-
### 🚀 The feature, motivation and pitch
Enable support for the Flash Attention, Memory Efficient, and SDPA kernels on AMD GPUs.
At present, using these emits the warning below with the latest nightlies (torch=…
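For context, the kernels requested above are fused implementations of one formula. The torch-free sketch below spells out the reference math that scaled dot-product attention (SDPA) computes; shapes and helper names are illustrative, not PyTorch's API.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sdpa(Q, K, V):
    # Reference scaled dot-product attention:
    #   out = softmax(Q K^T / sqrt(d_k)) V
    # Q, K, V are lists of row vectors (seq_len x d_k / d_v).
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

Flash Attention and the memory-efficient kernel compute the same result without materializing the full score matrix, which is why falling back from them is a performance issue rather than a correctness one.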
-
Hello,
I have written `MTTKRP`, `TTM`, `SpMV`, and Tensor Hadamard Product (`THP`) kernels using the TACO library. In different versions of my code, I have used different data layouts for input and o…
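To make the layout question concrete, here is a minimal SpMV over a CSR layout in plain Python. This is not TACO's API (TACO generates such loops from a format specification); the array names are mine.

```python
def spmv_csr(vals, col_idx, row_ptr, x):
    # y = A @ x for A stored in CSR (compressed sparse row) layout:
    #   vals    - nonzero values, row-major order
    #   col_idx - column index of each nonzero
    #   row_ptr - start index of each row in vals (length = nrows + 1)
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for i in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[i] * x[col_idx[i]]
        y.append(acc)
    return y
```

Swapping the layout (e.g. to CSC or a blocked format) changes the loop structure and access pattern, which is exactly the trade-off the different versions of the code explore.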
-
### Feature request
PagedAttention has been a mainstream optimization technique for generation tasks based on LLMs. It has been supported by a lot of serving engines, e.g., [vllm](https://github.co…
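For readers unfamiliar with the technique, the core of PagedAttention is virtual-memory-style bookkeeping for the KV cache: each sequence's cache is a list of fixed-size physical blocks allocated on demand, addressed through a per-sequence block table. The sketch below shows only that bookkeeping (class and field names are illustrative, not vLLM's implementation):

```python
BLOCK_SIZE = 16  # tokens per KV block (illustrative)

class PagedKVCache:
    # Minimal sketch of PagedAttention's bookkeeping: physical KV blocks
    # are allocated on demand, so memory is not reserved up front for the
    # full maximum context length of every sequence.
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))   # pool of physical block ids
        self.block_tables = {}                # seq_id -> [physical block ids]
        self.lengths = {}                     # seq_id -> tokens written so far

    def append_token(self, seq_id):
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:               # current block full: allocate one
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def physical_slot(self, seq_id, pos):
        # Translate a logical token position to (physical block, offset).
        table = self.block_tables[seq_id]
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE
```

The attention kernel then gathers keys and values through this block table instead of assuming a contiguous cache, which is what eliminates fragmentation and over-reservation.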
-
The architecture family of a CPU would be helpful to have in grains. It would simplify installing packages that are specific to an architecture family.
My previous workaround was looking…
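In the meantime, one way to provide this is a custom grain dropped into a `_grains/` directory. The sketch below maps `platform.machine()` onto a coarse family name; the grain name `cpu_family` and the mapping are my own, not an existing Salt grain, and the `machine` parameter exists only so the mapping can be tested (Salt calls grain functions with no arguments).

```python
import platform

# Illustrative mapping from machine strings to a coarse architecture family.
_FAMILIES = {
    "x86_64": "x86", "amd64": "x86", "i386": "x86", "i686": "x86",
    "aarch64": "arm", "arm64": "arm", "armv7l": "arm",
    "ppc64le": "power", "s390x": "s390",
}

def cpu_family(machine=None):
    # Custom grain: returns a dict merged into the minion's grains.
    m = (machine or platform.machine()).lower()
    return {"cpu_family": _FAMILIES.get(m, m)}
```

After syncing grains to the minion, states could then target on `grains['cpu_family']` instead of enumerating every machine string.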
-
Is WebGPU support on the roadmap as an alternative GPU-accelerated backend? This would be especially useful for inference on the web or for non-CUDA environments.
-
HTTP/3 has several compelling advantages over HTTP/1.1 and HTTP/2:
- HTTP/3 runs over QUIC, which natively supports roaming. This is especially useful for mobile clients.
- HTTP/3 supports native m…
-
Hi, how do I cast a float/bfloat16 tensor to FP8? I want to perform W8A8 (FP8) quantization, but I couldn't find an example of quantizing activations to the FP8 format.
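A common recipe is per-tensor symmetric scaling into the FP8 E4M3 range, followed by the dtype cast; in recent PyTorch the cast itself is `x.to(torch.float8_e4m3fn)`. Below is a torch-free sketch of just the scaling step, where 448.0 is the largest finite `float8_e4m3fn` value; this is one standard recipe, not PyTorch's only or official quantization API.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantize_per_tensor(xs):
    # Per-tensor symmetric scaling into the FP8 E4M3 range.
    # In PyTorch this would be followed by the actual cast:
    #   (x / scale).to(torch.float8_e4m3fn)
    amax = max(abs(x) for x in xs)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x / scale)) for x in xs]
    return q, scale  # dequantize later as q[i] * scale
```

For W8A8, weights are usually scaled once offline, while activation scales are either calibrated ahead of time or computed dynamically per batch with the same amax logic.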
-
Testing with the SYCL backend on Intel Ponte Vecchio on the new Blake showed a couple of failing sub-tests (failure output is listed below each failing executable), depending on which environment variables s…