-
### 🐛 Describe the bug
Issue summary:
As part of the process of adding CUDA ARM nightly wheels, we are seeing long build compilation times: it needs ~5 hrs to compile. https://github.com/pytorch/pytorch/actio…
-
### Fast PyTorch dequantize() + matmul
I would like to open a discussion about faster inference with quantized models using pure PyTorch calls.
As you know, quantization is extremely importan…
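A minimal sketch of the idea being proposed, in pure Python with hypothetical helper names (per-tensor symmetric int8 quantization assumed): rather than materializing dequantized copies of both operands and then calling matmul, the integer matmul can run first and a single rescale recovers the float result, since (q_a·s_a)(q_b·s_b) = (q_a·q_b)(s_a·s_b).

```python
# Pure-Python sketch (hypothetical helpers, symmetric int8 quantization assumed):
# dequantize-then-matmul vs. integer matmul followed by one rescale.

def quantize(x, scale):
    """q = clamp(round(v / scale), -128, 127); symmetric, so zero_point = 0."""
    return [[max(-128, min(127, round(v / scale))) for v in row] for row in x]

def dequantize(q, scale):
    return [[v * scale for v in row] for row in q]

def matmul(a, b):
    n, k, m = len(a), len(a[0]), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.0, 0.0], [0.5, -0.5]]
sa, sb = 0.05, 0.05
aq, bq = quantize(a, sa), quantize(b, sb)

# Naive path: two full dequantized copies of the operands, then a float matmul.
naive = matmul(dequantize(aq, sa), dequantize(bq, sb))

# Fused path: integer matmul first, one rescale of the (usually smaller) output.
fused = [[v * (sa * sb) for v in row] for row in matmul(aq, bq)]
```

With per-tensor symmetric scales the two paths agree up to floating-point rounding; the fused path never allocates the dequantized operands, which is where the memory-traffic savings come from.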
-
- CPU Support: We have tests covering several models.
- [x] PolyAlg selecting the correct broadcast mechanism for `fast_activation!!` fails https://github.com/EnzymeAD/Enzyme.jl/issues/1408 (Fixed …
-
## 🐛 Bug
TensorIterator expects all the inputs and outputs to have the same type. This prevents us from using TensorIterator for operations like quantized batchnorm, where the input is quantized (q…
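The pattern the issue describes can be sketched in plain Python; the helper and its parameters below are hypothetical illustrations, not ATen API. The point is that the input and output are int8 with their own scale/zero_point while the intermediate math is float, which is exactly the mixed-dtype shape TensorIterator's same-type assumption rules out.

```python
import math

def batchnorm_quantized(q_in, in_scale, in_zp,
                        mean, var, gamma, beta,
                        out_scale, out_zp, eps=1e-5):
    """Hypothetical sketch: int8-quantized input and output, float math between.
    Input/output dtypes differ from the compute dtype, which is the case the
    issue says TensorIterator cannot express."""
    out = []
    for q in q_in:
        x = (q - in_zp) * in_scale                             # dequantize element
        y = (x - mean) / math.sqrt(var + eps) * gamma + beta   # float batchnorm
        out.append(max(-128, min(127, round(y / out_scale) + out_zp)))  # requantize
    return out

result = batchnorm_quantized([10, 20, 30], 0.1, 0,
                             mean=2.0, var=1.0, gamma=1.0, beta=0.0,
                             out_scale=0.05, out_zp=0)  # → [-20, 0, 20]
```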
-
https://github.com/kokkos/kokkos-kernels/issues/2010 was caused by the merge-based SpMV algorithm being selected in Tpetra for this specific mini-em case. The merge-based SpMV was incorrectly re-using…
-
### Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
### Describe the bug
Is acceleration not supported for qwen0.5b? And qwen0.5b's a…
-
Hey! Just opening an issue because there doesn't seem to be a discussion board.
I noticed there's no tuning around any of the Triton kernels for things like block size, and not much coverage around…
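As a generic illustration of what such tuning does, here is a pure-Python stand-in with a synthetic workload (real Triton kernels would express this with `@triton.autotune` over a list of `triton.Config` candidates): time each candidate block size and keep the fastest.

```python
import time

def run_kernel(block_size, n=1 << 16):
    # Synthetic stand-in workload: process n items in chunks of block_size.
    # The result is independent of block_size; only the timing differs.
    total = 0
    for start in range(0, n, block_size):
        total += sum(range(start, min(start + block_size, n)))
    return total

def autotune(candidates, repeats=3):
    """Pick the fastest block size by wall-clock timing (best of `repeats` runs)."""
    best, best_t = None, float("inf")
    for bs in candidates:
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            run_kernel(bs)
            times.append(time.perf_counter() - t0)
        if min(times) < best_t:
            best, best_t = bs, min(times)
    return best

best_block = autotune([64, 128, 256, 512, 1024])
```

The chosen block size is hardware- and shape-dependent, which is why caching tuning results per input shape (as Triton's autotuner does via its `key` argument) matters in practice.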
-
Hi!
I'm trying to integrate some of the quantized MatMul C++ kernels into ExecuTorch and I'm having a hard time: the documentation is very vague about what exactly I need to include/link for ATen to pic…
-
I saw that support for sm75 / sm70 is listed as in progress (https://docs.flashinfer.ai/installation.html) but didn't see an issue to track it. Is this something planned for the near term, or further out on t…
-
### 🚀 The feature, motivation and pitch
This RFC proposes to enable PyTorch XPU on Native Windows on Intel GPUs, following [[RFC] Intel GPU Upstreaming #114723](https://github.com/pytorch/pytorch/i…