-
### 🐛 Describe the bug
Issue summary:
As part of the process of adding CUDA ARM nightly wheels, we are seeing long build compilation times: it needs ~5 hrs to compile. https://github.com/pytorch/pytorch/actio…
-
### Fast PyTorch dequantize() + matmul
I would like to open a discussion about faster inference with quantized models using pure PyTorch calls.
As you know, quantization is extremely importan…
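A minimal sketch of the idea being proposed, in pure Python with hypothetical helper names (per-tensor symmetric int8 quantization assumed): rather than materializing dequantized copies of both operands and then calling matmul, the integer matmul can run first and a single rescale recovers the float result, since (q_a·s_a)(q_b·s_b) = (q_a·q_b)(s_a·s_b).

```python
# Pure-Python sketch (hypothetical helpers, symmetric int8 quantization assumed):
# dequantize-then-matmul vs. integer matmul followed by one rescale.

def quantize(x, scale):
    """q = clamp(round(v / scale), -128, 127); symmetric, so zero_point = 0."""
    return [[max(-128, min(127, round(v / scale))) for v in row] for row in x]

def dequantize(q, scale):
    return [[v * scale for v in row] for row in q]

def matmul(a, b):
    n, k, m = len(a), len(a[0]), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.0, 0.0], [0.5, -0.5]]
sa, sb = 0.05, 0.05
aq, bq = quantize(a, sa), quantize(b, sb)

# Naive path: two full dequantized copies of the operands, then a float matmul.
naive = matmul(dequantize(aq, sa), dequantize(bq, sb))

# Fused path: integer matmul first, one rescale of the (usually smaller) output.
fused = [[v * (sa * sb) for v in row] for row in matmul(aq, bq)]
```

With per-tensor symmetric scales the two paths agree up to floating-point rounding; the fused path never allocates the dequantized operands, which is where the memory-traffic savings come from.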
-
- CPU Support: We have tests covering several models.
- [x] PolyAlg selecting the correct broadcast mechanism for `fast_activation!!` fails https://github.com/EnzymeAD/Enzyme.jl/issues/1408 (Fixed …
-
## 🐛 Bug
TensorIterator expects all the inputs and outputs to have the same type. This prevents us from using TensorIterator for operations like quantized batchnorm, where the input is quantized (q…
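The pattern the issue describes can be sketched in plain Python; the helper and its parameters below are hypothetical illustrations, not ATen API. The point is that the input and output are int8 with their own scale/zero_point while the intermediate math is float, which is exactly the mixed-dtype shape TensorIterator's same-type assumption rules out.

```python
import math

def batchnorm_quantized(q_in, in_scale, in_zp,
                        mean, var, gamma, beta,
                        out_scale, out_zp, eps=1e-5):
    """Hypothetical sketch: int8-quantized input and output, float math between.
    Input/output dtypes differ from the compute dtype, which is the case the
    issue says TensorIterator cannot express."""
    out = []
    for q in q_in:
        x = (q - in_zp) * in_scale                             # dequantize element
        y = (x - mean) / math.sqrt(var + eps) * gamma + beta   # float batchnorm
        out.append(max(-128, min(127, round(y / out_scale) + out_zp)))  # requantize
    return out

result = batchnorm_quantized([10, 20, 30], 0.1, 0,
                             mean=2.0, var=1.0, gamma=1.0, beta=0.0,
                             out_scale=0.05, out_zp=0)  # → [-20, 0, 20]
```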
-
https://github.com/kokkos/kokkos-kernels/issues/2010 was caused by the merge-based SpMV algorithm being selected in Tpetra for this specific mini-em case. The merge-based SpMV was incorrectly re-using…
-
### Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
### Describe the bug
Is acceleration not supported for qwen0.5b? And qwen0.5b's a…
-
Hey! Just opening an issue because there doesn't seem to be a discussion board.
I noticed there's no tuning around any of the Triton kernels for things like block size, and not much coverage around…
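As a generic illustration of what such tuning does, here is a pure-Python stand-in with a synthetic workload (real Triton kernels would express this with `@triton.autotune` over a list of `triton.Config` candidates): time each candidate block size and keep the fastest.

```python
import time

def run_kernel(block_size, n=1 << 16):
    # Synthetic stand-in workload: process n items in chunks of block_size.
    # The result is independent of block_size; only the timing differs.
    total = 0
    for start in range(0, n, block_size):
        total += sum(range(start, min(start + block_size, n)))
    return total

def autotune(candidates, repeats=3):
    """Pick the fastest block size by wall-clock timing (best of `repeats` runs)."""
    best, best_t = None, float("inf")
    for bs in candidates:
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            run_kernel(bs)
            times.append(time.perf_counter() - t0)
        if min(times) < best_t:
            best, best_t = bs, min(times)
    return best

best_block = autotune([64, 128, 256, 512, 1024])
```

The chosen block size is hardware- and shape-dependent, which is why caching tuning results per input shape (as Triton's autotuner does via its `key` argument) matters in practice.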
-
Hi!
I'm trying to integrate some of the quantized MatMul C++ kernels into ExecuTorch and I'm having a hard time: the documentation is very vague about what exactly I need to include/link for ATen to pic…
-
I saw that support for sm75 / sm70 is listed as in progress (https://docs.flashinfer.ai/installation.html) but didn't see an issue to track it. Is this something planned for the near term, or further out on t…
-
### 🚀 The feature, motivation and pitch
This RFC proposes to enable PyTorch XPU on Native Windows on Intel GPUs, following [[RFC] Intel GPU Upstreaming #114723](https://github.com/pytorch/pytorch/i…