-
I really like the simplicity of TK and think it could be broadly applicable to kernel authoring beyond attention. Has there been any benchmarking done of pure GEMM operations? If so, an example would …
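As a point of reference for the question above, here is a minimal, hypothetical sketch of how a pure GEMM can be timed and reported in GFLOP/s. It uses NumPy rather than TK (so it measures the host BLAS, not a TK kernel), and the matrix size and iteration count are arbitrary:

```python
import time
import numpy as np

# Hypothetical sketch: time a pure GEMM (C = A @ B) and report GFLOP/s.
def bench_gemm(n=512, iters=5):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up so one-time setup cost is excluded
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    dt = (time.perf_counter() - t0) / iters
    flops = 2 * n ** 3  # one multiply and one add per inner-product term
    return flops / dt / 1e9

print(f"{bench_gemm():.1f} GFLOP/s")
```

The same warm-up/average-over-iterations structure applies when benchmarking a GPU GEMM, with the addition of device synchronization around the timed region.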
-
### 🐛 Describe the bug
Compiling the flash attention CUDA kernels consumes a very large amount of RAM. For example, on my machine, compiling `aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_h…
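One minimal, hypothetical way to quantify the memory usage on Linux is to run the compile as a child process and read the peak resident set size reported for children; the command below is a stand-in, not the actual nvcc invocation:

```python
import resource
import subprocess
import sys

# Hypothetical sketch (Unix-only): run a command as a child process and
# return the peak RSS (ru_maxrss, KiB on Linux) across all terminated children.
def peak_child_rss_kib(cmd):
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

# Stand-in command; substitute the real compiler command line to measure it.
print(peak_child_rss_kib([sys.executable, "-c", "pass"]))
```

Equivalently, `/usr/bin/time -v <compile command>` prints "Maximum resident set size" directly.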
-
### Summary of Problem
The following code produces the error "gpu-nvidia.c:292: Error calling CUDA function: an illegal memory access was encountered".
```chapel
const D = {0..
-
**Output of 'strings libarm_compute.so | grep arm_compute_version':**
arm_compute_version=v23.11 Build options: {'Werror': '0', 'debug': '0', 'neon': '1', 'opencl': '0', 'embed_kernels': '0', 'os…
-
### System Info
Ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.10.…
-
## Enhancement
@trilinos/ifpack2 @csiefer2 @srajama1 @vqd8a
Ifpack2's native implementation of RILUK depends on UVM when compiled for CUDA. Some of the associated unit tests have been modified to …
-
### Summary
Last year, we released [pytorch-labs/torchao](https://github.com/pytorch-labs/ao) to provide acceleration of Generative AI models using native PyTorch techniques. Torchao added support …
-
Steps to reproduce this issue:
1. Install Clear Linux (I'm currently on 27320)
2. `sudo swupd bundle-add kernel-iot-lts2018`
3. Check kernels available (note: my system was recently updated so I ha…
-
Expected release date: Mar 15th, 2024
# General
1. [x] Support general page table layout (@yzh119)
2. [ ] sm70/75 compatibility (@yzh119)
3. [ ] performance: using fp16 as intermediate data ty…
-
### 🐛 Describe the bug
I gathered all 10K Triton kernels generated by Inductor using a stack of PRs ( https://github.com/pytorch/pytorch/pull/120048 ). After deduping identical kernels used by different …
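A minimal sketch of the kind of dedup step described above, assuming kernels are compared by exact source text; the kernel names and bodies below are made up for illustration:

```python
import hashlib

# Hypothetical sketch: keep one representative per distinct kernel source,
# keyed by a content hash of the generated code.
def dedupe_kernels(sources):
    representatives = {}
    for name, src in sorted(sources.items()):
        key = hashlib.sha256(src.encode()).hexdigest()
        # First kernel seen with this source wins; later duplicates are dropped.
        representatives.setdefault(key, name)
    return sorted(representatives.values())

# Made-up example: k1 and k2 share identical source, so only one survives.
kernels = {
    "triton_k1": "def kernel(x): return x + 1",
    "triton_k2": "def kernel(x): return x + 1",
    "triton_k3": "def kernel(x): return x * 2",
}
print(dedupe_kernels(kernels))  # → ['triton_k1', 'triton_k3']
```

Hashing the full source is exact-match dedup only; kernels that differ in whitespace or autotuning metadata would need normalization first.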