-
### Problem Description
PyTorch fails to compile locally with aotriton and throws the following error:
```
make -j 6 -f Makefile.shim HIPCC=hipcc AR=/usr/bin/ar EXTRA_COMPILER_OPTIONS=-I/opt/rocm…
-
Structured Kernels currently only support CPU/CUDA. Currently, this means we'll see entries like this in native_functions.yaml: An op is marked as "structured_delegate", but still has dispatch entries…
-
### 🚀 The feature, motivation and pitch
Enable support for the Flash Attention, Memory Efficient, and SDPA kernels on AMD GPUs.
At present, using these emits the warning below with the latest nightlies (torch=…
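For context, the kernels requested above are fused implementations of one formula. The torch-free sketch below spells out the reference math that scaled dot-product attention (SDPA) computes; shapes and helper names are illustrative, not PyTorch's API.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sdpa(Q, K, V):
    # Reference scaled dot-product attention:
    #   out = softmax(Q K^T / sqrt(d_k)) V
    # Q, K, V are lists of row vectors (seq_len x d_k / d_v).
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

Flash Attention and the memory-efficient kernel compute the same result without materializing the full score matrix, which is why falling back from them is a performance issue rather than a correctness one.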
-
Hello,
I have written `MTTKRP`, `TTM`, `SpMV`, and Tensor Hadamard Product (`THP`) kernels using the TACO library. In different versions of my code, I have used different data layouts for input and o…
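To make the layout question concrete, here is a minimal SpMV over a CSR layout in plain Python. This is not TACO's API (TACO generates such loops from a format specification); the array names are mine.

```python
def spmv_csr(vals, col_idx, row_ptr, x):
    # y = A @ x for A stored in CSR (compressed sparse row) layout:
    #   vals    - nonzero values, row-major order
    #   col_idx - column index of each nonzero
    #   row_ptr - start index of each row in vals (length = nrows + 1)
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for i in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[i] * x[col_idx[i]]
        y.append(acc)
    return y
```

Swapping the layout (e.g. to CSC or a blocked format) changes the loop structure and access pattern, which is exactly the trade-off the different versions of the code explore.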
-
### Feature request
PagedAttention has been a mainstream optimization technique for generation tasks based on LLMs. It has been supported by a lot of serving engines, e.g., [vllm](https://github.co…
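For readers unfamiliar with the technique, the core of PagedAttention is virtual-memory-style bookkeeping for the KV cache: each sequence's cache is a list of fixed-size physical blocks allocated on demand, addressed through a per-sequence block table. The sketch below shows only that bookkeeping (class and field names are illustrative, not vLLM's implementation):

```python
BLOCK_SIZE = 16  # tokens per KV block (illustrative)

class PagedKVCache:
    # Minimal sketch of PagedAttention's bookkeeping: physical KV blocks
    # are allocated on demand, so memory is not reserved up front for the
    # full maximum context length of every sequence.
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))   # pool of physical block ids
        self.block_tables = {}                # seq_id -> [physical block ids]
        self.lengths = {}                     # seq_id -> tokens written so far

    def append_token(self, seq_id):
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:               # current block full: allocate one
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def physical_slot(self, seq_id, pos):
        # Translate a logical token position to (physical block, offset).
        table = self.block_tables[seq_id]
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE
```

The attention kernel then gathers keys and values through this block table instead of assuming a contiguous cache, which is what eliminates fragmentation and over-reservation.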
-
The architecture family of a CPU would be helpful to have in grains. It would simplify installing packages that are specific to an architecture family.
My previous workaround was looking…
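In the meantime, one way to provide this is a custom grain dropped into a `_grains/` directory. The sketch below maps `platform.machine()` onto a coarse family name; the grain name `cpu_family` and the mapping are my own, not an existing Salt grain, and the `machine` parameter exists only so the mapping can be tested (Salt calls grain functions with no arguments).

```python
import platform

# Illustrative mapping from machine strings to a coarse architecture family.
_FAMILIES = {
    "x86_64": "x86", "amd64": "x86", "i386": "x86", "i686": "x86",
    "aarch64": "arm", "arm64": "arm", "armv7l": "arm",
    "ppc64le": "power", "s390x": "s390",
}

def cpu_family(machine=None):
    # Custom grain: returns a dict merged into the minion's grains.
    m = (machine or platform.machine()).lower()
    return {"cpu_family": _FAMILIES.get(m, m)}
```

After syncing grains to the minion, states could then target on `grains['cpu_family']` instead of enumerating every machine string.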
-
Is WebGPU support on the roadmap as an alternative GPU-accelerated backend? This would be especially useful for inference on the web or for non-CUDA environments.
-
HTTP/3 has several compelling advantages over HTTP/1.1 and HTTP/2:
- HTTP/3 runs over QUIC, which natively supports roaming. This is especially useful for mobile clients.
- HTTP/3 supports native m…
-
Hi, how do I cast a float/bfloat16 tensor to FP8? I want to perform W8A8 (FP8) quantization, but I couldn't find an example of quantizing activations to the FP8 format.
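A common recipe is per-tensor symmetric scaling into the FP8 E4M3 range, followed by the dtype cast; in recent PyTorch the cast itself is `x.to(torch.float8_e4m3fn)`. Below is a torch-free sketch of just the scaling step, where 448.0 is the largest finite `float8_e4m3fn` value; this is one standard recipe, not PyTorch's only or official quantization API.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantize_per_tensor(xs):
    # Per-tensor symmetric scaling into the FP8 E4M3 range.
    # In PyTorch this would be followed by the actual cast:
    #   (x / scale).to(torch.float8_e4m3fn)
    amax = max(abs(x) for x in xs)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x / scale)) for x in xs]
    return q, scale  # dequantize later as q[i] * scale
```

For W8A8, weights are usually scaled once offline, while activation scales are either calibrated ahead of time or computed dynamically per batch with the same amax logic.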
-
Testing with the SYCL backend on Intel Ponte Vecchio on the new Blake showed a couple of failing sub-tests (failure output is listed below each failing executable), depending on which environment variables s…