-
https://qifeng.xyz/posts/ed5d8a1b.html
This article introduces how to use Tensor Cores through the WMMA API.
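As a minimal sketch of what such WMMA usage looks like (assuming an sm_70+ GPU and a single 16x16x16 half-precision tile; the kernel name and tile sizes here are illustrative):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Multiply one 16x16 tile of A by one 16x16 tile of B on the tensor cores.
__global__ void wmma_tile_gemm(const half *a, const half *b, float *c) {
    // Fragments are opaque, warp-distributed register tiles.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);      // zero the accumulator
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```

All 32 threads of a warp must execute these calls together; the `_sync` suffix reflects that they are warp-collective operations.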
-
Hi, I hope that this repo is still maintained or at least open for questions. :)
My use-case:
I have a code-base which utilizes the C++ wmma template API. For specific reasons I need to perform…
-
I encountered a strange bug while programming tensor cores using the **WMMA** API on an A800.
I tried to print the size of an element in the fragment. Normally **sizeof**(fp16) is 2, but the following code a…
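A sketch of how one might probe this (hedged: the fragment's storage member `x` is documented, but its element type is implementation-defined and may not be `__half` on every toolchain, which could explain a surprising `sizeof` result):

```cuda
#include <cstdio>
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Print the size of the declared half type versus the fragment's actual
// per-element storage type.
__global__ void probe_fragment_element_size() {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> frag;
    if (threadIdx.x == 0) {
        printf("sizeof(half) = %zu, sizeof(frag.x[0]) = %zu\n",
               sizeof(half), sizeof(frag.x[0]));
    }
}
```

If the two sizes differ, the fragment is storing elements in a wider or packed representation rather than raw fp16 values.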
-
### Issue Description
When starting up the current build, sdnext tries to fetch, build and install https://github.com/ROCm/flash-attention@howiejay/navi_support. However, compilation fails. Nominal…
-
Hi @tridao, we recently implemented INT8 forward FMHA (8-bit Flash-Attention) with both static and dynamic quantization for Softmax on our GPGPU card, and achieved good results and relatively okay acc…
-
Installed the latest version of the AMD drivers. Graphics card: 7900 XTX
```
No vmfb found. Compiling and saving to D:\nodeai shark\euler_scale_model_input_1_512_512_rocm_fp16.vmfb
Configuring for dev…
-
As discussed in discord having a basic Tensor library that would make basic acceleration for 1d arrays, 2d matrices and 3d+ tensors a lot easier to use. As of right now, it is very tedious as you have…
-
From [NVIDIA newsletter today](http://info.nvidia.com/index.php/email/emailWebview?mkt_tok=eyJpIjoiTnpCbVlUQmpNbUkzT1RGaSIsInQiOiJybjdNWDV4Vk0rMlNwUEJaVnY5U1hLWDBmR3hwRjBUV0t6djBYdXMzcmYvbnpYbkcvNVlaQ…
-
Congrats on your release!
I am wondering if your implementation allows me to use systolic arrays (tensor cores, XMX engines, or matrix cores, depending on the GPU vendor)? Intel's implementation ha…
-
This issue is meant to be an overview / track progress on supporting Float16 in CUDA.jl
## Library APIs
We'll be able to do a lot already if we just dispatch to the correct library calls, e.g., …