-
https://qifeng.xyz/posts/ed5d8a1b.html
This article introduces how to use Tensor Cores through the WMMA API.
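As a minimal sketch of what such WMMA usage looks like (assuming an sm_70+ GPU and a single 16x16x16 half-precision tile; the kernel name and tile sizes here are illustrative):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Multiply one 16x16 tile of A by one 16x16 tile of B on the tensor cores.
__global__ void wmma_tile_gemm(const half *a, const half *b, float *c) {
    // Fragments are opaque, warp-distributed register tiles.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);      // zero the accumulator
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```

All 32 threads of a warp must execute these calls together; the `_sync` suffix reflects that they are warp-collective operations.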
-
Hi, I hope that this repo is still maintained or at least open for questions. :)
My use-case:
I have a code-base which utilizes the C++ wmma template API. For specific reasons I need to perform…
-
I encountered a strange bug while programming tensor cores using the **WMMA** API on an A800.
I tried to print the size of an element in the fragment. Normally **sizeof**(fp16) is 2, but the following code a…
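A sketch of how one might probe this (hedged: the fragment's storage member `x` is documented, but its element type is implementation-defined and may not be `__half` on every toolchain, which could explain a surprising `sizeof` result):

```cuda
#include <cstdio>
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Print the size of the declared half type versus the fragment's actual
// per-element storage type.
__global__ void probe_fragment_element_size() {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> frag;
    if (threadIdx.x == 0) {
        printf("sizeof(half) = %zu, sizeof(frag.x[0]) = %zu\n",
               sizeof(half), sizeof(frag.x[0]));
    }
}
```

If the two sizes differ, the fragment is storing elements in a wider or packed representation rather than raw fp16 values.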
-
### Issue Description
When starting up the current build, sdnext tries to fetch, build and install https://github.com/ROCm/flash-attention@howiejay/navi_support. However, compilation fails. Nominal…
-
Hi @tridao, we recently implemented INT8 forward FMHA (8-bit Flash-Attention) with both static and dynamic quantization for Softmax on our GPGPU card, and achieved good results and relatively okay acc…
-
Installed the latest version of the AMD drivers. Graphics card: 7900 XTX
```
No vmfb found. Compiling and saving to D:\nodeai shark\euler_scale_model_input_1_512_512_rocm_fp16.vmfb
Configuring for dev…
-
As discussed in discord having a basic Tensor library that would make basic acceleration for 1d arrays, 2d matrices and 3d+ tensors a lot easier to use. As of right now, it is very tedious as you have…
-
From [NVIDIA newsletter today](http://info.nvidia.com/index.php/email/emailWebview?mkt_tok=eyJpIjoiTnpCbVlUQmpNbUkzT1RGaSIsInQiOiJybjdNWDV4Vk0rMlNwUEJaVnY5U1hLWDBmR3hwRjBUV0t6djBYdXMzcmYvbnpYbkcvNVlaQ…
-
Congrats on your release!
I am wondering if your implementation allows me to use systolic arrays (tensor cores, XMX engines, or matrix cores, depending on the GPU vendor)? Intel's implementation ha…
-
This issue is meant to be an overview / track progress on supporting Float16 in CUDA.jl
## Library APIs
We'll be able to do a lot already if we just dispatch to the correct library calls, e.g., …