-
**Describe the bug**
Some files are missing the headers that they rely on, which means they cannot be included by themselves. This is "hidden" in most of the examples because they import many things a…
-
This issue tracks the resources and discussion for deciding what the layout should look like for Vector Distribution.
-
```
abadams@anadams-work:~/projects/Halide_main/apps/local_laplacian
$ HL_TARGET=host-vulkan make test
bin/host/local_laplacian.generator -g local_laplacian -e static_library,h,registration,stmt,as…
-
Hi, I've just created a small project ([link to the project](https://github.com/Yanksi/cute_mma)) by modifying the `sgemm_sm80` example. What I was doing was trying to make use of the tensor cores for…
-
**Describe the bug**
In the example for working with depthwise convolution, the half type is used as the data type and accumulator, and for our task we are trying to reuse the kernel for the int8 t…
-
https://mp.weixin.qq.com/s?src=11&timestamp=1641890990&ver=3551&signature=AEp*bKffbgAg02GkLMjiswOq6Ngkvr4NaTivylLKgRywSGHXp3Nz-jzsV4D0q2OiBtrBw4P0iY0emgeqacNn2TVHwlG1WvgpT8x2d0VlEg-tjMQIC7oQj4zozb60eo…
-
### Steps to reproduce the issue
```console
$ spack spec -I spiral-software
Input spec
--------------------------------
- spiral-software
Concretized
--------------------------------
- …
-
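Learn the ARM Architecture in One Hour - GitChat Tech Talk - CSDN Blog
https://blog.csdn.net/GitChat/article/details/78410083
ARM-based neural network acceleration on the CPU
https://blog.csdn.net/deng497/article/details/69258081
Four things you cannot afford to ignore when running deep learning algorithms on embedded platforms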
ht…
-
There is a case in tt_dot that uses a broadcast to make a matrix from a vector.
The IR looks like this:
```
#mma = #triton_intel_gpu.dpas
%36 = tt.broadcast %35 : tensor -> tensor loc(#loc19)
```
Th…
-
**What is your question?**
I want to compare the performance of CUTLASS kernels to `cublasHgemm`, which gives me ~50,000 GFLOP/s on a T4 card, with m, n, k = 4096, 4096, 4096.
I have tried passing va…
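
For reference, here is a minimal sketch of how a GFLOP/s figure like the one quoted above is usually derived from a timing. It assumes the conventional count of 2·m·n·k floating-point operations per GEMM (one multiply and one add per inner-product term). The function name `gemm_gflops` and the 2.75 ms timing are illustrative assumptions, not values taken from the question.

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return GEMM throughput in GFLOP/s, counting 2*m*n*k operations."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * m * n * k / seconds / 1e9


# Hypothetical per-call timing, chosen only to illustrate the arithmetic.
elapsed = 2.75e-3  # seconds (assumed, not measured)
print(f"{gemm_gflops(4096, 4096, 4096, elapsed):.0f} GFLOP/s")
```

Using the same operation count for both libraries keeps the comparison fair, since the reported number then depends only on the measured time.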