simt Search Results - Githubissues

495 results for simt


NVIDIA/cutlass #1827

[BUG] Some CUTLASS headers are not self-contained

**Describe the bug** Some files are missing the headers that they rely on, which means they cannot be included by themselves. This is "hidden" in most of the examples because they import many things a…

saagarjha updated 2 weeks ago
3
iree-org/iree #16127

[Vector Distribution] Layout Choices

This issue tracks the resources + discussion for deciding how the layout should look for Vector Distribution.

Groverkss updated 1 week ago
12
halide/Halide #8249

vulkan backend is broken

``` abadams@anadams-work:~/projects/Halide_main/apps/local_laplacian $ HL_TARGET=host-vulkan make test bin/host/local_laplacian.generator -g local_laplacian -e static_library,h,registration,stmt,as…

abadams updated 5 months ago
1
NVIDIA/cutlass #1858

[QST] Performance Issue of doing GEMM on A100 using CuTe

Hi, I've just created a small project ([link to the project](https://github.com/Yanksi/cute_mma)) by modifying the `sgemm_sm80` example. What I was doing was trying to make use of the tensor cores for…

Yanksi updated 1 week ago
16
NVIDIA/cutlass #1402

[BUG] illegal memory access for depthwise convolution with i…

**Describe the bug** In the example for working with depthwise convolution, the half type is used as the data type and accumulator, and for our task we are trying to reuse the kernel for the int8 t…

lxq2t updated 4 months ago
5
markhuyong/git-favorites #8

…

https://mp.weixin.qq.com/s?src=11&timestamp=1641890990&ver=3551&signature=AEp*bKffbgAg02GkLMjiswOq6Ngkvr4NaTivylLKgRywSGHXp3Nz-jzsV4D0q2OiBtrBw4P0iY0emgeqacNn2TVHwlG1WvgpT8x2d0VlEg-tjMQIC7oQj4zozb60eo…

markhuyong updated 2 years ago
1
spack/spack #47151

Installation issue: spiral-software

### Steps to reproduce the issue ```console $ spack spec -I spiral-software Input spec -------------------------------- - spiral-software Concretized -------------------------------- - …

wspear updated 3 weeks ago
1
yuenshome/yuenshome.github.io #43

ARM

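Learn the ARM architecture in one hour - GitChat Tech Talk - CSDN Blog https://blog.csdn.net/GitChat/article/details/78410083 Neural network acceleration on ARM CPUs https://blog.csdn.net/deng497/article/details/69258081 Four things you cannot ignore when running deep learning algorithms on embedded platforms ht…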

ysh329 updated 5 years ago
2
intel/intel-xpu-backend-for-triton #781

[DPAS Layout] There is an issue in broadcast the vector to m…

There is a case in tt_dot uses the broadcast to make a matrix from a vector. The IR is like this: ``` #mma = #triton_intel_gpu.dpas %36 = tt.broadcast %35 : tensor -> tensor loc(#loc19) ``` Th…

chengjunlu updated 7 months ago
1
NVIDIA/cutlass #1567

[QST] CUTLASS kernels appear to be significantly slower than…

**What is your question?** I want to compare the performance of CUTLASS kernels to `cublasHgemm`, which gives me ~50,000 GFLOP/s on a T4 card, with m,n,k = 4096,4096, 4096. I have tried passing va…

alexarmbr updated 5 months ago
4
