smem Search Results - Githubissues

1000+ results for smem


NVIDIA/cutlass #1927

[QST] Question Regarding To The Use Of `Swizzle`

When I was running the [code example](https://github.com/user-attachments/files/17388059/sgemm_sm80_tmp.txt) provided by @ccecka in another [issue](https://github.com/NVIDIA/cutlass/issues/1858), I go…

Yanksi updated 2 hours ago
1
NVIDIA/cutlass #1842

[QST] SmemCopyAtom and MMA_Atom for fp32?

**What is your question?** hello, I am developing a full precision attention backward kernel using cutlass, and get stuck in the use of ldmatrix and mma instructions for fp32. My Gemm calculation is …

vickyandpiggy updated 4 weeks ago
6
fixstars/libSGM #59

Census transformation

Hello, I have a question regarding census transformation when reading your code. Why aren't you comparing the intensity of neighboring pixels with the center? Best regards. Here is the code i…

Apeiria updated 4 years ago
2
NVIDIA/cutlass #1873

[QST] Don't know how to use predicate tensor.

I encountered some problems when using predicate tensor. In the tutorials: https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/tiled_copy.cu https://github.com/NVIDIA/cutlass/blob/mai…

ZhangZhiPku updated 3 weeks ago
2
NVIDIA/cutlass #1556

[QST/BUG] why cute kernel transfers so much data between L2 …

**What is your question?** I am learning to use cute to build a hgemm kernel. Tested on A10 GPU, the cute kernel is good with small problem size such as m/n/k = 4096, but I found it's much slower …

irasin updated 2 months ago
9
jax-ml/jax #20046

Pallas Kernel using Smem/SReg failed to lower

### Description Tried to write a test with kernel for scalar using Smem/SReg: ``` def test_scalar_exp(self): def scalar_exp_kernel(in_smem_ref: Float, out_smem_ref: Float): in_sreg = …

tankbattle updated 8 months ago
2
NVIDIA/cutlass #1858

[QST] Performance Issue of doing GEMM on A100 using CuTe

Hi, I've just created a small project ([link to the project](https://github.com/Yanksi/cute_mma)) by modifying the `sgemm_sm80` example. What I was doing was trying to make use of the tensor cores for…

Yanksi updated 1 day ago
16
NVIDIA/Fuser #3272

Tracks performance issues related to inner outer persistent …

(1) After inner persistent buffers are stored in shared memory. There are still bank conflicts if the persistent buffer is NOT projected to inputs due to two reasons: ``` (a) We are missing a cacheBe…

liqiangxl updated 1 week ago
1
RT-Thread/rt-thread #9405

[Bug]Memory error in rt_malloc function

### RT-Thread Version 5.2.0 commit 2f559906d6202c27142237ab4b1d893034a5b7c3 ### Hardware Type/Architectures VEXPRESS_A9 ### Develop Toolchain GCC ### Describe the bug ### Steps to reproduce: …

LecterChu updated 1 month ago
1
mirage-project/mirage #97

Extremely low running time when profiling transpiled muGraph…

When profiling transpiled muGraphs, some results are extremely low and are close to kernel launch time. For example, in the gated_mlp example, some muGraphs only consume ~0.004ms in the catalyst clust…

wmdi updated 3 weeks ago
1
