-
- [ ] Distribution patterns for XeGPU ops
- [ ] Populate XeVM with basic ops for efficient matmul
- [ ] Sg_map propagation analysis/pass
- [ ] Distributed IR flattening
-
For [this](https://gist.github.com/nirvedhmeshram/344f11443b96fb9ff022fa283cc6cd8a) matmul-like + elementwise IR, we go down the LLVMGPUSIMT pipeline; see the dump [here](https://gist.github.com/nirvedhm…
-
Hi, I see that the specializations of the template class `DefaultConv2dGroupFprop` in `cutlass/conv/kernel/default_conv2d_group_fprop.h` have no `OpClassSimt` tag. What can I do to support a SIMT version …
-
Which version of LLVM does ROCgdb require to support "DWARF Extensions for Optimized SIMT/SIMD (GPU) Debugging"?
-
Is there a SIMT-deadlock issue with SIMT-stack-based divergence? If so, how should it be dealt with?
-
**What is your question?**
An internal CUTLASS error is observed when I try increasing the warp count for kernel "cutlass_simt_hgemm_256x128_8x2_nt_align1" to values other than the default 4x2x1 (by changi…
-
The Triton XPU has switched to use the OCL interface for DPAS.
The OCL interface only supports sub-group-size=8, with a packed i16 dtype for the A operands.
It requires a different layout in the SI…
-
## Background
The Triton kernel is generated as a SIMT-major SPIR-V kernel, because some components must be used with the SIMT paradigm; for example, the Intel math library is only available in a SIMT version.
But for some m…
-
```bash
RuntimeError: /io/build/temp.linux-x86_64-cpython-37/spconv/build/core_cc/src/cumm/conv/main/ConvMainUnitTest/ConvMainUnitTest_matmul_split_Simt_f32f32f32_0.cu(222)
int64_t(N) * int64_t(C) *…
```
-
**Describe the bug**
Running CMake on Windows works, but the build with make fails with an error:
```
-- Enable device reference verification in conv unit tests
-- Generating D:/github/cutlass-main/build/test/unit/conv/device/cutlass_test_unit_conv…
```