-
We are currently trying to apply torchtitan to MoE models. MoE models require using grouped_gemm https://github.com/fanshiqing/grouped_gemm. GroupedGemm ops basically follow the same rule as in Column…
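For context on why MoE layers want grouped GEMM: each expert multiplies only the tokens routed to it by its own weight matrix. The layer therefore needs many independent matmuls with varying row counts, and a grouped GEMM kernel issues them together. The naive pure-Python reference below only illustrates those semantics. It assumes tokens are already sorted by expert, and `grouped_gemm_reference` and `group_sizes` are hypothetical names, not the grouped_gemm library's API.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (m x k) @ (k x n) product; returns [] when `a` has no rows."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def grouped_gemm_reference(a: Matrix, weights: List[Matrix], group_sizes: List[int]) -> Matrix:
    """Naive grouped GEMM: consecutive row blocks of `a` use different weights.

    a           -- (sum(group_sizes) x k) token activations, sorted by expert
    weights     -- one (k x n) weight matrix per expert
    group_sizes -- rows (tokens) routed to each expert; zero is allowed
    """
    if len(weights) != len(group_sizes):
        raise ValueError("need exactly one weight matrix per group")
    if sum(group_sizes) != len(a):
        raise ValueError("group sizes must cover every row of `a`")
    out: Matrix = []
    start = 0
    for w, size in zip(weights, group_sizes):
        # Each group is an ordinary GEMM; an empty group contributes no rows.
        out.extend(matmul(a[start:start + size], w))
        start += size
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    w0 = [[1.0, 0.0], [0.0, 1.0]]  # expert 0: identity
    w1 = [[2.0, 0.0], [0.0, 2.0]]  # expert 1: scale by 2
    print(grouped_gemm_reference(a, [w0, w1], [1, 2]))
    # [[1.0, 2.0], [6.0, 8.0], [10.0, 12.0]]
```

A fast kernel should produce the same result as this loop. Its advantage comes from batching the per-expert GEMMs instead of launching them one at a time.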
-
Has anyone run into this problem? The error is raised at model.generate().
-
Current output of test 11:
```
group-gemm-performance:
N cuBLAS Triton
0 128.0 0.11488 276574.06250
1 256.0 0.12080 276332.68750
2 512.0 0.14360 276066.46875
3 1…
```
-
Hi, I have a question about depthwise convolution with parameters such as a 1x1 filter, stride=1, pad=0, dilation=1. I get a compile error raised by the check on kWarpGemmIterations in cutlass/conv/threadblock/depthwise_…
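With a 1x1 filter, stride=1, pad=0 and dilation=1, a depthwise convolution degenerates into a per-channel scale: each output element is the matching input element times that channel's single filter tap. The naive reference below (single image, `[C][H][W]` nested lists, no bias; `depthwise_conv2d` and `per_channel_scale` are hypothetical helper names) checks that equivalence. It does not reproduce or explain CUTLASS's kWarpGemmIterations check.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def depthwise_conv2d(x: Tensor3, filters: Tensor3, stride: int = 1,
                     pad: int = 0, dilation: int = 1) -> Tensor3:
    """Naive depthwise 2-D convolution (cross-correlation, as in deep-learning
    frameworks): output channel c uses only input channel c and filters[c]."""
    out: Tensor3 = []
    for xc, fc in zip(x, filters):
        h, w = len(xc), len(xc[0])
        r, s = len(fc), len(fc[0])
        out_h = (h + 2 * pad - dilation * (r - 1) - 1) // stride + 1
        out_w = (w + 2 * pad - dilation * (s - 1) - 1) // stride + 1
        plane = []
        for oh in range(out_h):
            row = []
            for ow in range(out_w):
                acc = 0.0
                for i in range(r):
                    for j in range(s):
                        ih = oh * stride - pad + i * dilation
                        iw = ow * stride - pad + j * dilation
                        if 0 <= ih < h and 0 <= iw < w:  # implicit zero padding
                            acc += xc[ih][iw] * fc[i][j]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def per_channel_scale(x: Tensor3, scales: List[float]) -> Tensor3:
    """What a 1x1, stride-1, pad-0, dilation-1 depthwise conv reduces to."""
    return [[[v * k for v in row] for row in xc] for xc, k in zip(x, scales)]


if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    print(depthwise_conv2d(x, [[[2.0]], [[-1.0]]]) == per_channel_scale(x, [2.0, -1.0]))
    # True
```

Under that equivalence, a depthwise layer with exactly these parameters could, in principle, be computed as an elementwise multiply instead of going through a convolution kernel.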
-
@efrantar
Awesome work; I always enjoy your research on, and implementation of, efficient model inference.
I was hoping that you could shed some light on the logic of the [packing](https://github…
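As general background for the packing question, and not a description of that repository's actual layout: 4-bit quantized weights are commonly stored eight to a 32-bit word. The minimal sketch below assumes the simplest order, where code i of each group of eight occupies bits 4i to 4i+3, and unsigned codes 0..15. The helper names `pack_int4` and `unpack_int4` are hypothetical.

```python
from typing import List


def pack_int4(values: List[int]) -> List[int]:
    """Pack unsigned 4-bit codes (0..15) into 32-bit words, 8 codes per word.
    Code i of each group of 8 goes into bits [4*i, 4*i + 4)."""
    if len(values) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for base in range(0, len(values), 8):
        word = 0
        for i, v in enumerate(values[base:base + 8]):
            if not 0 <= v <= 15:
                raise ValueError(f"{v} does not fit in 4 bits")
            word |= v << (4 * i)  # place the nibble at its slot
        words.append(word)
    return words


def unpack_int4(words: List[int]) -> List[int]:
    """Inverse of pack_int4: shift each nibble down and mask it out."""
    return [(word >> (4 * i)) & 0xF for word in words for i in range(8)]


if __name__ == "__main__":
    print(hex(pack_int4([1, 2, 3, 4, 5, 6, 7, 8])[0]))  # 0x87654321
```

Optimized kernels often reorder nibbles (or whole weight tiles) so that unpacking matches the GPU's register and tensor-core layouts, so the repository's packing order may well differ from this one.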
-
Hello, I'm having trouble compiling composable_kernel for my AMD GPU architecture (gfx1010):
```
cmake …
```
-
**Is your feature request related to a problem? Please describe.**
I am trying to retarget the LLM artifacts to my own FPGA board. I'd like to regenerate the HLS code to try more aggressive quantizat…
-
Hi,
I created a small Rust example:
```
use gemm_f16::f16;

fn main() {
    println!("Hello, fp16!");
    let a = f16::from_f32(3.1f32);
    let b = f16::from_f32(2.2f32);
    let…
```
-
**What is your question?**
In the examples provided, EVT demonstrates the capability to fuse different epilogue functions, optimizing their execution. I'm interested in knowing whether EVT can also i…
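As a rough mental model only (a toy Python analogy, not the CUTLASS C++ API): an epilogue visitor tree describes the epilogue as a tree whose leaves read the accumulator, an auxiliary input, or a constant, and whose inner nodes combine their children elementwise. The whole chain is then evaluated once per output element instead of materializing intermediate tensors. The helpers `acc`, `aux`, `scalar`, `compute`, and `apply_fused` are hypothetical names invented for this sketch.

```python
import operator
from typing import Callable, List

# Each node maps ONE output element's (accumulator, auxiliary input) to a value,
# so a whole tree is evaluated in a single pass over the output.
Node = Callable[[float, float], float]


def acc() -> Node:
    return lambda a, c: a  # leaf: the GEMM accumulator value


def aux() -> Node:
    return lambda a, c: c  # leaf: an extra input tensor, e.g. C or a bias


def scalar(v: float) -> Node:
    return lambda a, c: v  # leaf: a broadcast constant


def compute(fn: Callable[..., float], *children: Node) -> Node:
    # Inner node: apply an elementwise function to its children's results.
    return lambda a, c: fn(*(child(a, c) for child in children))


def apply_fused(tree: Node, accs: List[float], auxs: List[float]) -> List[float]:
    # Fused: one pass, no intermediate lists are materialized.
    return [tree(a, c) for a, c in zip(accs, auxs)]


def unfused_reference(accs: List[float], auxs: List[float],
                      alpha: float, beta: float) -> List[float]:
    # Unfused: every step writes a full intermediate list before the next one.
    scaled = [alpha * a for a in accs]
    summed = [s + beta * c for s, c in zip(scaled, auxs)]
    return [max(v, 0.0) for v in summed]


# D = relu(alpha * acc + beta * C) with alpha = 2.0, beta = 0.5
relu_axpby = compute(
    lambda x: max(x, 0.0),
    compute(
        operator.add,
        compute(operator.mul, scalar(2.0), acc()),
        compute(operator.mul, scalar(0.5), aux()),
    ),
)

if __name__ == "__main__":
    accs, auxs = [1.0, -3.0], [4.0, 2.0]
    print(apply_fused(relu_axpby, accs, auxs))      # [4.0, 0.0]
    print(unfused_reference(accs, auxs, 2.0, 0.5))  # [4.0, 0.0]
```

Both paths give the same values; the difference a real fused epilogue targets is memory traffic, since the unfused version writes and rereads each intermediate.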
-
Hello @AnonymousYWL,
Could you please provide instructions on how to use the libshalom2 library and its GEMM kernel APIs? How do I run a basic example with your novel GEMM kernel?