Closed yzh119 closed 4 months ago
For grouped GEMM in MoE and LoRA models, we use a combination of custom CUDA kernels and CUTLASS and Composable Kernel (for NVIDIA and AMD GPUs, respectively).
This PR adds these two repositories as dependencies.