-
/root/anaconda3/envs/chatglm3_v2/lib/python3.10/site-packages/awq/modules/linear/exllama.py:12: UserWarning: AutoAWQ could not load ExLlama kernels extension. Details: libcudart.so.12: cannot open sha…
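This warning usually means the CUDA 12 runtime shared library is not on the loader path, so the kernels extension cannot be dlopen'd. A minimal diagnostic, mirroring what the extension loader effectively does (the soname default is illustrative):

```python
import ctypes


def cuda_runtime_loadable(soname: str = "libcudart.so.12") -> bool:
    """Try to dlopen the CUDA runtime the way a native extension would.

    Returns False instead of raising when the library cannot be found,
    which is the situation behind the AutoAWQ/ExLlama warning above.
    """
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False
```

If this returns False, checking `LD_LIBRARY_PATH` and the installed CUDA toolkit version is the usual next step.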
-
Model:
```mermaid
graph TD;
Input1["Input
src1: fp32"]
Quantise1["NEQuantizationLayer
q_src1: QASYMM8_SIGNED"]
Input2["Input
src2: fp32"]
Quantise2["NEQuantization…
-
**What is your question?**
https://github.com/NVIDIA/cutlass/blob/f7b19de32c5d1f3cedfc735c2849f12b537522ee/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp#L477-L554
I underst…
-
The default lookahead in GEMM is 2, which is too small when running on multiple nodes, especially at large scale. It should be tied to the process-grid dimensions.
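One way to express that dependence is a heuristic that grows the lookahead with the process-grid extent, so panel factorization and broadcast can stay ahead of the trailing update. This is a hypothetical sketch, not any library's actual policy:

```python
def gemm_lookahead(grid_rows: int, grid_cols: int, default: int = 2) -> int:
    """Hypothetical heuristic: scale the panel lookahead with the larger
    process-grid dimension, falling back to the current default of 2 on
    small grids. The divisor is an assumption for illustration only."""
    return max(default, max(grid_rows, grid_cols) // 2)
```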
-
FlashDecoding++ paper: https://arxiv.org/abs/2311.01282
- Q3 Collaboration Plan of Infra and IaaS Labs: https://bytedance.us.larkoffice.com/docx/HKXfdRh1noMrbAxcgL2ureGasdQ
- FlashDecoding++ Sum…
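A core idea in FlashDecoding++ is "asynchronized softmax": subtracting a fixed, precomputed statistic instead of the per-chunk running max, so partial exponent sums from different chunks can be combined without rescaling. A sketch of why any fixed shift gives the same result (the choice of `phi` here is illustrative; the paper selects it from the expected logit range):

```python
import math


def softmax_unified_max(xs, phi):
    """Softmax with a fixed shift phi instead of the per-row max.
    Mathematically identical to standard softmax, since the shift
    cancels in the normalization."""
    exps = [math.exp(x - phi) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_reference(xs):
    """Standard numerically-stable softmax (subtracts the row max)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

The catch, which the paper addresses, is overflow: if the logits stray far from `phi`, `exp(x - phi)` can overflow, so a fallback path is still needed.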
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
### 🐛 Describe the bug
When N=64, we don't have 4*8=32 c_…
-
I am attempting to emit PyTorch code, but unfortunately it does not work for fp8, bf16, or int8. I have tried to patch the converter type dict: https://github.com/OrenLeung/cutlass/commit/6d619c964eb8b…
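The general shape of such a patch is extending the emitter's dtype-conversion table with the missing low-precision entries. A sketch under stated assumptions: the table name, the key strings, and the torch dtype names below are all illustrative, not the actual identifiers in the CUTLASS emitter.

```python
# Hypothetical existing conversion table in a PyTorch code emitter.
CUTLASS_TO_TORCH_DTYPE = {
    "f32": "torch.float32",
    "f16": "torch.float16",
}

# Hypothetical missing entries for the dtypes the emitter rejects.
EXTRA_DTYPES = {
    "bf16": "torch.bfloat16",
    "s8": "torch.int8",
    "e4m3": "torch.float8_e4m3fn",
}


def patch_converter(table: dict, extras: dict) -> dict:
    """Add missing dtype mappings without overwriting existing entries."""
    for key, value in extras.items():
        table.setdefault(key, value)
    return table
```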
-
Hello, I am having trouble compiling composable_kernel for my AMD GPU architecture (gfx1010)
```
cmake …
```
-
`PYTHONPATH="." GPU=1 IMAGE=0 python -m pytest test/test_ops.py -k test_gemm_fp16` passed
`PYTHONPATH="." GPU=1 IMAGE=2 python -m pytest test/test_ops.py -k test_gemm_fp16` failed with `Exception: …
-
**What is your question?**
![image](https://github.com/user-attachments/assets/98eab07b-1903-425e-9439-5178169c52e4)
As shown here, I see many usages of PipelineState but cannot find its definition. I do find …
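For context, `PipelineState` lives in the CUTLASS pipeline headers (under `include/cutlass/pipeline/`), separate from the collective mainloop files. Its core mechanics can be sketched in Python, as a model of the idea rather than the actual C++ template:

```python
class PipelineState:
    """Model of a CUTLASS-style pipeline state: a circular stage index
    plus a phase bit that flips each time the index wraps around. The
    phase lets a wait on a reused barrier slot distinguish the current
    pass through the stages from the previous one."""

    def __init__(self, stages: int):
        self.stages = stages
        self.index = 0   # which pipeline stage's barrier to use next
        self.phase = 0   # parity bit that distinguishes wrap-arounds

    def advance(self) -> None:
        """Move to the next stage, flipping the phase on wrap-around."""
        self.index += 1
        if self.index == self.stages:
            self.index = 0
            self.phase ^= 1
```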