fpgaminer / GPTQ-triton
GPTQ inference Triton kernel
Apache License 2.0 · 284 stars · 23 forks
Issues
#23 load quantized model error · wanghz18 · opened 3 months ago · 0 comments
#22 GPTQ guide? · vgoklani · opened 1 year ago · 1 comment
#21 Replace transformer apply_rotary_pos_emb with triton version · Qubitium · opened 1 year ago · 5 comments
#20 multi-gpu and triton kernel problem · Qubitium · closed 1 year ago · 0 comments
#19 WIP: Fix autotune not using device · Qubitium · closed 1 year ago · 0 comments
#18 question about the quantization formula · irasin · opened 1 year ago · 3 comments
#17 rotary embedding and layer norm · qwopqwop200 · opened 1 year ago · 1 comment
#16 Can I use a CUDA kernel with a model quantized using triton & vice-versa? · vedantroy · closed 1 year ago · 3 comments
#15 Cache auto-tuning? · vedantroy · opened 1 year ago · 3 comments
#14 Get C++ exception when trying to load model · vedantroy · closed 1 year ago · 5 comments
#13 Does this support non -1 groupsize? · vedantroy · closed 1 year ago · 1 comment
#12 warmup_autotune and 4090 observations · Qubitium · closed 1 year ago · 2 comments
#11 num_beams > 1 sometimes breaks inference · Qubitium · closed 1 year ago · 9 comments
#10 percdamp clarification for dummies · Qubitium · closed 1 year ago · 2 comments
#9 Apply flash attention · qwopqwop200 · closed 1 year ago · 1 comment
#8 Weight conversion help · catid · closed 1 year ago · 14 comments
#7 1-bit acceleration support · NicoNico6 · opened 1 year ago · 2 comments
#6 Inference throwing: TypeError: forward() got an unexpected keyword argument 'position_ids' · Qubitium · closed 1 year ago · 7 comments
#5 Cuda vs Triton on an RTX 3060 12GB · 1aienthusiast · opened 1 year ago · 12 comments
#4 Testing triton on 30b model vs quant_cuda · Qubitium · closed 1 year ago · 2 comments
#3 Getting "CUDA error: an illegal memory access was encountered" with model.generate · apenugon · closed 1 year ago · 5 comments
#2 safe tensor support for convert weights · DanielWe2 · closed 1 year ago · 1 comment
#1 Needs more VRAM than normal GPTQ CUDA version? · DanielWe2 · opened 1 year ago · 3 comments