fpgaminer / GPTQ-triton
GPTQ inference Triton kernel
Apache License 2.0 · 284 stars · 23 forks
Issues
#23 load quantized model error · wanghz18 · opened 3 months ago · 0 comments
#22 GPTQ guide? · vgoklani · opened 1 year ago · 1 comment
#21 Replace transformer apply_rotary_pos_emb with triton version · Qubitium · opened 1 year ago · 5 comments
#20 multi-gpu and triton kernel problem · Qubitium · closed 1 year ago · 0 comments
#19 WIP: Fix autotune not using device · Qubitium · closed 1 year ago · 0 comments
#18 question about the quantization formula · irasin · opened 1 year ago · 3 comments
#17 rotary embedding and layer norm · qwopqwop200 · opened 1 year ago · 1 comment
#16 Can I use a CUDA kernel with a model quantized using triton & vice-versa? · vedantroy · closed 1 year ago · 3 comments
#15 Cache auto-tuning? · vedantroy · opened 1 year ago · 3 comments
#14 Get C++ exception when trying to load model · vedantroy · closed 1 year ago · 5 comments
#13 Does this support non -1 groupsize? · vedantroy · closed 1 year ago · 1 comment
#12 warmup_autotune and 4090 observations · Qubitium · closed 1 year ago · 2 comments
#11 num_beams > 1 sometimes breaks inference · Qubitium · closed 1 year ago · 9 comments
#10 percdamp clarification for dummies · Qubitium · closed 1 year ago · 2 comments
#9 Apply flash attention · qwopqwop200 · closed 1 year ago · 1 comment
#8 Weight conversion help · catid · closed 1 year ago · 14 comments
#7 1-bit acceleration support · NicoNico6 · opened 1 year ago · 2 comments
#6 Inference throwing: TypeError: forward() got an unexpected keyword argument 'position_ids' · Qubitium · closed 1 year ago · 7 comments
#5 Cuda vs Triton on an RTX 3060 12GB · 1aienthusiast · opened 1 year ago · 12 comments
#4 Testing triton on 30b model vs quant_cuda · Qubitium · closed 1 year ago · 2 comments
#3 Getting "CUDA error: an illegal memory access was encountered" with model.generate · apenugon · closed 1 year ago · 5 comments
#2 safe tensor support for convert weights · DanielWe2 · closed 1 year ago · 1 comment
#1 Needs more VRAM than normal GPTQ CUDA version? · DanielWe2 · opened 1 year ago · 3 comments