SqueezeBits/QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
MIT License
112 stars, 5 forks
issues
Issue with Quantized Inference on Mistral-7B v0.1
#10
acousticsmh
opened
1 week ago
0
why does de-quantization introduce bank conflict?
#9
sleepwalker2017
opened
6 months ago
0
Kernel speed comparison based on Llama 7B
#8
shiqingzhangCSU
opened
8 months ago
0
Kernel benchmarks script
#7
shiqingzhangCSU
opened
8 months ago
0
Qkv fuse
#6
JHLEE17
closed
8 months ago
0
ModuleNotFoundError: No module named 'quick'
#5
Quang-elec44
opened
8 months ago
3
n_generate cannot be longer than context
#4
Gutianpei
closed
8 months ago
1
Lack of `exclude_layers_to_not_quantize` after `get_named_linears`
#3
noah-kim-theori
closed
8 months ago
1
Speed comparison with the Marlin kernel
#2
qwopqwop200
opened
9 months ago
1
Update README.md
#1
eltociear
closed
9 months ago
1