efeslab/Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
259 stars · 21 forks
Issues (newest first)
#23  Question about KV Cache quantization (SherrySwift, opened 2 weeks ago, 3 comments)
#22  Question about end-to-end efficiency evaluation of Atom (cokeshao, closed 1 month ago, 2 comments)
#21  Is it possible to add support for other models? (wlll123456, closed 1 month ago, 1 comment)
#20  Question about the synchronization in the low-precision kernel (cat538, closed 2 months ago, 2 comments)
#19  TypeError: QLlamaDecoderLayer.forward() got an unexpected keyword argument 'cache_position' (galenyu, closed 2 months ago, 2 comments)
#18  LLM model load hanging problem (jimmy-adams, closed 2 months ago, 2 comments)
#17  Question regarding the efficiency evaluation (FlyFoxPlayer, closed 4 months ago, 3 comments)
#16  [Major] Add support for Mixtral8x7b (cylinbao, closed 5 months ago, 0 comments)
#15  Question about the calibration data (ghost, closed 5 months ago, 3 comments)
#14  How to load quantized weights? (ghost, closed 5 months ago, 1 comment)
#13  feat: adapt GPTQ to FP4 quantization (happierpig, closed 5 months ago, 1 comment)
#12  RuntimeError when quantizing a Llama model (ghost, closed 5 months ago, 10 comments)
#11  feat: add FP4 evaluations (happierpig, closed 5 months ago, 0 comments)
#10  Porting SVD into Atom (shadowpa0327, closed 6 months ago, 0 comments)
#9   AssertionError (muzi0111, closed 6 months ago, 1 comment)
#8   Error: same device (muzi0111, closed 6 months ago, 1 comment)
#7   The perplexity for LLaMA-7B is very large (priscilla-pan, closed 7 months ago, 3 comments)
#6   Why is dynamic quantization not included when reproducing results? (priscilla-pan, closed 7 months ago, 3 comments)
#5   Adding OPT support for the simulated quantization (cylinbao, closed 7 months ago, 0 comments)
#4   How to compare performance with vllm/tgi/lightllm or other LLM serving frameworks? (irasin, closed 8 months ago, 3 comments)
#3   PPL on PTB (MrDoghead, closed 8 months ago, 2 comments)
#2   Issue with `c4` dataset for eval (HamidShojanazeri, closed 8 months ago, 1 comment)
#1   Update README.md (eltociear, closed 8 months ago, 0 comments)