Open haiasd opened 8 months ago
I ran
python -m flexgen.flex_opt --gpu-batch-size 1 --overlap false --model facebook/opt-6.7b --path _DUMMY_ --prompt-len 512 --gen-len 512
and
python flex_opt.py --gpu-batch-size 1 --overlap false --hh-ratio 0.2 --hh-all --model facebook/opt-6.7b --path _DUMMY_ --prompt-len 512 --gen-len 512
on one A100 80G GPU. It seems h2o is still slower than the baseline.
baseline:
h2o:
I ran
bash scripts/streaming/eval.sh full
and
bash scripts/streaming/eval.sh h2o
on one A100 80G GPU. The full run took 489s, while the h2o run took 7200s.
I tried this on an A30 and reached the same conclusion as you: full is also much faster than h2o.
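For anyone else reproducing this comparison, here is a minimal sketch of timing two commands the same way and reporting the ratio. The helper names `time_command` and `slowdown` are hypothetical and not part of FlexGen or H2O; the script paths in the comments are the ones quoted in this thread and are assumed to be run from the repository root.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) to completion and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return time.perf_counter() - start


def slowdown(baseline_s, other_s):
    """Return how many times slower other_s is than baseline_s."""
    return other_s / baseline_s


# Example use (assumes the scripts quoted in this thread exist in the working directory):
#   full_s = time_command(["bash", "scripts/streaming/eval.sh", "full"])
#   h2o_s = time_command(["bash", "scripts/streaming/eval.sh", "h2o"])
#   print(f"h2o is {slowdown(full_s, h2o_s):.1f}x slower than full")
```

With the numbers reported above (489s for full, 7200s for h2o), `slowdown(489, 7200)` comes to roughly 14.7.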