Hi, I am running model:chimera-inst-chat-13b in 8-bit on an A100, and it takes almost twice as long as the FP16 version. Is this normal?
I also noticed that Hugging Face's blog says 8-bit LLMs are slower than FP16: https://huggingface.co/blog/hf-bitsandbytes-integration
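In case it helps, this is roughly how I load and time the two variants (a minimal sketch; I'm assuming the standard `load_in_8bit=True` path through bitsandbytes, and `model_path` is just the local checkpoint name from above):

```python
import time
import torch  # needed if you uncomment the FP16 baseline below
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "chimera-inst-chat-13b"  # local checkpoint, per the question

tokenizer = AutoTokenizer.from_pretrained(model_path)

# 8-bit variant: bitsandbytes LLM.int8() quantization via transformers
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    load_in_8bit=True,
)

# FP16 baseline for comparison:
# model = AutoModelForCausalLM.from_pretrained(
#     model_path, device_map="auto", torch_dtype=torch.float16
# )

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")

start = time.time()
out = model.generate(**inputs, max_new_tokens=64)
print(f"generation took {time.time() - start:.2f}s")
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

With this setup the 8-bit run is consistently about 2x slower per generation than the FP16 one.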