Hi, I am running model:chimera-inst-chat-13b in 8-bit on an A100, and it takes almost twice as long as the FP16 version. Is this normal?
I also noticed that Hugging Face's blog says 8-bit LLMs are slower than FP16: https://huggingface.co/blog/hf-bitsandbytes-integration
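In case it helps, this is roughly how I load and time the two variants (a minimal sketch; I'm assuming the standard `load_in_8bit=True` path through bitsandbytes, and `model_path` is just the local checkpoint name from above):

```python
import time
import torch  # needed if you uncomment the FP16 baseline below
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "chimera-inst-chat-13b"  # local checkpoint, per the question

tokenizer = AutoTokenizer.from_pretrained(model_path)

# 8-bit variant: bitsandbytes LLM.int8() quantization via transformers
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    load_in_8bit=True,
)

# FP16 baseline for comparison:
# model = AutoModelForCausalLM.from_pretrained(
#     model_path, device_map="auto", torch_dtype=torch.float16
# )

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")

start = time.time()
out = model.generate(**inputs, max_new_tokens=64)
print(f"generation took {time.time() - start:.2f}s")
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

With this setup the 8-bit run is consistently about 2x slower per generation than the FP16 one.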