AGI-Edgerunners / LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
https://arxiv.org/abs/2304.01933
Apache License 2.0

fixed inference error without loading int8 model #6

Closed HZQ950419 closed 1 year ago

HZQ950419 commented 1 year ago

During inference, if the model is not loaded in INT8, both the bottleneck and LoRA adapters fail with `RuntimeError: expected scalar type Half but found Float`. This PR fixes the error.

Tested with LLaMA, GPT-J, and BLOOM.