Closed HZQ950419 closed 1 year ago
During inference, if the model is not loaded in INT8, the following error is raised for both bottleneck and LoRA adapters:
RuntimeError: expected scalar type Half but found Float
This PR fixes the error. Tested with LLaMA, GPT-J, and BLOOM.
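For context, an error of this kind usually means that half-precision tensors and float32 tensors meet in the same operation, for example float16 activations passing through adapter weights kept in float32. The sketch below shows one common way to guard against that: cast the input to the adapter's dtype, then cast the result back. It is only an illustration under that assumption, not necessarily the change made in this PR; `DtypeSafeAdapter` and the toy `lora` module are hypothetical names that do not come from the repository.

```python
import torch
import torch.nn as nn


class DtypeSafeAdapter(nn.Module):
    """Hypothetical wrapper: run an adapter in its own dtype, return the caller's dtype."""

    def __init__(self, adapter: nn.Module):
        super().__init__()
        self.adapter = adapter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Use the dtype of the adapter's first parameter as the compute dtype.
        adapter_dtype = next(self.adapter.parameters()).dtype
        out = self.adapter(x.to(adapter_dtype))
        # Cast back so the result can be added to the base model's activations.
        return out.to(x.dtype)


# Toy stand-in for a LoRA low-rank pair (down- then up-projection), kept in float32.
lora = nn.Sequential(nn.Linear(8, 2, bias=False), nn.Linear(2, 8, bias=False))
safe_lora = DtypeSafeAdapter(lora)

# Half-precision activations, as produced by a model loaded in float16.
hidden = torch.randn(1, 3, 8).to(torch.float16)
delta = safe_lora(hidden)

print(delta.dtype, tuple(delta.shape))
```

Casting at the adapter boundary keeps the base model's weights untouched; the alternative is to convert the adapter parameters to the base model's dtype once at load time.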