MichaelDoron closed this issue 1 year ago
Yes, sure. We already have an fp16 model that takes 16GB of memory, but performance is slightly affected.
@cliangyu
That's great, I didn't realize an fp16 model existed! Could you point me to its link? I couldn't find it. I understand the performance will be slightly worse, but I'd like to try it locally if possible.
We have not officially released an fp16 model, but you can make one in a minute with the code below.
from otter import OtterForConditionalGeneration
import torch

# Choose the target precision: "fp16" or "bf16".
load_bit = "fp16"
if load_bit == "fp16":
    precision = {"torch_dtype": torch.float16}
elif load_bit == "bf16":
    precision = {"torch_dtype": torch.bfloat16}

checkpoint_path = "your_ckpt_path"  # local path or Hugging Face model id
model = OtterForConditionalGeneration.from_pretrained(checkpoint_path, device_map="auto", **precision)

# Save the converted model with the precision as a suffix.
checkpoint_path = checkpoint_path + f"_{load_bit}"
model.save_pretrained(checkpoint_path)
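As a quick sanity check, you can confirm the conversion actually halved the footprint (a sketch reusing the model and load_bit variables from the snippet above):

# Count parameters and bytes to verify the weights are now half precision.
n_params = sum(p.numel() for p in model.parameters())
n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"{n_params / 1e9:.1f}B params, {n_bytes / 1e9:.1f} GB of weights in {load_bit}")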
Great work!
Are there plans to quantize the model so it can run on a single 3090 GPU?
Thanks!
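To clarify what I have in mind: if Otter's from_pretrained forwards kwargs to Hugging Face transformers, bitsandbytes 8-bit loading might already fit in a 3090's 24GB. A sketch, assuming the standard transformers load_in_8bit path works with this model class (untested with Otter):

from otter import OtterForConditionalGeneration

# Hypothetical: depends on transformers' bitsandbytes integration
# (pip install accelerate bitsandbytes) being compatible with this model class.
model = OtterForConditionalGeneration.from_pretrained(
    "your_ckpt_path",   # placeholder, as in the snippet above
    device_map="auto",  # place weights across available devices automatically
    load_in_8bit=True,  # ~1 byte per weight instead of 2 for fp16
)

That would roughly halve the fp16 footprint again, from 16GB to about 8GB of weights.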