Closed: bakachan19 closed this issue 1 year ago
Hi. Thanks for providing the code for Hugging Face. I am trying to use it in Colab, but the session crashes because it runs out of RAM. I am using Colab Pro with the high-RAM setup (25 GB of RAM) and a T4 GPU, but the session still crashes.

The README mentions that the offline demo can run inference on a single 16GB T4 GPU with 8-bit support. How can I do this in Colab?

Thank you!
You can set `load_in_8bit=True` to enable 8-bit support. See https://github.com/X-PLUG/mPLUG-Owl/blob/main/serve/model_worker.py#L46
Note that 8-bit only works with `torch.half`; it does not work with `torch.bfloat16`.
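Something like this should work as a minimal loading sketch (the import and the `MAGAer13/mplug-owl-llama-7b` checkpoint name follow the repo README; `device_map='auto'` and the `bitsandbytes` requirement are assumptions about the standard 8-bit loading path, adjust to your setup):

```python
import torch
from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration

# Checkpoint name as used in the repo README; swap in your own if needed.
pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b'

# 8-bit loading needs the bitsandbytes package and a CUDA GPU.
# Per the note above: pair load_in_8bit with torch.half, not torch.bfloat16.
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    load_in_8bit=True,
    torch_dtype=torch.half,
    device_map='auto',  # assumption: let accelerate place weights on the GPU
)
model.eval()
```

Quantizing the weights as they are loaded should also keep peak host RAM lower than loading in full precision first, which is what crashes the Colab session above.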
With `load_in_8bit=True` and `torch_dtype=torch.half`, it works!
Thank you so much!