X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl

How to load the 8bits model with Huggingface in colab #54

Closed — bakachan19 closed this issue 1 year ago

bakachan19 commented 1 year ago

Hi. Thanks for providing the Huggingface-style loading code. I am trying to run the following code in Colab, but the session crashes because it runs out of RAM. I am using Colab Pro with the high-RAM setup (25 GB of RAM) and a T4 GPU, but the session still crashes.

# Load via Huggingface Style
import torch

from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl.tokenization_mplug_owl import MplugOwlTokenizer
from mplug_owl.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b'
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
)
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = MplugOwlTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)

The README mentions that the offline demo can run inference on a single 16GB T4 GPU with 8-bit support. How can I do this in Colab?

Thank you!

MAGAer13 commented 1 year ago

You can set load_in_8bit=True to enable 8-bit support. See https://github.com/X-PLUG/mPLUG-Owl/blob/main/serve/model_worker.py#L46

Note that 8-bit loading only works with torch.half; it does not work with torch.bfloat16.
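
Applied to the Huggingface-style snippet above, the loading code might look roughly like this. This is only a sketch: it assumes bitsandbytes and accelerate are installed in the Colab runtime (e.g. pip install bitsandbytes accelerate), and passing device_map='auto' is an assumption here to let accelerate place the quantized weights on the GPU.

# Sketch: 8-bit loading (assumes bitsandbytes and accelerate are installed)
import torch

from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration
from mplug_owl.tokenization_mplug_owl import MplugOwlTokenizer
from mplug_owl.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b'
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    load_in_8bit=True,       # quantize weights to 8 bits via bitsandbytes
    torch_dtype=torch.half,  # 8-bit loading works with torch.half, not torch.bfloat16
    device_map='auto',       # assumption: let accelerate map the weights onto the T4
)
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = MplugOwlTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)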

bakachan19 commented 1 year ago

With load_in_8bit=True and torch_dtype=torch.half it works! Thank you so much!