VinAIResearch / PhoGPT

PhoGPT: Generative Pre-training for Vietnamese (2023)

4/8-bit with bitsandbytes #19

Closed. datquocnguyen closed this issue 6 months ago.

datquocnguyen commented 8 months ago

See: https://huggingface.co/docs/transformers/main/en/quantization#bitsandbytes

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize the weights to 4-bit at load time; matmuls are computed in float16.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

# trust_remote_code=True is required because PhoGPT ships custom (MPT-based) modeling code.
model_4bit = AutoModelForCausalLM.from_pretrained("vinai/PhoGPT-4B-Chat", quantization_config=quantization_config, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("vinai/PhoGPT-4B-Chat", trust_remote_code=True)
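
As a quick sanity check that quantization actually took effect (a minimal sketch; get_memory_footprint() is a standard method on Transformers models, and the sizes in the comment are rough estimates):

# Expect roughly 2-3 GB for the 4-bit model, versus roughly 8 GB for float16 PhoGPT-4B.
print(f"{model_4bit.get_memory_footprint() / 1e9:.2f} GB")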

Or, equivalently, building the config first and initializing the weights directly on the GPU via init_device instead of using device_map:

import torch
from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

config = AutoConfig.from_pretrained("vinai/PhoGPT-4B-Chat", trust_remote_code=True)
config.init_device = "cuda"  # MPT-style config option: materialize weights directly on the GPU

quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model_4bit = AutoModelForCausalLM.from_pretrained("vinai/PhoGPT-4B-Chat", quantization_config=quantization_config, config=config, trust_remote_code=True)

datquocnguyen commented 7 months ago
from transformers import AutoModelForCausalLM

# load_in_8bit=True is the shorthand for an 8-bit BitsAndBytesConfig.
model_8bit = AutoModelForCausalLM.from_pretrained("vinai/PhoGPT-4B-Chat", device_map="auto", load_in_8bit=True, trust_remote_code=True)

Or, again initializing the weights directly on the GPU via the config:

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("vinai/PhoGPT-4B-Chat", trust_remote_code=True)
config.init_device = "cuda"

# trust_remote_code=True is still required here to load PhoGPT's custom modeling code.
model_8bit = AutoModelForCausalLM.from_pretrained("vinai/PhoGPT-4B-Chat", config=config, load_in_8bit=True, trust_remote_code=True)
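
For reference, a minimal end-to-end inference sketch with the quantized model. It assumes model_4bit and tokenizer from the 4-bit snippets above; the prompt template and sampling parameters mirror the format documented in the PhoGPT model card, so verify the exact wording there.

# Chat prompt format per the PhoGPT model card (assumption: confirm against the README).
PROMPT_TEMPLATE = "### Câu hỏi: {instruction}\n### Trả lời:"
# Example instruction: "Write a social commentary essay on traffic safety"
input_prompt = PROMPT_TEMPLATE.format(instruction="Viết bài văn nghị luận xã hội về an toàn giao thông")

inputs = tokenizer(input_prompt, return_tensors="pt").to(model_4bit.device)
outputs = model_4bit.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,
    top_k=50,
    top_p=0.9,
    max_new_tokens=1024,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response[len(input_prompt):].strip())  # drop the echoed prompt, keep only the answer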