Closed: albert-haam closed this issue 4 months ago.
Please add the code below just before `output_ids = model.generate(`:

```python
model.get_vision_tower().to('cuda')
input_ids = input_ids.to('cuda')
```

The vision tower is left on the CPU while the rest of the model sits on the GPU, which is what triggers the "two devices" RuntimeError.
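In context, the tail of the script below would then read like this (the generation arguments are restored from the Bunny quick start, since the paste cuts off mid-call):

```python
# move the vision tower's weights onto the same device as the language model
model.get_vision_tower().to('cuda')
# keep the token ids on that device too
input_ids = input_ids.to('cuda')

output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True)[0]
```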
Thank you.
Do you know how much GPU memory is needed to run the sample?
My A2000 GPU has 8 GB, and I'm unable to run the sample with it: it fails with "CUDA out of memory".
Hi @albert-haam, thank you for your interest in Bunny.
8 GB isn't enough for the quick-start scripts. For quick start and CLI inference, we've tested on our device and it occupies about 9 GB. To reduce GPU memory consumption, please try quantizing the model. This is currently supported in the CLI, where you can set `--load-8bit` in `bunny/serve/cli.py`.
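If you want a similar saving in the standalone quick-start script rather than the CLI, a sketch using the stock transformers/bitsandbytes API (not a Bunny-specific path, so treat it as untested here) would be:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the weights in 8-bit via bitsandbytes: roughly half the GPU memory
# of float16, at a small cost in quality and speed.
model = AutoModelForCausalLM.from_pretrained(
    'BAAI/Bunny-v1_0-3B',
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True)
```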
Feel free to comment on this issue if you have further questions.
Regards,
Russell, BAAI
I'll close this issue since we've provided detailed info on Bunny's GPU memory requirements above. Reopen it if you still have questions.
Dear all,
I'm struggling to get the sample code working on my laptop with an Nvidia A2000 (8 GB) card. Does anyone have any advice? The script below fails with:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
```
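(For reference, a plain-PyTorch check like the one below makes such a mismatch visible; `get_vision_tower` is the accessor used in the fix above, and `model` is the handle from the script below.)

```python
# Print the distinct devices the parameters live on. On a healthy
# single-GPU setup both sets contain only cuda:0; the error above
# suggests the vision tower's parameters are still on the CPU.
print({p.device for p in model.parameters()})
print({p.device for p in model.get_vision_tower().parameters()})
```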
Here is the full script:

```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings
import pathlib

# disable some warnings
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')

# set device
torch.set_default_device('cuda')  # or 'cpu'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch_device = 'cuda'  # or 'auto', 'cpu'

model_name = 'BAAI/Bunny-v1_0-3B'  # or 'BAAI/Bunny-v1_0-2B-zh'

# create model
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16,
    device_map=torch_device, trust_remote_code=True)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# text prompt; <image> marks where the image embedding is spliced in
prompt = 'What happened in the image?'
text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\n{prompt} ASSISTANT:"
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
# token ids must stay integer (torch.long); dtype=model.dtype breaks the embedding lookup
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1],
                         dtype=torch.long, device=torch_device).unsqueeze(0)

# local image
file = pathlib.Path('C:/Users/Admin/Utils/Bunny-AI/slippery-person.jpeg')
image = Image.open(file)
image_tensor = model.process_images([image], model.config)

# generate (max_new_tokens/use_cache as in the Bunny quick start)
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True)[0]

print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
```
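For what it's worth, I believe the upstream quick start also moves the image tensor onto the model's device and dtype explicitly, which avoids relying on `torch.set_default_device`; assuming the same variable names as above:

```python
# place the pixel tensor on the model's device/dtype explicitly
image_tensor = model.process_images([image], model.config).to(
    dtype=model.dtype, device=device)
```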