Closed zwenyu closed 1 month ago
Hi @zwenyu Thank you for your interest in our work.
You may want to modify
```python
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image 1>\n<image 2>\n')]
input_ids = torch.tensor(text_chunks[0] + [-201] + [-202] + text_chunks[1][offset_bos:], dtype=torch.long).unsqueeze(0).to(device)
```
to
```python
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image 1>\n')]
input_ids = torch.tensor(text_chunks[0] + [-201] + text_chunks[1][offset_bos:], dtype=torch.long).unsqueeze(0).to(device)
```
We use `-201` and `-202` as placeholder IDs for the two images in the input token sequence, so with a single image only `-201` is needed. Hope this makes sense to you.
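As an illustration of how the placeholder is spliced in, here is a minimal self-contained sketch of the single-image case. The `tokenize` callable here is a hypothetical stand-in for `tokenizer(...).input_ids`, and `offset_bos` skips the duplicated BOS token in the second chunk, matching the snippet above:

```python
IMAGE_TOKEN_ID = -201  # placeholder the model later replaces with image features

def build_input_ids(text, tokenize, offset_bos=1):
    """Split the prompt at the image tag, tokenize each side, and splice
    the image placeholder ID between the two token chunks."""
    chunks = [tokenize(chunk) for chunk in text.split('<image 1>\n')]
    # Drop the second chunk's leading BOS token before concatenating.
    return chunks[0] + [IMAGE_TOKEN_ID] + chunks[1][offset_bos:]

# Toy tokenizer for demonstration: BOS id 101, then one id per word.
tokenize = lambda s: [101] + [len(w) for w in s.split()]
ids = build_input_ids("A chat. <image 1>\nDescribe the image.", tokenize)
```

The resulting list contains exactly one `-201` at the position where the image features will be inserted.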
Regards
I get it now. Thanks!
Thank you for the interesting work! I'd like to check the correct way to run inference with only an RGB image, following the code provided under Quickstart. Using
```python
model.process_images([image1], model.config).to(dtype=model.dtype, device=device)
```
raises `IndexError: list index out of range`, so it appears two images are expected.