Hello!
Thank you for sharing the code of vipergpt. I have noticed that the `cropped_image` tensor in the `ImagePatch` class is divided by 255, which normalizes it to the `[0, 1]` range. However, the BLIP-2 model expects PIL images or tensors in the original `[0, 255]` scale. Therefore, when passing `cropped_image` to BLIP-2, it may be necessary to multiply it back by 255.
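To illustrate, here is a minimal sketch of the rescaling I have in mind. It uses NumPy for brevity; the function name `prepare_for_blip2` is my own and not part of the vipergpt codebase, and the exact dtype/rounding BLIP-2's preprocessing expects is an assumption:

```python
import numpy as np

def prepare_for_blip2(cropped_image: np.ndarray) -> np.ndarray:
    """Rescale a float image normalized to [0, 1] back to the
    [0, 255] uint8 range (hypothetical helper, not vipergpt API)."""
    return (cropped_image * 255).round().astype(np.uint8)

# Example: a 2x2 grayscale patch after division by 255
patch = np.array([[0.0, 0.5], [1.0, 0.25]])
restored = prepare_for_blip2(patch)
# restored is [[0, 128], [255, 64]] with dtype uint8
```

The same idea applies to a `torch.Tensor`: multiply by 255 (and convert to `uint8` or a PIL image) before handing the patch to the BLIP-2 processor.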