cvlab-columbia / viper

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

Bug when using BLIP2 models #15

Closed aniki-ly closed 1 year ago

aniki-ly commented 1 year ago

Hello! Thank you for sharing the ViperGPT code. I noticed that the cropped_image tensor in ImagePatch is divided by 255, so its values lie in [0, 1]. However, the BLIP2 model expects PIL images or tensors at the original image scale (0 to 255). When using the BLIP2 model, it may therefore be necessary to multiply the cropped_image tensor by 255 before passing it in.
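
A minimal sketch of the rescaling described above, not the repository's actual fix. It assumes the crop is a float array in [0, 1] with shape (channels, height, width); `to_original_scale` is a hypothetical helper name, and NumPy is used only to keep the example self-contained (the same arithmetic applies to a torch tensor).

```python
import numpy as np


def to_original_scale(cropped_image):
    """Hypothetical helper: map a float image in [0, 1] back to uint8 in [0, 255]."""
    arr = np.asarray(cropped_image, dtype=np.float32)
    # Assumption: a maximum of at most 1.0 means the image was divided by 255.
    # An image already in the 0..255 range is left unscaled.
    if arr.max() <= 1.0:
        arr = arr * 255.0
    # Round to the nearest integer and clamp before casting to uint8.
    return np.clip(np.rint(arr), 0, 255).astype(np.uint8)


# A (channels, height, width) crop whose values were divided by 255.
normalized = np.array([[[0.0, 0.5], [1.0, 0.25]]] * 3, dtype=np.float32)
restored = to_original_scale(normalized)
```

The `max() <= 1.0` check is only a heuristic: a very dark image stored in the 0..255 range could be misdetected as normalized, so tracking the scale explicitly where the division happens would be more robust.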

surisdi commented 1 year ago

Hi, thanks for catching that! I will update the code accordingly.