PrivateCoffee / matrix-gptbot

GPT Chatbot for Matrix

Request: image vision dynamic resize #14

Closed: dillfrescott closed this issue 1 month ago

dillfrescott commented 1 month ago

Oftentimes my users send huge images that cause the vision model to throw an error because of their sheer pixel dimensions.

A way for the bot to dynamically downscale the image before base64-encoding it and sending it to the vision model would be nice.

When the image is too large, the bot can't process it and sorta gets stuck until you type ignoreolder.

kumitterer commented 1 month ago

Oooh, good catch! I'll look into this tomorrow.

The API docs don't really include a lot of information on how to prepare images for upload, but according to this, images should be smaller than 2000x768: https://platform.openai.com/docs/guides/vision/managing-images

We already have Pillow as a dependency, so scaling down images should be really easy to do.
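
For illustration, here is a minimal sketch of what that downscaling could look like with Pillow. It is not the code that shipped in the bot: the function names `downscale_for_vision` and `to_base64` are made up for this example, and the limits are an assumed reading of 2000x768 as "long side at most 2000 px, short side at most 768 px".

```python
import base64
import io

from PIL import Image

# Assumed limits, read from the 2000x768 figure mentioned above.
MAX_LONG_SIDE = 2000
MAX_SHORT_SIDE = 768


def downscale_for_vision(data: bytes,
                         max_long: int = MAX_LONG_SIDE,
                         max_short: int = MAX_SHORT_SIDE) -> bytes:
    """Return PNG bytes of the image, shrunk to fit the limits (never enlarged)."""
    image = Image.open(io.BytesIO(data))
    width, height = image.size

    # Orient the bounding box like the image, so a portrait image is
    # limited to max_short x max_long instead of max_long x max_short.
    box = (max_long, max_short) if width >= height else (max_short, max_long)

    # thumbnail() resizes in place, keeps the aspect ratio and only shrinks.
    image.thumbnail(box)

    # Normalise unusual modes (palette, CMYK, ...) before writing the PNG.
    if image.mode not in ("RGB", "RGBA", "L"):
        image = image.convert("RGB")

    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


def to_base64(data: bytes) -> str:
    """Base64-encode image bytes for embedding in a vision request."""
    return base64.b64encode(data).decode("ascii")
```

Calling `to_base64(downscale_for_vision(raw_bytes))` on the downloaded attachment would give a string that is safe to send, whatever the size of the original upload.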

dillfrescott commented 1 month ago

Oh excellent! Sounds good!

kumitterer commented 1 month ago

Just tagged a new release, v0.3.13 (3f084ff), that will automatically resize images accordingly. It also comes with new config options in the OpenAI section that allow you to override the defaults of 2000x768 in case Gemini expects something different (I'll admit, I didn't look that up).

https://github.com/PrivateCoffee/matrix-gptbot/blob/3f084ffdd31cb8b45c36d7648f5b9132c51c95bc/config.dist.ini#L142-L150

Please check if that fixes the issue for you.

dillfrescott commented 1 month ago

Works great! Thank you!

dillfrescott commented 1 month ago

I even gave it a 6916x4616 image and it described it perfectly!
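
For a sense of scale, the arithmetic for an image that size can be sketched as below. This assumes the default 2000x768 limits mean "long side at most 2000 px, short side at most 768 px"; `fit_within` is a hypothetical helper for this example, not a function from the bot.

```python
def fit_within(width: int, height: int,
               max_long: int = 2000, max_short: int = 768) -> tuple[int, int]:
    """Return the aspect-preserving size that fits the assumed limits."""
    long_side, short_side = max(width, height), min(width, height)
    # Use the tighter of the two constraints and never scale up.
    scale = min(max_long / long_side, max_short / short_side, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(6916, 4616))  # (1151, 768)
```

Under those assumptions the short side is the binding constraint, so the image shrinks to roughly a sixth of its original width and height before it is sent.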