Add support for provider-specific image resolutions

continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains.

Apache License 2.0


Validations

[X] I believe this is a way to improve. I'll try to join the Continue Discord for questions
[X] I'm not able to find an open issue that requests the same enhancement

Problem

Images are always scaled down. There is no provider-specific handling of image resolution and no way to allow full-resolution images.

@FallDown on Discord:

Does Continue scale down images? Whenever I paste in an image with text and ask it to extract the text, it gets it horribly wrong, but in the ChatGPT web UI it does fine.

Solution

Allow defining provider and model image resolution capabilities

If a provider's image capabilities are known, either do NOT scale down by default and add a completionOptions boolean such as scaleImagesDown, OR scale down by default and add an option such as allowFullImageResolution.

Otherwise, fall back to the default behavior (whichever of scaling down or not is chosen).
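A minimal sketch of the second option (scale down by default, opt in to full resolution). Every name here (ImageCapabilities, maxPixels, allowFullImageResolution as a typed field, resolvePixelBudget, DEFAULT_MAX_PIXELS) is hypothetical and does not exist in Continue today, and the 1,000,000-pixel default is only an illustrative value, not a documented provider limit.

```typescript
// Hypothetical per-model capability record; not part of Continue's config today.
interface ImageCapabilities {
  maxPixels: number; // largest image (width * height) the provider should receive
}

// Hypothetical completionOptions flag proposed in this issue.
interface ImageOptions {
  allowFullImageResolution?: boolean;
}

// Illustrative fallback budget used when capabilities are unknown.
const DEFAULT_MAX_PIXELS = 1_000_000;

/**
 * Decide how many pixels an image may have before it is sent.
 * Returns undefined when the image should be sent at full resolution.
 */
function resolvePixelBudget(
  caps: ImageCapabilities | undefined,
  opts: ImageOptions,
): number | undefined {
  if (caps === undefined) {
    // Capabilities unknown: use the default behavior (scale down),
    // ignoring the opt-in flag because we can't verify the provider handles it.
    return DEFAULT_MAX_PIXELS;
  }
  if (opts.allowFullImageResolution) {
    // Capabilities known and the user opted in: skip scaling entirely.
    return undefined;
  }
  // Capabilities known, no opt-in: scale down to the provider's limit.
  return caps.maxPixels;
}
```

The first option (no scaling by default, with a scaleImagesDown flag) would simply invert the middle branch.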

Relevant docs:
https://platform.openai.com/docs/guides/vision#low-or-high-fidelity-image-understanding
https://docs.anthropic.com/en/docs/build-with-claude/vision#evaluate-image-size
https://ai.google.dev/gemini-api/docs/vision?lang=python#prompting-images

You could set the detail value for OpenAI models to auto instead of low, and allow the image scaling to be determined by a model's completionOptions config.
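For reference, this is roughly what an image content part with detail set to auto looks like in an OpenAI Chat Completions message. The field layout (type "image_url", image_url.url, image_url.detail with "low" | "high" | "auto") follows the OpenAI vision guide linked above; the helper name toOpenAIImagePart and the placeholder data URL are hypothetical and only for illustration.

```typescript
// Allowed values of the OpenAI "detail" field for image inputs.
type Detail = "low" | "high" | "auto";

// Image content part as sent in a Chat Completions user message.
interface ImageContentPart {
  type: "image_url";
  image_url: { url: string; detail?: Detail };
}

// Hypothetical helper: defaults detail to "auto" so the API picks the
// fidelity, instead of forcing "low".
function toOpenAIImagePart(dataUrl: string, detail: Detail = "auto"): ImageContentPart {
  return { type: "image_url", image_url: { url: dataUrl, detail } };
}

// Placeholder data URL (only a PNG signature, not a real image).
const PLACEHOLDER_DATA_URL = "data:image/png;base64,iVBORw0KGgo=";

const message = {
  role: "user",
  content: [
    { type: "text", text: "Extract the text from this image." },
    toOpenAIImagePart(PLACEHOLDER_DATA_URL),
  ],
};
```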

The capabilities of different models vary a great deal. I don't think you'd want to make the config overly complex; maybe something like imageMaxSize, with a value in megapixels (e.g. 1.0 would scale images down to at most one million pixels, i.e. 1000 x 1000 for a square image, preserving aspect ratio), and imageQuality, which controls the quality argument passed to toDataURL (0.0 - 1.0).
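A sketch of how the proposed imageMaxSize (megapixels) could map to target dimensions, assuming it is treated as a cap on total pixel count. The function name scaleToMegapixels is hypothetical. Because area grows with the square of a uniform scale factor, the factor is the square root of the pixel budget divided by the current pixel count.

```typescript
// Hypothetical helper for the proposed imageMaxSize option (in megapixels).
function scaleToMegapixels(
  width: number,
  height: number,
  imageMaxSize: number,
): { width: number; height: number } {
  const maxPixels = imageMaxSize * 1_000_000;
  const pixels = width * height;
  if (pixels <= maxPixels) {
    // Already within budget: never upscale.
    return { width, height };
  }
  // Same factor on both axes preserves the aspect ratio;
  // area scales with factor^2, hence the square root.
  const factor = Math.sqrt(maxPixels / pixels);
  // Flooring keeps the result at or under the pixel budget.
  return {
    width: Math.max(1, Math.floor(width * factor)),
    height: Math.max(1, Math.floor(height * factor)),
  };
}
```

For example, a 2000 x 2000 image with imageMaxSize 1.0 becomes 1000 x 1000, and a 4000 x 1000 image becomes 2000 x 500. In the browser, these dimensions would then be used to draw onto a canvas before calling canvas.toDataURL("image/jpeg", imageQuality); note that the quality argument only affects lossy formats such as image/jpeg and image/webp, not image/png.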

https://github.com/continuedev/continue/blob/46bb3a57a9dd10451ae59e5fd73ee3e041982377/gui/src/components/mainInput/TipTapEditor.tsx#L129-L146
