continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Add support for provider-specific image resolutions #2832

Open RomneyDa opened 2 weeks ago

RomneyDa commented 2 weeks ago

Problem

All images are scaled down before being sent. There is no provider-specific handling of image resolution and no way to allow full resolution.

@FallDown on Discord:

Does Continue scale down images? Whenever I paste in an image with text and ask it to extract the text, it gets it horribly wrong, but the ChatGPT web UI does fine.

Solution

Allow defining provider and model image resolution capabilities

If the provider's image capabilities are known, either:
- do NOT scale down by default, and add a completionOptions boolean such as scaleImagesDown, OR
- scale down by default, and add an option such as allowFullImageResolution

Otherwise, fall back to a single default (scale down or not)
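
To make the second variant concrete, here is a rough sketch of how the decision could be resolved. Everything in it is hypothetical: the ImageCapabilities and ImageOptions shapes, the KNOWN_CAPABILITIES table, the "example-provider" entry, and its maxMegapixels value are placeholders for illustration and do not exist in Continue.

```typescript
// Hypothetical shapes for illustration only; not part of Continue's config.
interface ImageCapabilities {
  // Placeholder field: largest image (in megapixels) the provider is assumed to accept.
  maxMegapixels?: number;
}

interface ImageOptions {
  // Option name taken from the proposal above.
  allowFullImageResolution?: boolean;
}

// Placeholder table; a real one would be filled in from each provider's docs.
const KNOWN_CAPABILITIES: Record<string, ImageCapabilities | undefined> = {
  "example-provider": { maxMegapixels: 4 },
};

// Variant "scale down by default, opt in to full resolution".
function shouldScaleDown(provider: string, options: ImageOptions): boolean {
  const caps = KNOWN_CAPABILITIES[provider];
  if (caps === undefined) {
    // Capabilities unknown: use the default, which in this sketch is to scale down.
    return true;
  }
  // Capabilities known: scale down unless the user explicitly opted out.
  return options.allowFullImageResolution !== true;
}
```

With this shape, a model entry for a known provider only skips scaling when allowFullImageResolution is set to true, and unknown providers keep today's behavior.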

FallDownTheSystem commented 2 weeks ago

Relevant docs:
https://platform.openai.com/docs/guides/vision#low-or-high-fidelity-image-understanding
https://docs.anthropic.com/en/docs/build-with-claude/vision#evaluate-image-size
https://ai.google.dev/gemini-api/docs/vision?lang=python#prompting-images

You could set the detail value for OpenAI models to auto instead of low, and allow the image scaling to be determined by a model's completionOptions config.
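
For reference, a minimal sketch of what that would look like in the request body. The image_url content part and its detail field ("low", "high", or "auto") follow the OpenAI vision docs linked above; the toImagePart helper and the type names are made up for this example and are not Continue's actual code.

```typescript
// Allowed values for the OpenAI image "detail" field.
type ImageDetail = "low" | "high" | "auto";

// Shape of an image content part in an OpenAI chat message.
interface ImageUrlPart {
  type: "image_url";
  image_url: { url: string; detail: ImageDetail };
}

// Hypothetical helper: wraps a data URL in a content part,
// defaulting to "auto" so the API picks the fidelity.
function toImagePart(dataUrl: string, detail: ImageDetail = "auto"): ImageUrlPart {
  return { type: "image_url", image_url: { url: dataUrl, detail } };
}
```

The detail argument could later be wired to a completionOptions value instead of being hardcoded.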

The capabilities of different models vary a great deal. I don't think you'd want to make the config overly complex. Maybe something like imageMaxSize, with a value in megapixels (e.g. 1.0 would scale a large square image down to 1000 x 1000 pixels, and other shapes to the same pixel count with the aspect ratio preserved), and imageQuality, which controls the quality argument passed to toDataURL (0.0 - 1.0).
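
A rough sketch of the imageMaxSize math, assuming the value is a total pixel budget in megapixels. fitToMegapixels is a made-up name, and rounding down is just one possible choice.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scales (width, height) down so that width * height fits within
// imageMaxSize megapixels, preserving the aspect ratio.
// Images already within the budget are returned unchanged (never upscaled).
function fitToMegapixels(width: number, height: number, imageMaxSize: number): Size {
  const maxPixels = imageMaxSize * 1_000_000;
  const pixels = width * height;
  if (pixels <= maxPixels) {
    return { width, height };
  }
  // Both sides shrink by the same factor, so the area shrinks by factor squared.
  const scale = Math.sqrt(maxPixels / pixels);
  return {
    width: Math.floor(width * scale),
    height: Math.floor(height * scale),
  };
}

// A 2000 x 2000 image with imageMaxSize = 1.0 becomes 1000 x 1000.
const scaled = fitToMegapixels(2000, 2000, 1.0);
console.log(scaled.width, scaled.height);
```

The resulting size would be used for the canvas dimensions, and imageQuality would be passed as the second argument of canvas.toDataURL("image/jpeg", imageQuality).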

https://github.com/continuedev/continue/blob/46bb3a57a9dd10451ae59e5fd73ee3e041982377/gui/src/components/mainInput/TipTapEditor.tsx#L129-L146