OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
https://www.OpenAdapt.AI
MIT License
883 stars, 115 forks

Support LLaVA or other local vision models instead of using OpenAI GPT-4 Vision #674

Open ai-agents-challenge opened 4 months ago

ai-agents-challenge commented 4 months ago

Feature request

Instead of relying solely on OpenAI's GPT-4 Vision for image processing, provide a locally hosted alternative, such as LLaVA.
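As a rough illustration of what a locally hosted alternative could look like, here is a minimal sketch that sends a screenshot and a prompt to LLaVA served by Ollama's HTTP API (`POST /api/generate`, which accepts base64-encoded images for multimodal models). This is not OpenAdapt's code or API. Running the model through Ollama on its default port (`localhost:11434`) is an assumption, and the helper names `build_llava_request` and `query_local_llava` are hypothetical. Other local servers would need a different payload.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server is running on its default port
# with the "llava" model already pulled (e.g. `ollama pull llava`).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_llava_request(prompt: str, image_bytes: bytes, model: str = "llava") -> dict:
    """Build a JSON payload for Ollama's /api/generate endpoint (hypothetical helper).

    Ollama expects images as a list of base64-encoded strings.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request a single JSON object instead of streamed chunks
    }


def query_local_llava(
    prompt: str,
    image_bytes: bytes,
    url: str = OLLAMA_URL,
    timeout: float = 120.0,
) -> str:
    """Send an image and prompt to a locally hosted LLaVA model; return its text reply."""
    payload = json.dumps(build_llava_request(prompt, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the generated text is returned in the "response" field.
    return body["response"]
```

To try it, read a screenshot as bytes (for example with `open("screenshot.png", "rb").read()`) and pass it to `query_local_llava` along with a prompt. Because the image never leaves the machine, no remote safety filter sits between the screenshot and the model, although the local model can still decline some requests.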

Motivation

OpenAI often gives this error when parsing images: "Your input image may contain content that is not allowed by our safety system."

abrichr commented 4 months ago

Related:

abrichr commented 3 months ago

Related: https://community.openai.com/t/your-input-image-may-contain-content-that-is-not-allowed-by-our-safety-system-vision-api-response/653372/17

I expect that the AI is denying your request because it doesn’t know if you are trying to solve a CAPTCHA or attempting to use the AI for other purposes it has been trained to prohibit, such as driving cars or tasks beyond the capabilities of computer vision.

https://community.openai.com/t/vision-api-image-not-allowed-by-our-safety-system/679147

One thing which can help is to modify the image slightly to make it look less like a CAPTCHA. I discovered this as a side-effect of using “set-of-marks” prompting with the vision model.

Mostly it’s “business related” information that OpenAI will refuse to OCR, like people’s names, addresses, emails, phone numbers, company names, etc. So as long as your use case doesn’t involve business info you’ll be fine, …unless/until OpenAI changes their mind and censors your use case as well.