camel-ai / crab

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/

[Feature Request] OmniParser visual prompt #31

Open dandansamax opened 2 months ago

dandansamax commented 2 months ago

OmniParser is a visual prompting method that combines a fine-tuned interactable-icon detection model, a fine-tuned icon description model, and an OCR module. It should detect interactable elements more accurately than GroundingDino.

https://arxiv.org/abs/2408.00203
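
To make the request more concrete, here is a rough sketch of how the outputs of the three modules might be combined into a numbered visual prompt. It does not use the real OmniParser or CRAB APIs: `Element`, `merge_elements`, `to_prompt`, the IoU threshold, and the sample boxes are all hypothetical, and dropping icon boxes that overlap OCR boxes is an assumption about a reasonable merge rule, not a confirmed detail of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One screen element (hypothetical structure, not an OmniParser type)."""
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    kind: str   # "text" (from the OCR module) or "icon" (from the detector)
    label: str  # OCR string, or the icon description model's output


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_elements(ocr, icons, iou_threshold=0.5):
    """Keep every OCR box; keep an icon box only if it does not
    overlap any OCR box at or above the threshold (assumed rule)."""
    merged = list(ocr)
    for icon in icons:
        if all(iou(icon.box, text.box) < iou_threshold for text in ocr):
            merged.append(icon)
    return merged


def to_prompt(elements):
    """Render the elements as a numbered list the agent can refer to by ID."""
    return "\n".join(
        f"[{i}] {e.kind}: {e.label}" for i, e in enumerate(elements)
    )


# Made-up detections for illustration only.
ocr_boxes = [Element((10, 10, 110, 40), "text", "Settings")]
icon_boxes = [
    Element((12, 11, 108, 39), "icon", "a settings button"),
    Element((200, 10, 240, 50), "icon", "a magnifying glass search icon"),
]

elements = merge_elements(ocr_boxes, icon_boxes)
prompt = to_prompt(elements)
print(prompt)
```

In this sketch the first icon box almost coincides with the OCR box, so only the OCR entry and the non-overlapping search icon end up in the prompt. The same numeric IDs could be drawn onto the screenshot so the model can answer with an element ID instead of raw coordinates.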