Open dandansamax opened 2 months ago
OmniParser is a visual prompting method that combines a fine-tuned interactable-icon detection model, a fine-tuned icon description model, and an OCR module. It should be more accurate than GroundingDINO.
https://arxiv.org/abs/2408.00203