Yuliang-Liu / Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
MIT License
1.77k stars, 122 forks

demo doesn't give OCR with grounding #95

Closed jeong-tae closed 3 months ago

jeong-tae commented 3 months ago

In the paper, Figure 5 claims that TextMonkey can generate bounding boxes, but the demo always answers "use OCR model". How can I get the visualized OCR result from the model?

echo840 commented 3 months ago

Hello, you can use the prompt “OCR with grounding:” in the demo (https://github.com/Yuliang-Liu/Monkey/blob/main/demo_textmonkey.py).

jeong-tae commented 3 months ago

I used this: http://vlrlab-monkey.xyz:7681/

It doesn't give OCR results. Is this intended in the provided demo? Should I run demo_textmonkey.py myself to get OCR results?

echo840 commented 3 months ago

I'm sorry, the demo at http://vlrlab-monkey.xyz:7681/ is for Monkey. For TextMonkey, you should run the code (https://github.com/Yuliang-Liu/Monkey/blob/main/demo_textmonkey.py).
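
For readers who want to turn a grounded OCR reply into boxes they can draw, here is a minimal sketch. It is not taken from the Monkey repository. It assumes the model's reply uses Qwen-VL-style markup such as `<ref>text</ref><box>(x1,y1),(x2,y2)</box>` with coordinates normalized to a 0..1000 range; neither the markup nor the range is confirmed in this thread, so check the actual output of demo_textmonkey.py first. The names `BOX_PATTERN` and `parse_grounded_ocr` are hypothetical.

```python
import re
from typing import List, Tuple

# ASSUMPTION: the reply uses Qwen-VL-style grounding markup.
# This thread does not confirm the format; adjust the pattern if it differs.
BOX_PATTERN = re.compile(
    r"<ref>(.*?)</ref>\s*<box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
)


def parse_grounded_ocr(
    output: str, width: int, height: int, scale: int = 1000
) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Extract (text, pixel box) pairs from a grounded OCR reply.

    ASSUMPTION: coordinates are normalized to 0..scale and are mapped
    to pixels using the image width and height.
    """
    results = []
    for text, x1, y1, x2, y2 in BOX_PATTERN.findall(output):
        box = (
            int(x1) * width // scale,
            int(y1) * height // scale,
            int(x2) * width // scale,
            int(y2) * height // scale,
        )
        results.append((text.strip(), box))
    return results


if __name__ == "__main__":
    # Hypothetical reply, written by hand for illustration only.
    sample = "<ref>Hello</ref><box>(100,200),(300,400)</box>"
    print(parse_grounded_ocr(sample, width=2000, height=1000))
```

The returned pixel boxes can then be passed to any drawing routine (for example a rectangle-drawing call in an imaging library) to produce the kind of visualization asked about above.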