getomni-ai / zerox

PDF to Markdown with vision models
https://getomni.ai/ocr-demo
MIT License
6.58k stars 358 forks

Multi-Linguistic? #70

Open harshit-raizada opened 1 month ago

harshit-raizada commented 1 month ago

It's not able to provide correct extraction for Urdu and Bengali. It's working fine for English. Is it not multi-lingual?

pradhyumna85 commented 1 month ago

@harshit-raizada, pyzerox itself doesn't provide the model. For example, in the Python SDK you can choose from various model providers and pyzerox will use whichever one you pick. At most, you can tweak the default system prompt in your language to see if that improves the results.
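As a rough illustration of the system prompt tweak, here is a minimal sketch. It assumes the Python SDK's `zerox` function accepts `file_path`, `model`, and `custom_system_prompt` arguments. The helper names `build_system_prompt` and `extract`, the prompt wording, the model name `gpt-4o-mini`, and the file name are all hypothetical and only there for illustration. Whether this actually improves Urdu or Bengali output depends on the model you choose.

```python
import asyncio


def build_system_prompt(language: str) -> str:
    """Build a prompt asking the model to keep the document's original script.

    Hypothetical helper: the wording is an assumption, not the SDK's default prompt.
    """
    return (
        "Convert the following PDF page to Markdown. "
        f"The document is written in {language}. "
        "Transcribe the text in its original script; "
        "do not translate or transliterate it. "
        "Return only the Markdown, with no explanation."
    )


async def extract(file_path: str, language: str, model: str = "gpt-4o-mini"):
    """Run zerox on one file with a language-specific system prompt."""
    # Imported lazily so the prompt helper works even without pyzerox installed.
    from pyzerox import zerox

    # Assumes the provider credentials (for example OPENAI_API_KEY)
    # are already set in the environment.
    return await zerox(
        file_path=file_path,
        model=model,
        custom_system_prompt=build_system_prompt(language),
    )


# Example call (hypothetical file name), left commented out because it
# needs pyzerox, network access, and an API key:
# result = asyncio.run(extract("urdu_sample.pdf", "Urdu"))
```

Swapping the `model` argument for a different provider's vision model is the other lever mentioned above, since extraction quality for a given language comes from the model rather than from pyzerox.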

harshit-raizada commented 1 month ago

What I meant is in your UI: for English it's fine, but when I upload Urdu or Bengali docs, the results are not correct. Can your tool support multilingual docs at the moment?

pradhyumna85 commented 1 month ago

@tylermaran, which model are we using in the backend of the demo UI?

pradhyumna85 commented 4 weeks ago

@harshit-raizada, please don't follow the INSTRUCTIONS ON THE ABOVE LINK SHARED BY ummm288

@tylermaran, @annapo23, please block the previous comment; the link contains malware.

Also report the user.