breezedeus / Pix2Text

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
https://p2t.breezedeus.com
MIT License
1.98k stars 188 forks

CoreML integration #150

Open DwarfCommunist opened 1 month ago

DwarfCommunist commented 1 month ago

Hello, sorry to bother you. Amazing project! I'm currently trying to integrate it into a Swift app using the CoreML package (https://huggingface.co/breezedeus/pix2text-mfd-coreml).

After I pass in a 768×768 image, the model returns a MultiArray (Float32, 1 × 6 × 12096). Could you help me understand how to turn this output into a string? Should I pick certain indexes from it and convert them to symbols? Sorry, I don't have much experience in ML. Thanks for your time.

breezedeus commented 1 month ago

Sorry, I am not familiar with how to use the CoreML model. I only used the Ultralytics script to convert the pt-format model to CoreML, and I don't know how to use it in non-Python environments.
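
A note on the output shape, offered as a sketch and not as a confirmed description of this model: since the export came from Ultralytics, the 1 × 6 × 12096 array most likely follows the usual YOLO detection layout, not a sequence of symbols. Under that assumption, 12096 is the number of candidate boxes for a 768×768 input (96² + 48² + 24² grid cells at strides 8, 16, and 32), and the 6 channels are box center x, center y, width, height, followed by two class scores (presumably the two formula types the MFD model detects; the class order should be checked against the model's metadata). The model would then only locate formulas, and a separate recognition model would be needed to get LaTeX text. The Python below illustrates the decoding steps (score threshold, conversion to corner coordinates, non-maximum suppression); the function names and threshold values are hypothetical, and the same logic would need to be ported to Swift.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, score, class_id), in pixels of the model input (assumed)
Box = Tuple[float, float, float, float, float, int]


def decode_yolo_output(output: Sequence, conf_threshold: float = 0.25) -> List[Box]:
    """Decode an assumed [1][4 + num_classes][num_candidates] YOLO-style array.

    Channels 0-3 are read as (cx, cy, w, h); the remaining channels are read
    as per-class scores. This layout is an assumption, not confirmed here.
    """
    channels = output[0]  # drop the batch dimension
    num_classes = len(channels) - 4
    boxes: List[Box] = []
    for i in range(len(channels[0])):
        cx, cy, w, h = (channels[k][i] for k in range(4))
        scores = [channels[4 + c][i] for c in range(num_classes)]
        class_id = max(range(num_classes), key=lambda c: scores[c])
        score = scores[class_id]
        if score < conf_threshold:
            continue  # most candidates are background and are dropped here
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score, class_id))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.45) -> List[Box]:
    """Greedy per-class non-maximum suppression, highest score first."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        # Keep the box unless it overlaps a stronger kept box of the same class.
        if all(box[5] != k[5] or iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept
```

If the layout assumption holds, `nms(decode_yolo_output(output))` yields the formula boxes in the 768×768 input's coordinate space; they would still need to be scaled back to the original image size before cropping.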