VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy
https://www.datalab.to
GNU General Public License v3.0
16.8k stars 954 forks

What should I do if I want to replace the OCR model #153

Closed Lobskodax closed 4 months ago

Lobskodax commented 4 months ago

Hi @VikParuchuri. I found that the Surya model has a poor recognition rate for Chinese characters and Chinese layouts. If I want to replace it with another model, how should I do that?

VikParuchuri commented 4 months ago

Can you send me an example PDF? You would have to modify the code to change the layout model. For the OCR, you can use Tesseract instead (see the README).
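
Before switching engines, it may help to check whether Tesseract reads the Chinese text acceptably on a single page image. The sketch below is not marker's integration and does not use any marker API. It only shells out to the standalone `tesseract` command line, assuming the binary and the `chi_sim` (Simplified Chinese) language data are installed. The helper names and the file name `page.png` are hypothetical.

```python
import shutil
import subprocess


def tesseract_command(image_path, lang="chi_sim"):
    """Build a Tesseract CLI call that prints recognized text to stdout."""
    # Passing `stdout` as the output base makes tesseract write the text
    # to standard output instead of creating a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_page(image_path, lang="chi_sim"):
    """Run Tesseract on one page image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical input: one page of the PDF rendered to an image.
    print(ocr_page("page.png"))
```

If the output looks better than Surya's on the same page, that is a reasonable sign the Tesseract option described in the README is worth trying for the whole document.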