google-research / pix2struct

Apache License 2.0
587 stars 53 forks

How to use pix2struct for pure OCR tasks #33

Open ash2703 opened 1 year ago

ash2703 commented 1 year ago

Using the OCR-VQA model does not always give consistent results, even when the prompt is left unchanged. What is the most consistent way to use the model for OCR?

hrithickcodes commented 1 year ago

You can take the fine-tuned TextCaps model and fine-tune it again on an OCR task, i.e. train it to generate the text that appears in a given image. I hope this will get you there.
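A rough sketch of what that could look like, assuming the Hugging Face `transformers` port of Pix2Struct rather than the original JAX code in this repo. The checkpoint name `google/pix2struct-textcaps-base`, the `OcrExample` pairs, and the helper names are my own assumptions for illustration, and the hyperparameters are placeholders, not tuned values.

```python
from dataclasses import dataclass

# Assumed Hugging Face checkpoint for the TextCaps fine-tuned model.
CHECKPOINT = "google/pix2struct-textcaps-base"


@dataclass
class OcrExample:
    """Hypothetical training pair: an image file and its ground-truth text."""
    image_path: str
    text: str


def batches(examples, batch_size):
    """Yield consecutive slices of at most batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def load(checkpoint=CHECKPOINT):
    """Load model and processor (imported lazily; needs transformers and torch)."""
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    model = Pix2StructForConditionalGeneration.from_pretrained(checkpoint)
    processor = Pix2StructProcessor.from_pretrained(checkpoint)
    return model, processor


def training_step(model, processor, optimizer, examples, max_patches=1024):
    """Run one gradient step on a batch of (image, text) pairs; return the loss."""
    from PIL import Image

    images = [Image.open(ex.image_path).convert("RGB") for ex in examples]
    # The TextCaps checkpoint takes the image only, so no prompt is passed.
    inputs = processor(images=images, return_tensors="pt", max_patches=max_patches)
    labels = processor.tokenizer(
        [ex.text for ex in examples],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128,
    ).input_ids
    # Mask padding with -100 so it is ignored by the cross-entropy loss.
    labels[labels == processor.tokenizer.pad_token_id] = -100

    outputs = model(
        flattened_patches=inputs["flattened_patches"],
        attention_mask=inputs["attention_mask"],
        labels=labels,
    )
    optimizer.zero_grad()
    outputs.loss.backward()
    optimizer.step()
    return outputs.loss.item()


def transcribe(model, processor, image_path, max_new_tokens=128):
    """Transcribe one image with sampling disabled, so decoding is deterministic."""
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return processor.decode(ids[0], skip_special_tokens=True)
```

To use it, call `load()`, build an optimizer over `model.parameters()` (for example `torch.optim.AdamW` with a small learning rate), loop over `batches(your_examples, batch_size)` calling `training_step`, then call `transcribe` on held-out images. Since the fine-tuned model is given only the image, there is no prompt wording left to cause variation in the output.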