Using the OCR-VQA model does not always give consistent results, even when the prompt is left unchanged.
What is the most consistent way to use the model for OCR?
You can take the fine-tuned TextCaps model and train it further on OCR tasks, which involve learning to generate the text shown in a given image. I hope this gets you there.
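As a minimal sketch of the inference side: whatever checkpoint you end up with, repeated runs on the same image become consistent when you hold the prompt fixed and use deterministic decoding (no sampling). The checkpoint name and the `ocr_image` helper below are assumptions for illustration, not the exact model discussed above; substitute your own TextCaps/OCR fine-tune.

```python
def ocr_image(image_path: str,
              model_name: str = "Salesforce/blip-image-captioning-base") -> str:
    """Read text from an image with a BLIP-style captioning model.

    Sketch only: the checkpoint name is a placeholder for an
    OCR/TextCaps fine-tune. Imports are deferred so this file loads
    even without the heavy dependencies installed.
    """
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # do_sample=False (greedy/beam search) is the key to consistency:
    # the same image and prompt then always decode to the same string.
    output_ids = model.generate(**inputs,
                                do_sample=False,
                                num_beams=3,
                                max_new_tokens=64)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

The point is less the specific model class than the generation settings: sampling (`do_sample=True`, temperature, top-p) is what makes outputs vary run to run, so turning it off removes most of the inconsistency even before any retraining.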