Some pages are not fully detected by the Nougat OCR model. In many cases, only half of the content on a page is recognized and the rest is skipped, while other pages from the same PDF are processed without any problem.
Steps to Reproduce:
1. Convert the PDF into images (one image per page).
2. Process each image individually with the Nougat OCR model.
3. Observe that some pages are only partially detected, while others are processed correctly.
(This is the notebook I'm following for inference.)
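For reference, here is a minimal sketch of the steps above. It is not the notebook's exact code: it assumes the Hugging Face Transformers port of Nougat with the `facebook/nougat-base` checkpoint and `pdf2image` for rasterization. The `dpi` and `max_new_tokens` defaults are assumed values, and `missing_eos` is a hypothetical helper added only to flag pages where generation stopped without an end-of-sequence token, which would suggest the output was cut off by the length limit rather than finished normally.

```python
from typing import List, Sequence


def missing_eos(token_ids: Sequence[int], eos_token_id: int) -> bool:
    """Return True if the generated ids never contain the end-of-sequence token."""
    return eos_token_id not in token_ids


def ocr_pdf(pdf_path: str, dpi: int = 200, max_new_tokens: int = 3584) -> List[dict]:
    """Run Nougat page by page and flag pages whose output looks cut off.

    dpi and max_new_tokens are assumed defaults, not values from the notebook;
    adjust max_new_tokens to the decoder limit of the checkpoint in use.
    """
    # Heavy imports are local so missing_eos can be used without them installed.
    from pdf2image import convert_from_path
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    processor = NougatProcessor.from_pretrained("facebook/nougat-base")
    model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")
    eos_id = processor.tokenizer.eos_token_id

    results = []
    # Step 1: rasterize the PDF, one PIL image per page.
    pages = convert_from_path(pdf_path, dpi=dpi)
    for page_number, image in enumerate(pages, start=1):
        # Step 2: process each page image individually.
        pixel_values = processor(
            images=image.convert("RGB"), return_tensors="pt"
        ).pixel_values
        output_ids = model.generate(
            pixel_values,
            min_length=1,
            max_new_tokens=max_new_tokens,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
        )
        text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
        text = processor.post_process_generation(text, fix_markdown=False)
        # Step 3: record the text and whether generation ended without EOS.
        results.append(
            {
                "page": page_number,
                "text": text,
                "cut_off": missing_eos(output_ids[0].tolist(), eos_id),
            }
        )
    return results
```

Calling `ocr_pdf("paper.pdf")` (the file name is a placeholder) returns one dictionary per page. A page with `cut_off` set to `True` hit the generation length limit; a page that is short but has `cut_off` set to `False` ended on its own, which points to the input image or the model rather than the token limit.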
Expected Behavior: The OCR model should consistently detect all parts of each page, rather than only detecting part of the content.
Question: Is there any preprocessing that needs to be done to ensure complete page detection? Or are there specific parameters that should be adjusted in Nougat OCR to improve the results?