Closed. cilynx closed this issue 1 year ago.
Document.pages() is returning the wrong number of pages.
Document.data[] only has one page no matter how long the document is.
Document.data[] is built by running pytesseract.image_to_data() on Document.processed, which is an opened PIL.Image. So far as I can tell, image_to_data only works on the first frame of a PIL.Image. Looks like we either need to iterate over the frames or give pytesseract the TIFF file directly instead of through PIL.
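For the first option, here is a rough sketch of what iterating over the frames might look like. It is not the project's actual code: ocr_all_frames and its ocr parameter are hypothetical names made up for illustration, and it assumes Document.processed is a PIL.Image opened from a multi-page TIFF. The getattr(image, "n_frames", 1) idiom is the one Pillow suggests for images that may or may not be multi-frame.

```python
def ocr_all_frames(image, ocr=None):
    """Run OCR on every frame of a (possibly multi-frame) PIL image.

    Returns a list with one OCR result per frame, in page order.
    `ocr` is a hypothetical hook so the OCR call can be swapped out;
    by default it wraps pytesseract.image_to_data with dict output.
    """
    if ocr is None:
        # Imported lazily so the loop itself has no hard dependency.
        import pytesseract
        from pytesseract import Output

        def ocr(frame):
            return pytesseract.image_to_data(frame, output_type=Output.DICT)

    results = []
    # Single-frame images may not define n_frames, so fall back to 1.
    n_frames = getattr(image, "n_frames", 1)
    for index in range(n_frames):
        # seek() moves the same Image object to the requested frame,
        # so each frame is processed before moving on to the next one.
        image.seek(index)
        results.append(ocr(image))

    # Leave the image positioned on the first frame again.
    image.seek(0)
    return results
```

If this approach works, Document.data could presumably be built with something like ocr_all_frames(Document.processed), which would give one entry per page instead of only the first one. I have not checked how the rest of the code indexes Document.data, so that part is an assumption.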
Seems to only be happening on multi-page documents. Looking at the logs, the magic words/phrases parser may only be running on the first page.