Status: Open. svmrw opened this issue 3 months ago.
I tried to make the changes manually based on your commit. The error is no longer displayed, but Surya OCR still loads and runs recognition on the whole file. That is, OCR_ENGINE=None and OCR_ENGINE=Surya behave the same, with no visible difference. I am most likely doing something wrong, so I would ask you to check it yourself.
I am running into the same problem. Since OCR pushes my machine to maximum memory, I now have to use different software. This is a dead end for me.
The problem still persists. The changes from here did not help at all either.
Personally, I don't care about performance. The issue is that OCR spoils embedded images, so I would like OCR_ENGINE=None to work.
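One possible cause of the behavior described above is that environment variables are always strings, so a value of `None` has to be compared as text before it can mean "OCR disabled". The sketch below only illustrates that idea. `resolve_ocr_engine`, `should_run_ocr`, and the `"surya"` default are hypothetical names and assumptions for illustration, not the project's actual code or API.

```python
import os
from typing import Mapping, Optional


def resolve_ocr_engine(
    env: Optional[Mapping[str, str]] = None,
    default: str = "surya",  # assumed default, not confirmed by the project
) -> Optional[str]:
    """Return the configured OCR engine name, or None when OCR is disabled.

    Hypothetical helper: it maps the text "None" (any casing) to Python's
    None so that later checks can skip OCR entirely.
    """
    source = os.environ if env is None else env
    raw = source.get("OCR_ENGINE", default).strip()
    # The environment gives us the string "None", never the None object.
    if raw.lower() == "none":
        return None
    return raw.lower()


def should_run_ocr(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when an OCR engine is actually selected."""
    return resolve_ocr_engine(env) is not None
```

With a check like this, `should_run_ocr({"OCR_ENGINE": "None"})` is false, while `OCR_ENGINE=Surya` still selects an engine. Whether the real code path consults the setting in this way is not something this thread confirms.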
Hello. The Readme says the following:
Running the command gives the following:
I really want to convert PDF to Markdown without using OCR. Almost all of my PDF files contain text that can be selected and copied, and the embedded images need to be kept in their original form. It seems to me that the whole document does not need to be recognized as an image when the text can simply be copied.
Please tell me, is this possible? Maybe it was supported before but no longer is? Or maybe I am doing something wrong? Thanks.