LeoFCardoso / pdf2pdfocr

A free tool to OCR a PDF and add a text "layer" to the original file, making a searchable PDF. Uses only open source tools. Please tip!
Apache License 2.0
266 stars, 33 forks

A rectangular block is the only portion being selected from within a paragraph. #42

Closed by yatrik-cloud 12 months ago

yatrik-cloud commented 1 year ago

As you can see in the image below, only a rectangular block within the paragraph gets selected. Is there any solution to this problem? image

LeoFCardoso commented 1 year ago

Hello, thank you again for the issue.

In this case, I need the source PDF for debugging.

Can you please share it here?

yatrik-cloud commented 12 months ago
  1. This is the PDF: Dummy_IS.pdf

  2. image

     This is one more example of the same issue.

  3. pdf2pdfocr sometimes does not work well on tabular data.

These are the issues. Kindly check them and get back to me.

LeoFCardoso commented 12 months ago

Hello. I couldn't reproduce the bug. Here's the result using Adobe Reader and "-r 200".

image image

Please run this file with "-v" (verbose) and send me the entire text output it generates. Also, please tell me which operating system you are using.

yatrik-cloud commented 12 months ago

Solved the bug by using "-r 200" and viewing the output in Adobe Reader. Thank you.
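
For anyone trying the same fix, here is a minimal sketch of the invocation discussed in this thread. Only "-r 200" and "-v" are confirmed above. The script name `pdf2pdfocr.py`, the `-i` input flag, and the helper names `build_command` and `run_and_capture` are assumptions made for illustration, so check them against the tool's own help output before relying on them.

```python
import shlex
import subprocess
import sys


def build_command(input_pdf, resolution=200, verbose=True, script="pdf2pdfocr.py"):
    """Return the argument list for one pdf2pdfocr run."""
    # Assumption: the script is called pdf2pdfocr.py and takes its input via "-i".
    # "-r" (resolution) and "-v" (verbose) are the flags mentioned in the thread.
    cmd = [sys.executable, script, "-i", input_pdf, "-r", str(resolution)]
    if verbose:
        cmd.append("-v")
    return cmd


def run_and_capture(cmd, log_path="pdf2pdfocr_verbose.log"):
    """Run the command and save stdout plus stderr to a file for a bug report."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(result.stdout)
    return result.returncode


if __name__ == "__main__":
    command = build_command("Dummy_IS.pdf")
    # Show the command line that would be executed.
    print(shlex.join(command))
    # Uncomment once pdf2pdfocr.py is available at the given path:
    # run_and_capture(command)
```

Saving the verbose output to a file makes it easy to attach the full log to an issue, along with the operating system name.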