Open d-ph opened 1 month ago
There are two tasks here:
1) Parse PDF page content to locate objects on the page; and 2) Do PDF text extraction.
The first will be coming soon. The second will happen, but only for well-behaved modern PDFs. I don't want to get into the full field of PDF text extraction - it's a complex thing.
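To illustrate what the first task involves, here is a minimal Python sketch. It is not cpdf code and says nothing about how cpdf does or will implement this. The names `TOKEN` and `locate_text` are made up for the example. It assumes an already decompressed content stream, literal strings without nested parentheses, no hex strings or inline images, an identity CTM, and an unscaled, unrotated text matrix. It reports the origin of the current text line for each Tj/TJ, ignores glyph advances, and skips the ' and " operators.

```python
import re

# Hypothetical tokenizer for a simplified content stream (assumptions above).
TOKEN = re.compile(r"""
    \((?:\\.|[^\\()])*\)        # literal string without nested parentheses
  | /[^\s/\[\]()<>]+            # name such as /F1
  | [-+]?(?:\d+\.?\d*|\.\d+)    # number
  | [\[\]]                      # array delimiters
  | [A-Za-z*]+                  # operator such as BT, Tf, Td, Tj
""", re.VERBOSE)


def locate_text(content):
    """Return (x, y, font_size, text) for each Tj/TJ in the stream."""
    found, operands = [], []
    size = None          # font size set by the most recent Tf
    x = y = 0.0          # translation part of the text line matrix
    for tok in TOKEN.findall(content):
        first = tok[0]
        if first in "([/]" or first in "+-." or first.isdigit():
            operands.append(tok)      # operand: keep it for the next operator
            continue
        if tok == "BT":               # begin text object: reset the position
            x = y = 0.0
        elif tok == "Tf" and len(operands) >= 2:
            size = float(operands[-1])
        elif tok in ("Td", "TD") and len(operands) >= 2:
            # Relative move; plain addition is only valid without scaling.
            x += float(operands[-2])
            y += float(operands[-1])
        elif tok == "Tm" and len(operands) >= 6:
            # Absolute placement; only the translation (e, f) is kept.
            x, y = float(operands[-2]), float(operands[-1])
        elif tok in ("Tj", "TJ"):
            # Join the string operands; TJ kerning numbers are dropped.
            text = "".join(t[1:-1] for t in operands if t.startswith("("))
            found.append((x, y, size, text))
        operands.clear()
    return found


sample = "BT /F1 12 Tf 72 720 Td (Hello) Tj 0 -14 Td [(Wor) -20 (ld)] TJ ET"
for item in locate_text(sample):
    print(item)
```

Even this toy version shows why only well-behaved PDFs are tractable: real files add font encodings, nested transformations, and text split across many small show operations.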
Understood and fair. Thanks for the information and explanation 👍
Hello,
Similar to how cpdf can list images with the -image-resolution operation, would it be possible to add a cpdf operation that lists the text objects found in a PDF (most importantly, their size and location)? The caveat is that text that has been converted to vector outlines would not be detected by the new operation, which is understandable.
Regards.