Bladieblah / xpdf-python

Python wrapper around the pdftotext functionality of xpdf
GNU General Public License v3.0

Feature Request: Image support #12

Closed ReMiOS closed 1 year ago

ReMiOS commented 1 year ago

When I iterate through the images in the loaded document, it returns the following data: {'page_number': 1, 'width': 595.56, 'height': 842.04, 'images': [{'width': 167.65, 'height': 42.8}, {'width': 14.3..1, 'height': 14.195}]}

Is it possible to return a Pillow image containing the image found in the PDF document as well?
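
For context, the metadata shown above is a plain dictionary, so it can be filtered with ordinary Python before any pixel data is involved. The sketch below is only an illustration: the helper `images_at_least` is a hypothetical name, not part of xpdf-python, and the second image's dimensions are made-up placeholder values, since the original figure is truncated in the report.

```python
def images_at_least(page_info, min_width, min_height):
    """Return the image entries whose size meets both minimums.

    `page_info` is assumed to have the shape shown in the report:
    a dict with an "images" list of {"width", "height"} dicts.
    """
    return [
        img
        for img in page_info.get("images", [])
        if img["width"] >= min_width and img["height"] >= min_height
    ]


# Sample data modelled on the report. The second image's values are
# hypothetical placeholders, not the real ones.
page_info = {
    "page_number": 1,
    "width": 595.56,
    "height": 842.04,
    "images": [
        {"width": 167.65, "height": 42.8},
        {"width": 14.0, "height": 14.0},  # hypothetical small image
    ],
}

# Keep only images large enough to be worth passing to OCR.
candidates = images_at_least(page_info, min_width=50, min_height=20)
print(candidates)
```

A filter like this would let tiny icons be skipped once the pixel data is available too.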

Bladieblah commented 1 year ago

It's definitely possible to read the image data itself, but I only needed the metadata. I'm not sure about generating Pillow images, but NumPy arrays are definitely possible: NumPy has a very nice extension of the C API.

ReMiOS commented 1 year ago

I think NumPy arrays should work even better, since my OCR library needs a NumPy array (I currently convert them from a Pillow image).
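
The Pillow-to-NumPy conversion mentioned here is roughly the following. This is a minimal sketch using only the public Pillow and NumPy APIs; the small red test image is a stand-in created for illustration, not output from xpdf-python.

```python
import numpy as np
from PIL import Image

# Stand-in image: 4 pixels wide, 3 pixels high, solid red.
img = Image.new("RGB", (4, 3), color=(255, 0, 0))

# np.asarray reads the pixel data as a (height, width, channels) array.
arr = np.asarray(img)

print(arr.shape, arr.dtype)
```

Note that the array is indexed as rows first (height), then columns (width). Use `np.array(img)` instead if a writable copy is needed.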

Bladieblah commented 1 year ago

@ReMiOS I released the changes on pip in version 0.1.2!

ReMiOS commented 1 year ago

Thanks for the update!

One thing, though: PyPI shows 0.1.4, while the source shows 0.1.3.