deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

Text can't be extracted from scanned PDF, jpg and png. #445

Open Takip31 opened 1 year ago

Takip31 commented 1 year ago

Describe the bug
The output .txt file contains only arrow characters and no text.

To Reproduce
Steps to reproduce the behavior: use this code:

import glob
import textract

file = glob.glob(r'path/to/retrieve/file.extension')
for file_path in file:
    text = textract.process(file_path)
    with open(f'{file_path[:-4]}.txt', 'w') as file:
        file.write(text)
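For comparison, here is a minimal sketch of a variant of the script above. It is not a confirmed fix, since the cause of the arrow characters is not established in this report. It relies on three assumptions: textract.process returns bytes that should be decoded before writing in text mode, scanned PDFs need method='tesseract' because they have no text layer, and the tesseract binary is installed. The helper names output_path_for, save_text and extract_to_txt are hypothetical and not part of textract.

```python
from pathlib import Path


def output_path_for(file_path):
    """Return the .txt path next to the input, e.g. scan.jpeg -> scan.txt."""
    # with_suffix copes with extensions of any length, unlike file_path[:-4].
    return Path(file_path).with_suffix(".txt")


def save_text(raw, out_path):
    """Decode extracted bytes and write them out as UTF-8 text."""
    # Undecodable bytes are replaced instead of raising an exception.
    text = raw.decode("utf-8", errors="replace")
    Path(out_path).write_text(text, encoding="utf-8")
    return text


def extract_to_txt(file_path):
    """Extract text from one file with textract and save it as .txt."""
    import textract  # imported here so the helpers above work without it

    kwargs = {}
    if Path(file_path).suffix.lower() == ".pdf":
        # Assumption: the PDF is a scan, so request OCR explicitly.
        kwargs["method"] = "tesseract"
    raw = textract.process(str(file_path), **kwargs)
    return save_text(raw, output_path_for(file_path))
```

Calling extract_to_txt on each path returned by glob.glob would replace the body of the loop above. Using a separate name for the output file also avoids reusing the variable file for both the list of paths and the open file handle.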

Expected behavior
The text from the file should appear in the output .txt file.

Screenshots

[Screenshot: Capture]
