deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License
3.84k stars, 585 forks

Suggestion: Add support for .pdf files #505

Open Hala-Hamdoun opened 3 months ago

Hala-Hamdoun commented 3 months ago

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Which filetype should textract support?
A clear and concise description of file types you think textract should be able to process.

Which external software (Python or command-line tool) can parse the requested file type?
A clear and concise description of tools that can parse the desired filetype.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.