deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

Extract text directly from file-object / file-content rather than using filename #360

Open jrkkfst opened 3 years ago

jrkkfst commented 3 years ago

Maybe this is already possible?

How would I go about extracting text from the content of a file, rather than having textract read the file from disk? The background is that the upload component in Dash provides the content of the file rather than a pointer to the file's location.

Perhaps this is already possible by calling one of textract's internal functions and specifying an extension?

traverseda commented 3 years ago

Unfortunately, there's no way to do that, because textract sometimes launches external commands (most notably pdftotext) to process files.
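
Given that constraint, one possible workaround is to write the in-memory content to a temporary file and pass that path to `textract.process`. The sketch below is not part of textract: `extract_text_from_bytes` and `decode_dash_upload` are hypothetical helper names, and the code assumes the Dash upload content arrives as a `data:<mime>;base64,<payload>` string. The `extractor` parameter is only there so the helper can be exercised without textract installed.

```python
import base64
import os
import tempfile


def decode_dash_upload(contents):
    """Decode a Dash upload string into raw bytes.

    Assumes the 'data:<mime>;base64,<payload>' layout (an assumption of
    this sketch, not something stated in the thread).
    """
    _header, encoded = contents.split(",", 1)
    return base64.b64decode(encoded)


def extract_text_from_bytes(data, extension, extractor=None):
    """Write `data` to a temporary file and run a text extractor on it.

    `extension` (for example ".pdf") becomes the file suffix, since
    textract picks its parser from the filename extension.
    `extractor` defaults to textract.process; it is injectable so the
    helper can be tested without textract.
    """
    if extractor is None:
        import textract  # imported lazily so the helper loads without it
        extractor = textract.process

    fd, path = tempfile.mkstemp(suffix=extension)
    try:
        # Close the handle before extraction so external commands
        # can open the file themselves.
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return extractor(path)
    finally:
        os.remove(path)  # always clean up the temporary file
```

In a Dash callback this might be called as `extract_text_from_bytes(decode_dash_upload(contents), os.path.splitext(filename)[1])`, where `contents` and `filename` come from the upload component. The file still touches disk briefly, so this works around the limitation rather than removing it.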