FR: Make SpeechRecognition etc. large AI libs just "extra" dependencies.

deanmalmgren / textract

extract text from any document. no muss. no fuss.

MIT License

3.86k stars, 592 forks

I came here to suggest the same thing: it would be great if textract were more lightweight by default. I only need to extract text from common document formats such as .pdf, .rtf, and .docx. The SpeechRecognition dependency is problematic because its sheer size greatly slows down our project's build and substantially increases the size of the resulting Docker image.

As @kxrob suggested, the dependency could be moved to an "extra", and textract could print clear instructions when the package is missing and someone tries to extract text from an audio file, e.g. "Extracting text from audio files is an optional feature. Please run pip install SpeechRecognition~=3.8.1".
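To make the suggestion concrete, here is a rough sketch of how a lazy import with a helpful error could look. This is not textract's actual code: the names `require_optional`, `MissingOptionalDependency`, and `load_audio_backend`, as well as the extra name "audio", are made up for illustration. It only assumes that SpeechRecognition is imported as `speech_recognition` and that setuptools' `extras_require` is used to declare the extra.

```python
import importlib

# Hypothetical packaging change in setup.py (the extra name "audio" is an assumption):
#     extras_require={"audio": ["SpeechRecognition~=3.8.1"]}
# Users who need audio support would then run: pip install textract[audio]


class MissingOptionalDependency(ImportError):
    """Raised when the package behind an optional feature is not installed."""


def require_optional(module_name, pip_spec, feature):
    """Import module_name on demand, or explain how to install it.

    Hypothetical helper, not part of textract.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingOptionalDependency(
            f"{feature} is an optional feature. "
            f"Please run pip install {pip_spec}"
        ) from exc


def load_audio_backend():
    """Load SpeechRecognition only when an audio file is actually processed."""
    return require_optional(
        "speech_recognition",
        "SpeechRecognition~=3.8.1",
        "Extracting text from audio files",
    )
```

With something like this, a plain `pip install textract` would stay small, and the heavy package would be imported only inside the audio code path. Anyone who hits that path without the extra installed gets the install hint instead of a bare ImportError.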

FR: Make SpeechRecognition etc. large AI libs just "extra" dependencies. #451