deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

mp3 text extraction Exception - 5MB~ file #460

Open RiccardoRomagnoli opened 1 year ago

RiccardoRomagnoli commented 1 year ago

Describe the bug: SpeechRecognition raises an HTTP error (400 Bad Request) when textract tries to extract text from an mp3 file of roughly 5 MB.

Additional context:

  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/speech_recognition/__init__.py", line 840, in recognize_google
    response = urlopen(request, timeout=self.operation_timeout)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/textract/parsers/__init__.py", line 79, in process
    return parser.process(filename, input_encoding, output_encoding, **kwargs)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/textract/parsers/utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/textract/parsers/audio.py", line 28, in extract
    speech = self.extract(temp_filename, method, **kwargs)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/textract/parsers/audio.py", line 39, in extract
    speech = r.recognize_google(audio)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/speech_recognition/__init__.py", line 842, in recognize_google
    raise RequestError("recognition request failed: {}".format(e.reason))
speech_recognition.RequestError: recognition request failed: Bad Request
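
A possible workaround, offered only as a sketch: the traceback shows the whole recording being sent to `recognize_google` in a single request, so one plausible (unconfirmed in this thread) cause of the 400 response is that the audio is too long for that endpoint. Under that assumption, splitting the audio into shorter WAV pieces and transcribing each piece separately may help. The helper name `split_wav`, the 30-second chunk length, and the `chunk` file prefix are all hypothetical choices, and the mp3 would first need to be converted to WAV with a tool of your choice.

```python
import wave


def split_wav(path, chunk_seconds=30, prefix="chunk"):
    """Split a WAV file into consecutive pieces of at most chunk_seconds each.

    Hypothetical helper, not part of textract or SpeechRecognition.
    Returns the list of chunk file paths, in playback order.
    """
    paths = []
    with wave.open(path, "rb") as src:
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the source audio
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy the audio format so each chunk is a valid WAV file.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths


# Assumed usage (requires textract; "recording.wav" is a placeholder name):
#   import textract
#   pieces = [textract.process(p).decode("utf-8") for p in split_wav("recording.wav")]
#   text = " ".join(pieces)
```

Cutting at fixed intervals can split a word across two chunks, so the joined transcript may contain errors at chunk boundaries. Whether shorter chunks actually avoid the 400 response here is an assumption that would need to be verified against the original file.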

@jpweytjens