amsehili / auditok

An audio/acoustic activity detection and audio segmentation tool
MIT License

Use binary read for stdin on python3 #16

Closed. ps2 closed this 5 years ago

ps2 commented 5 years ago

I encountered this error when piping sox into auditok on macOS with Python 3:

$ sox -t coreaudio -d -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -

Input File     : 'default' (coreaudio)
Channels       : 1
Sample Rate    : 48000
Precision      : 32-bit
Sample Encoding: 32-bit Signed Integer PCM

In:0.00% 00:00:00.26 [00:00:00.00] Out:2.45k [      |      ]        Clip:0
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/cmdline.py", line 370, in run
    self.tokenizer.tokenize(data_source=self, callback=notify_observers)
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/core.py", line 300, in tokenize
    frame = data_source.read()
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/cmdline.py", line 384, in read
    return self.ads.read()
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/util.py", line 547, in read
    return self.audio_source.read(self.block_size)
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/io.py", line 390, in read
    data = sys.stdin.read(to_read)
  File "/usr/local/Cellar/python/3.6.4_3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 12: invalid start byte

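The error occurs because sys.stdin is a text stream on Python 3, so sys.stdin.read tries to decode the raw audio bytes as UTF-8 and fails on the first invalid byte. This patch uses sys.stdin.buffer.read on Python 3, which returns the bytes undecoded, and continues to use the existing call on Python 2.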
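For reference, the idea can be sketched roughly as follows. This is not the actual diff against auditok/io.py; the helper names get_binary_stdin and read_block are made up for illustration, and the only assumption is that a Python 3 text stream exposes its underlying binary stream as .buffer, while Python 2's sys.stdin has no such attribute and already returns bytes.

```python
import sys


def get_binary_stdin(stream=None):
    """Return a stream whose read() yields raw bytes.

    Hypothetical helper, not part of auditok. On Python 3, text streams
    such as sys.stdin wrap a binary stream available as `.buffer`. On
    Python 2 there is no `.buffer`, and the stream itself returns bytes.
    """
    if stream is None:
        stream = sys.stdin
    # Fall back to the stream itself when it has no `.buffer` attribute.
    return getattr(stream, "buffer", stream)


def read_block(n_bytes, stream=None):
    """Read up to n_bytes of undecoded data (hypothetical helper)."""
    return get_binary_stdin(stream).read(n_bytes)
```

With something like this, a byte such as 0xff in the piped PCM data is passed through unchanged instead of going through the UTF-8 decoder.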

amsehili commented 5 years ago

Works fine, thank you for this fix!