I encountered this error when piping sox into auditok on macOS with Python 3:
$ sox -t coreaudio -d -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -
Input File : 'default' (coreaudio)
Channels : 1
Sample Rate : 48000
Precision : 32-bit
Sample Encoding: 32-bit Signed Integer PCM
In:0.00% 00:00:00.26 [00:00:00.00] Out:2.45k [ | ] Clip:0
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/cmdline.py", line 370, in run
    self.tokenizer.tokenize(data_source=self, callback=notify_observers)
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/core.py", line 300, in tokenize
    frame = data_source.read()
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/cmdline.py", line 384, in read
    return self.ads.read()
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/util.py", line 547, in read
    return self.audio_source.read(self.block_size)
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/io.py", line 390, in read
    data = sys.stdin.read(to_read)
  File "/usr/local/Cellar/python/3.6.4_3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 12: invalid start byte
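The error occurs because on Python 3 sys.stdin is a text stream, so sys.stdin.read tries to decode the incoming bytes as UTF-8. Raw PCM audio is not valid UTF-8, so decoding fails as soon as it hits a byte such as 0xff. This patch reads from sys.stdin.buffer on Python 3 and keeps the existing sys.stdin.read call on Python 2.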
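For reviewers, here is a minimal sketch of the idea, not the exact diff. The helper name `read_stdin_bytes` and its `stream` parameter are hypothetical and exist only for illustration. The sketch assumes that a stream with a `buffer` attribute is a Python 3 text wrapper and that anything else already returns bytes.

```python
import sys


def read_stdin_bytes(size, stream=None):
    """Read up to `size` raw bytes from stdin.

    Hypothetical helper for illustration; not the code in auditok/io.py.
    """
    if stream is None:
        stream = sys.stdin
    # Python 3: sys.stdin is a text wrapper, and its `buffer` attribute is
    # the underlying binary stream, which performs no UTF-8 decoding.
    # Python 2: sys.stdin has no `buffer` attribute and read() already
    # returns a byte string, so the stream is used as-is.
    binary = getattr(stream, "buffer", stream)
    return binary.read(size)
```

Using `getattr` with a fallback avoids an explicit version check; comparing `sys.version_info` against 3 would work as well.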