Closed Jeronymous closed 9 months ago
Hi, thanks for using this repo :)
You only need to get a byte view of the numpy array and pass it to auditok.split:
data = audio.numpy().astype(np.int16).view(np.uint8) # multiply audio.numpy() by 32767 if needed
segments = auditok.split(data, other_params...)
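A small self-contained sketch of what that byte view does at the numpy level (the sampling rate and the sine waveform are just stand-ins for your actual audio; auditok itself is not called here):

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sampling rate

# Stand-in for audio.numpy(): a float waveform in [-1, 1]
waveform = np.sin(2 * np.pi * 440.0 * np.arange(SAMPLE_RATE) / SAMPLE_RATE)

# Scale to the 16-bit range, cast, then reinterpret the same buffer as bytes
pcm16 = (waveform * 32767).astype(np.int16)
data = pcm16.view(np.uint8)  # 2 bytes per sample, no copy

# data would then be passed to auditok.split(data, ...)
```

Note that `view` does not copy: it reinterprets the same memory, so `data` has twice as many elements as `pcm16` but the same number of bytes.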
For more generic code, you can call astype with the right dtype depending on the sample width, and flatten stereo data:
sample_width_to_numpy = {1 : np.int8, 2: np.int16, 4: np.int32}
fmt = sample_width_to_numpy[your_sample_width]
data = a.T.reshape(-1).astype(fmt).view(np.int8)
A numpy array with stereo data is expected to have shape (n_channels, n_samples). If your array has shape (n_samples, n_channels), just use a.reshape(-1).astype(fmt).view(np.int8) (without the transpose).
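Putting the generic recipe together, here is a small helper; the function name and the channels_first flag are illustrative, not part of auditok:

```python
import numpy as np

sample_width_to_numpy = {1: np.int8, 2: np.int16, 4: np.int32}

def to_byte_view(a, sample_width, channels_first=True):
    """Cast to the integer dtype matching sample_width, interleave
    multi-channel data, and reinterpret the buffer as raw bytes.

    channels_first=True means `a` has shape (n_channels, n_samples);
    transposing first makes reshape(-1) interleave the channels.
    """
    fmt = sample_width_to_numpy[sample_width]
    if a.ndim == 2 and channels_first:
        a = a.T  # -> (n_samples, n_channels)
    return a.reshape(-1).astype(fmt).view(np.uint8)
```

For a stereo array of shape (2, 3), this produces the samples interleaved left/right, two bytes per sample for sample_width=2.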
If you're not sure the data is converted correctly, you can save it and then listen to it:
audio = auditok.AudioRegion(data, sampling_rate=SAMPLE_RATE, sample_width=2, channels=1)
audio.save("audio.wav")
If you have pyaudio installed, just play it:
audio.play()
Use Ctrl+C to stop playing long audio.
Thank you for your answer :)
Neither view(np.uint8) nor view(np.int8) (both mentioned in your message) worked, because view() returns <class 'numpy.ndarray'>, not bytes.
The simplest solution, after a quick look at the numpy docs, was:
data = (audio.numpy() * 32767).astype(np.int16).tobytes()
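For completeness, a self-contained version of that line; the scale by 32767 assumes float audio in [-1, 1], and the small array is just a stand-in for audio.numpy():

```python
import numpy as np

# Stand-in for audio.numpy(): float samples in [-1, 1]
audio = np.array([0.0, 0.5, -0.5, 1.0], dtype=np.float32)

# tobytes() returns a real bytes object (2 bytes per int16 sample),
# which is what auditok expects, unlike view()
data = (audio * 32767).astype(np.int16).tobytes()
```

Unlike view(), tobytes() copies the buffer into an actual bytes object, which is the type the library accepts.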
Thank you for this good repo!
I find that auditok generally works better than popular VADs like silero (which can behave unpredictably on some types of audio). I'd like to use it in my project, but I struggle to do so, because when I call the VAD, I don't have access to a wav file. The only way I found to pass the torch vector of raw audio is to use this awkward conversion:
Is there a better way to do that?
If you want to see more, or directly comment on the related PR, it's here: https://github.com/linto-ai/whisper-timestamped/pull/78/files#diff-4d4adecf50ce8affc04f13ab7274717945dd716eb910225ff154f717e81c3b64R1791