This was implemented for applications that need to process the whole incoming microphone input, e.g. if you want to do volume controls or record the entire incoming audio traffic. If you need to access the whole recorded audio of the last transcription process, you can use last_transcription_bytes, which contains the raw PCM after a transcription.
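For example, here is a rough volume-meter sketch (assuming on_recorded_chunk can be passed to the AudioToTextRecorder constructor and that the callback receives the raw 16-bit mono PCM bytes of each chunk):

import struct

from RealtimeSTT import AudioToTextRecorder

def on_chunk(chunk):
    # interpret the bytes as signed 16-bit samples and compute a crude RMS level
    samples = struct.unpack(f"<{len(chunk) // 2}h", chunk)
    rms = (sum(s * s for s in samples) / max(len(samples), 1)) ** 0.5
    print(f"volume: {rms:.0f}")

recorder = AudioToTextRecorder(on_recorded_chunk=on_chunk)
print(recorder.text())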
It would make much more sense to have it before the transcription starts though. And besides, when saving it using wave, this sounds like the normalized form and not the real raw data, I guess.
But great library. Thank you mate
It would make much more sense to have it before the transcription starts though.
Oh, yes. I agree. Thanks for the hint.
FYI, saving the raw PCM via
import wave

with wave.open('myfile.wav', 'wb') as wavfile:
    # params: (nchannels, sampwidth, framerate, nframes, comptype, compname)
    wavfile.setparams((2, 2, 44100, 0, 'NONE', 'NONE'))
    wavfile.writeframes(recorder.last_transcription_bytes)
results in weird noises but not the recording. There is a high probability that the problem is me, a senior in TS and C# but a bloody beginner in Python :D
Try to save the wav as mono 16000 please, this looks like stereo 44100.
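Something like this should work, as a minimal sketch assuming last_transcription_bytes holds 16-bit mono PCM at 16 kHz:

import wave

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder()
recorder.text()  # record and transcribe one utterance

with wave.open('myfile.wav', 'wb') as wavfile:
    wavfile.setnchannels(1)      # mono
    wavfile.setsampwidth(2)      # 16-bit samples
    wavfile.setframerate(16000)  # 16 kHz
    wavfile.writeframes(recorder.last_transcription_bytes)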
on_recorded_chunk kind of ignores the voice activation process and is called as often as the CPU allows per second, providing 1 kB of data without any voice in it whatsoever.
I don't really see a use case where this is helpful. On the other hand, I guess a full data export function at the end of voice activation makes much more sense, since a) it contains real user data instead of 1 kB chunks of white noise, and b) this real data can e.g. be sent to an external speech-to-text database or just to a server for later usage / further training.
Or am I thinking something wrong here?
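For illustration, roughly what I have in mind, as a sketch only: it assumes last_transcription_bytes holds 16-bit mono 16 kHz PCM after recorder.text() returns, and the endpoint URL is made up.

import io
import wave

import requests  # third-party: pip install requests

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder()
text = recorder.text()  # blocks until one utterance is recorded and transcribed

# wrap the raw PCM in a WAV container in memory
buffer = io.BytesIO()
with wave.open(buffer, 'wb') as wavfile:
    wavfile.setnchannels(1)
    wavfile.setsampwidth(2)
    wavfile.setframerate(16000)
    wavfile.writeframes(recorder.last_transcription_bytes)

# send audio + transcript to a (hypothetical) collection endpoint
requests.post(
    'https://example.com/api/recordings',  # made-up URL
    files={'audio': ('utterance.wav', buffer.getvalue(), 'audio/wav')},
    data={'transcript': text},
)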