This was implemented for applications that need to process the whole incoming microphone input, e.g. if you want to do volume controls or record the entire incoming audio traffic. If you need to access the whole recorded audio of the last transcription process, you can use last_transcription_bytes, which contains the raw PCM after a transcription.
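For example, here is a rough volume-meter sketch (assuming on_recorded_chunk can be passed to the AudioToTextRecorder constructor and that the callback receives the raw 16-bit mono PCM bytes of each chunk):

import struct

from RealtimeSTT import AudioToTextRecorder

def on_chunk(chunk):
    # interpret the bytes as signed 16-bit samples and compute a crude RMS level
    samples = struct.unpack(f"<{len(chunk) // 2}h", chunk)
    rms = (sum(s * s for s in samples) / max(len(samples), 1)) ** 0.5
    print(f"volume: {rms:.0f}")

recorder = AudioToTextRecorder(on_recorded_chunk=on_chunk)
print(recorder.text())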
It would make much more sense to have it before the transcription starts though. And besides, when saving it using wave, this sounds like the normalized form and not the real raw data, I guess.
But great library. Thank you mate
It would make much more sense to have it before the transcription starts though.
Oh, yes. I agree. Thanks for the hint.
FYI, saving the raw PCM via
import wave

with wave.open('myfile.wav', 'wb') as wavfile:
    # params: (nchannels, sampwidth, framerate, nframes, comptype, compname)
    wavfile.setparams((2, 2, 44100, 0, 'NONE', 'NONE'))
    wavfile.writeframes(recorder.last_transcription_bytes)
results in weird noises but not the recording. There is a high probability that the problem is me, a senior in TS and C# but a bloody beginner in Python :D
Try to save the wav as mono 16000 please, this looks like stereo 44100.
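Something like this should work, as a minimal sketch assuming last_transcription_bytes holds 16-bit mono PCM at 16 kHz:

import wave

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder()
recorder.text()  # record and transcribe one utterance

with wave.open('myfile.wav', 'wb') as wavfile:
    wavfile.setnchannels(1)      # mono
    wavfile.setsampwidth(2)      # 16-bit samples
    wavfile.setframerate(16000)  # 16 kHz
    wavfile.writeframes(recorder.last_transcription_bytes)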
on_recorded_chunk kind of ignores the voice activation process and is called as often as the CPU allows per second, providing 1 kB of data without any voice in it whatsoever.
I don't really see a use case where this is helpful. On the other hand, I guess a full data export function at the end of voice activation makes much more sense, since a) it contains real user data instead of 1 kB chunks of white noise, and b) this real data can e.g. be sent to an external speech-to-text database or just to a server for later usage / further training.
Or am I thinking something wrong here?
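For illustration, roughly what I have in mind, as a sketch only: it assumes last_transcription_bytes holds 16-bit mono 16 kHz PCM after recorder.text() returns, and the endpoint URL is made up.

import io
import wave

import requests  # third-party: pip install requests

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder()
text = recorder.text()  # blocks until one utterance is recorded and transcribed

# wrap the raw PCM in a WAV container in memory
buffer = io.BytesIO()
with wave.open(buffer, 'wb') as wavfile:
    wavfile.setnchannels(1)
    wavfile.setsampwidth(2)
    wavfile.setframerate(16000)
    wavfile.writeframes(recorder.last_transcription_bytes)

# send audio + transcript to a (hypothetical) collection endpoint
requests.post(
    'https://example.com/api/recordings',  # made-up URL
    files={'audio': ('utterance.wav', buffer.getvalue(), 'audio/wav')},
    data={'transcript': text},
)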