KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License
2.1k stars, 191 forks

Is there a way to use it with a web app instead of native mic #12

Closed: abhishek-tg closed this issue 1 week ago

abhishek-tg commented 11 months ago

I'm building a web app and would like to modify this library so it can take audio from the web app, e.g. from JS, instead of the native mic. Is there a sample?

KoljaB commented 11 months ago

Currently there is only a Python web client sample.

KoljaB commented 11 months ago

Just realized that the current client does not record and send chunks.

This is a useful and much-needed feature. I need to think about how to integrate chunk input into the API, and will then provide a JS client.
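For anyone who wants to experiment before that lands, here is a minimal sketch of what "record and send chunks" could look like at the byte level. It is not taken from RealtimeSTT: the function names, the 2048-byte chunk size and the 4-byte length prefix are all assumptions chosen for illustration.

```python
import struct


def split_into_chunks(pcm: bytes, chunk_size: int = 2048):
    """Yield consecutive slices of raw PCM bytes, each at most chunk_size long."""
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def frame_chunk(chunk: bytes) -> bytes:
    """Prefix a chunk with its length as a 4-byte big-endian unsigned integer."""
    return struct.pack(">I", len(chunk)) + chunk


def read_frames(buffer: bytes):
    """Decode every complete frame in buffer.

    Returns (chunks, remainder), where remainder holds the bytes of a
    trailing incomplete frame that should be kept until more data arrives.
    """
    chunks = []
    offset = 0
    while offset + 4 <= len(buffer):
        (length,) = struct.unpack_from(">I", buffer, offset)
        end = offset + 4 + length
        if end > len(buffer):
            break  # incomplete frame, wait for more data
        chunks.append(buffer[offset + 4:end])
        offset = end
    return chunks, buffer[offset:]
```

The length prefix only matters on a raw TCP stream, where message boundaries are not preserved. Over WebSockets each message already arrives whole, so a client could send the chunks directly.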

abhishek-tg commented 11 months ago

Also, it uses a PyAudio input stream, which would have to be changed to a socket-fed queue or something similar.
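Roughly what I have in mind, as a sketch only: a queue-backed object that a socket handler feeds and the processing loop reads from, in place of the microphone stream. `QueueAudioStream` and its methods are hypothetical names, not part of RealtimeSTT or PyAudio.

```python
import queue


class QueueAudioStream:
    """Hypothetical stand-in for a microphone stream.

    A network handler calls feed() with each received chunk; the consumer
    iterates over the stream and gets the chunks in arrival order.
    """

    _END = None  # sentinel that marks the end of the stream

    def __init__(self, maxsize: int = 0):
        # maxsize=0 means unbounded; a positive value makes feed() block
        # when the consumer falls behind.
        self._chunks = queue.Queue(maxsize=maxsize)

    def feed(self, chunk: bytes) -> None:
        """Called by the socket handler for every received audio chunk."""
        self._chunks.put(chunk)

    def close(self) -> None:
        """Signal that no more audio will arrive."""
        self._chunks.put(self._END)

    def __iter__(self):
        """Yield chunks until close() has been called."""
        while True:
            chunk = self._chunks.get()
            if chunk is self._END:
                return
            yield chunk
```

The consumer side would then be a plain `for chunk in stream:` loop running in the recorder thread, while the socket code calls `stream.feed(...)` from its own thread.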

KoljaB commented 11 months ago

Available now. Please check this example with the new v0.1.8 version.

abhishek-tg commented 11 months ago

Thanks, I was able to modify it and now use it frequently. However, I have a question: when serving multiple users, how should I handle the recorder thread? Would there be multiple threads, or a unique ID to distinguish the transcribed speech between users?

KoljaB commented 11 months ago

It depends on what you want to achieve. Handling multiple user inputs in parallel will not be easy, especially if you also want realtime transcription. First, you'd need to change RealtimeSTT itself, because the processing is not designed for multiple incoming audio chunk feeds; you would need to create worker threads for every feed. While a user talks, the server has to run voice activity detection and transcription, which needs VRAM and puts load on the GPU. So you'd either need to load balance VAD and transcription somehow, or you'd need large amounts of VRAM and GPU power on the server to handle that.
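To make the "worker thread per feed" idea concrete, here is a rough sketch of the routing part only, assuming each user gets a unique session ID. `SessionWorker`, `SessionManager` and the `transcribe` callable are hypothetical placeholders; nothing here is RealtimeSTT API, and the sketch does not address the VRAM and GPU load problem.

```python
import queue
import threading
from typing import Callable, Dict


class SessionWorker(threading.Thread):
    """One worker thread per audio feed, identified by a session ID."""

    def __init__(self, session_id: str,
                 transcribe: Callable[[bytes], str],
                 on_text: Callable[[str, str], None]):
        super().__init__(daemon=True)
        self.session_id = session_id
        self._transcribe = transcribe  # placeholder for VAD + transcription
        self._on_text = on_text        # receives (session_id, text)
        self._chunks: queue.Queue = queue.Queue()

    def feed(self, chunk: bytes) -> None:
        self._chunks.put(chunk)

    def stop(self) -> None:
        self._chunks.put(None)  # sentinel: finish queued chunks, then exit

    def run(self) -> None:
        while True:
            chunk = self._chunks.get()
            if chunk is None:
                break
            self._on_text(self.session_id, self._transcribe(chunk))


class SessionManager:
    """Routes incoming chunks to the worker that belongs to each session."""

    def __init__(self, transcribe, on_text):
        self._transcribe = transcribe
        self._on_text = on_text
        self._lock = threading.Lock()
        self._workers: Dict[str, SessionWorker] = {}

    def feed(self, session_id: str, chunk: bytes) -> None:
        with self._lock:
            worker = self._workers.get(session_id)
            if worker is None:  # first chunk of a new session
                worker = SessionWorker(session_id, self._transcribe,
                                       self._on_text)
                self._workers[session_id] = worker
                worker.start()
        worker.feed(chunk)

    def close(self, session_id: str) -> None:
        with self._lock:
            worker = self._workers.pop(session_id, None)
        if worker is not None:
            worker.stop()
            worker.join()
```

Each result is tagged with its session ID, so the text can be sent back to the right user. If all workers shared one model on one GPU, the `transcribe` callable would still need some form of locking, batching or load balancing.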