KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License

Is there a way to use it with a web app instead of native mic #12

Open abhishek-tg opened 8 months ago

abhishek-tg commented 8 months ago

I'm trying to build a web app and would like to adapt this library so it can take audio from the browser (via JS) instead of the native microphone. Is there a sample?

KoljaB commented 8 months ago

Currently there is only a Python web client sample.

KoljaB commented 8 months ago

Just realized that the current client does not record and send chunks.

This is a useful and much-needed feature. I need to think about how to integrate accepting chunks into the API, and will then provide a JS client.

abhishek-tg commented 8 months ago

Also, it uses a PyAudio input stream, which would need to be replaced with a socket queue or something similar.
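To illustrate the "socket queue" idea, here is a minimal sketch of a queue-backed audio source that a network handler could fill in place of a microphone stream. `ChunkQueueSource` and `demo` are hypothetical names made up for this example and are not part of RealtimeSTT; the byte strings stand in for real audio chunks.

```python
import queue
import threading


class ChunkQueueSource:
    """Hypothetical stand-in for a microphone stream.

    A network handler (e.g. a WebSocket callback) calls put() with raw
    audio bytes; the consumer iterates over the source to read them.
    """

    def __init__(self, max_chunks: int = 100):
        # Bounded queue so a slow consumer applies back-pressure.
        self._chunks = queue.Queue(maxsize=max_chunks)

    def put(self, chunk: bytes) -> None:
        self._chunks.put(chunk)

    def close(self) -> None:
        # None is used as an end-of-stream sentinel.
        self._chunks.put(None)

    def __iter__(self):
        while True:
            chunk = self._chunks.get()
            if chunk is None:
                return
            yield chunk


def demo() -> list:
    """Feed three fake chunks from a producer thread and collect them."""
    source = ChunkQueueSource()

    def producer():
        for chunk in (b"aa", b"bb", b"cc"):
            source.put(chunk)
        source.close()

    thread = threading.Thread(target=producer)
    thread.start()
    received = list(source)  # blocks until close() is called
    thread.join()
    return received
```

In a real server the producer would be the socket handler, and the consumer would pass each chunk on to whatever does VAD and transcription.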

KoljaB commented 8 months ago

Available now. Please check this example with the new v0.1.8 version.

abhishek-tg commented 8 months ago

Thanks, I was able to modify it and have been using it frequently. I have a question, though: with multiple users, how should I handle the recorder thread? Would there be multiple threads, or a unique ID to tell apart the speech coming from each user?

KoljaB commented 8 months ago

Depends on what you want to achieve. Handling input from multiple users in parallel will not be easy, especially if you also want real-time transcription. First, you'd need to change RealtimeSTT for this, because its processing is not designed for multiple incoming audio chunk feeds. You would need to create worker threads for every feed. While a user talks, the server has to run voice activity detection and transcription, which needs VRAM and puts load on the GPU. So you'd either need to load-balance VAD and transcription somehow, or you'd need large amounts of VRAM and GPU power on the server to handle that.
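As a rough sketch of the "one worker per feed" idea, assuming each user gets a unique session ID and their own chunk queue: `SessionManager`, `FeedWorker` and `fake_transcribe` are hypothetical names for illustration only, and `fake_transcribe` is a placeholder for the real VAD and transcription step. None of this is RealtimeSTT's API.

```python
import queue
import threading
import uuid


def fake_transcribe(audio: bytes) -> str:
    """Placeholder for the real VAD + transcription step (an assumption)."""
    return f"<{len(audio)} bytes of audio>"


class FeedWorker(threading.Thread):
    """One worker thread per audio feed, identified by its session ID."""

    def __init__(self, session_id: str, results: queue.Queue):
        super().__init__(daemon=True)
        self.session_id = session_id
        self.chunks = queue.Queue()
        self._results = results

    def run(self) -> None:
        buffered = bytearray()
        while True:
            chunk = self.chunks.get()
            if chunk is None:  # end-of-feed sentinel
                break
            buffered.extend(chunk)
        # Tag the result with the session ID so users stay separated.
        self._results.put((self.session_id, fake_transcribe(bytes(buffered))))


class SessionManager:
    """Routes incoming chunks to the worker that owns the session."""

    def __init__(self):
        self.results = queue.Queue()
        self._workers = {}
        self._lock = threading.Lock()

    def open_session(self) -> str:
        session_id = uuid.uuid4().hex
        worker = FeedWorker(session_id, self.results)
        with self._lock:
            self._workers[session_id] = worker
        worker.start()
        return session_id

    def feed(self, session_id: str, chunk: bytes) -> None:
        with self._lock:
            worker = self._workers[session_id]
        worker.chunks.put(chunk)

    def close_session(self, session_id: str) -> None:
        with self._lock:
            worker = self._workers.pop(session_id)
        worker.chunks.put(None)
        worker.join()  # wait until the result has been queued
```

This only covers routing and separating users. It does nothing about the GPU and VRAM load described above; for that, the transcription step would probably have to go through a shared, bounded pool or a separate service rather than run freely in every worker.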