Description

Adds a streaming speech-to-text demo feature: it takes audio input from the user's microphone, sends it to a Whisper wait-k model, and displays the predicted text in the terminal.

Related issue: #54

How to start STT streaming

1. Build and run the Docker container

First, change into the directory containing the Dockerfile:

cd examples/speech_to_text

Then, build the Docker image with:

docker build -t simuleval-speech-to-text:1.0 .

Next, run the remote evaluation server using the Docker image:

docker run -p 8888:8888 simuleval-speech-to-text:1.0

This binds port 8888 of the container (server) to port 8888 on the local machine (client).
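Before starting the client, it can help to confirm that the published port is actually reachable. The following is a minimal sketch, assuming only that the server accepts plain TCP connections on port 8888; the helper name `server_is_listening` is hypothetical and is not part of SimulEval.

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 8888,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only: it checks that something
    is accepting connections on the port, not that it is SimulEval.
    """
    try:
        # create_connection raises OSError on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `server_is_listening()` after `docker run` should return True once the container has finished starting; a False result usually means the container is not running or the `-p 8888:8888` mapping is missing.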

2. Kick off a standalone Whisper agent for remote translation:

cd examples/speech_to_text

simuleval --standalone --remote-port 8888 --agent whisper_waitk.py --waitk-lagging 3

3. Enter demo mode by providing the desired source segment size (usually 500 ms):

simuleval --remote-eval --demo --source-segment-size 500 --remote-port 8888

4. Speak into the microphone and watch the live transcription!

5. Press Ctrl+C (^C) to exit the program in the terminal.
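To give a feel for how `--source-segment-size 500` and `--waitk-lagging 3` interact, here is a rough sketch of a textbook segment-level wait-k schedule. It is an illustration under stated assumptions, not the logic in `whisper_waitk.py`: the function names are hypothetical, and the 16 kHz sample rate is assumed (it is the rate Whisper models expect, but the demo's microphone capture rate is not stated here).

```python
def samples_per_segment(segment_ms: int, sample_rate_hz: int = 16_000) -> int:
    """Number of audio samples in one source segment (16 kHz is an assumption)."""
    return sample_rate_hz * segment_ms // 1000


def first_output_delay_ms(segment_ms: int, k: int) -> int:
    """Audio that must arrive before the first WRITE under wait-k."""
    return segment_ms * k


def waitk_schedule(num_segments: int, k: int) -> list:
    """Textbook wait-k: READ k segments first, then WRITE after every READ.

    If the input ends before k segments arrive, flush with a single WRITE.
    """
    actions = []
    for segments_read in range(1, num_segments + 1):
        actions.append("READ")
        if segments_read >= k:
            actions.append("WRITE")
    if 0 < num_segments < k:
        actions.append("WRITE")  # end of input reached before the lag was met
    return actions


if __name__ == "__main__":
    print(samples_per_segment(500))       # 8000 samples per 500 ms segment
    print(first_output_delay_ms(500, 3))  # 1500 ms before the first prediction
    print(waitk_schedule(5, 3))
```

Under these assumptions, a lagging of 3 with 500 ms segments means roughly 1.5 seconds of speech is buffered before the first text appears; a smaller lagging lowers latency at the cost of less context per prediction.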

Type of change

[x] New feature (non-breaking change which adds functionality)
[x] This change requires a documentation update

How Has This Been Tested?

Tested locally.

Test Configuration:

Firmware version: macOS Sonoma 14.6.1
Hardware: MacBook M3 Pro
Toolchain: Python, PyAudio, Silero VAD
SDK: VS Code

facebookresearch / SimulEval

Feat: Demo feature for remote streaming speech to text #111