facebookresearch / SimulEval

SimulEval: A General Evaluation Toolkit for Simultaneous Translation
Creative Commons Attribution Share Alike 4.0 International
Feat: Demo feature for remote streaming speech to text #111

Closed Epic-Eric closed 1 month ago

Epic-Eric commented 2 months ago

Description

A streaming speech-to-text demo that takes input from the user's microphone, sends it to a Whisper wait-k agent, and displays the predicted text in the terminal.

Related issue: #54

How to start STT streaming

1. Build and run the Docker container. First, change into the directory containing the Dockerfile:

cd examples/speech_to_text

Then, build the Docker image with:

docker build -t simuleval-speech-to-text:1.0 .

Next, run the remote evaluation server using the Docker image:

docker run -p 8888:8888 simuleval-speech-to-text:1.0

This binds port 8888 of the container (server) to port 8888 on the local machine (client).
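
Before starting the demo client, it can help to confirm that something is accepting connections on the published port. The following is a minimal sketch, not part of SimulEval: `is_port_open` is a hypothetical helper, and the host `127.0.0.1` assumes the container runs on the same machine as the client. A successful TCP connection only shows that the port is reachable, not that the agent has finished loading.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8888 is the port published by the docker run command above.
    print("server reachable:", is_port_open("127.0.0.1", 8888))
```

If this prints `False`, check that the container is still running and that the `-p 8888:8888` mapping was passed to `docker run`.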

OR

1. Kick off a standalone Whisper agent for remote translation:

cd examples/speech_to_text
simuleval --standalone --remote-port 8888 --agent whisper_waitk.py --waitk-lagging 3 

2. Enter demo mode by providing the desired source segment size in milliseconds (typically 500 ms):

simuleval --remote-eval --demo --source-segment-size 500 --remote-port 8888

3. Speak into the microphone and watch the live transcription!

4. Press Ctrl+C (^C) in the terminal to exit the program.
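
To make the numbers in the commands above concrete, here is a rough sketch of what a 500 ms segment size and a wait-k lagging of 3 could mean in terms of audio. It is illustrative only and is not the code in `whisper_waitk.py`. The 16 kHz sample rate is an assumption (it is the rate Whisper models expect, but the demo's capture rate is not stated here), and `initial_wait_ms` assumes the generic wait-k reading in which the agent waits for k source segments before emitting output. All function names are hypothetical.

```python
# Assumption: 16 kHz mono input, the rate Whisper models expect.
SAMPLE_RATE_HZ = 16_000


def samples_per_segment(segment_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples in one source segment."""
    return sample_rate_hz * segment_ms // 1000


def split_into_segments(samples, segment_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ):
    """Cut a sample sequence into consecutive segments; the last may be shorter."""
    size = samples_per_segment(segment_ms, sample_rate_hz)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def initial_wait_ms(segment_ms: int, waitk_lagging: int) -> int:
    """Audio received before the first output, under the generic wait-k
    assumption that the agent reads k segments before writing."""
    return segment_ms * waitk_lagging


if __name__ == "__main__":
    audio = [0.0] * 20_000  # 1.25 s of silence at 16 kHz
    segments = split_into_segments(audio, segment_ms=500)
    print([len(s) for s in segments])              # [8000, 8000, 4000]
    print(initial_wait_ms(500, waitk_lagging=3))   # 1500
```

Under these assumptions, smaller segments or a smaller lagging value would shorten the delay before the first words appear, at the cost of giving the model less audio context per step.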

Type of change

How Has This Been Tested?

Tested locally.

Test Configuration: