
Wordcab Transcribe

💬 Speech recognition is now a commodity


FastAPI-based API for transcribing audio files using faster-whisper and Multi-Scale Auto-Tuning Spectral Clustering for diarization (based on an open-source GitHub implementation).

Documentation: https://wordcab.github.io/wordcab-transcribe/
License: MIT

> [!IMPORTANT]
> To see how Wordcab Transcribe performs compared to the other ASR tools available on the market, check out our benchmark project: Rate that ASR.

Key features

Requirements

Local development

Run the API locally 🚀

hatch run runtime:launch
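Once the server is up, you can sanity-check it from another terminal, for example by requesting the interactive docs page (this assumes the API listens on port 5001, as in the Docker examples below):

```bash
# The FastAPI docs page should respond with HTTP 200
curl -I http://localhost:5001/docs
```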

Deployment

Run the API using Docker

Build the image.

docker build -t wordcab-transcribe:latest .

Run the container.

docker run -d --name wordcab-transcribe \
    --gpus all \
    --shm-size 1g \
    --restart unless-stopped \
    -p 5001:5001 \
    -v ~/.cache:/root/.cache \
    wordcab-transcribe:latest

You can mount a volume to the container to load local whisper models. If you do, update the WHISPER_MODEL environment variable in the .env file accordingly.

docker run -d --name wordcab-transcribe \
    --gpus all \
    --shm-size 1g \
    --restart unless-stopped \
    -p 5001:5001 \
    -v ~/.cache:/root/.cache \
    -v /path/to/whisper/models:/app/whisper/models \
    wordcab-transcribe:latest
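For example, with the mount above, WHISPER_MODEL in .env would point at the container-side path (the model directory name here is an illustrative assumption):

```bash
# .env — container path matching the -v mount above; "large-v3" is illustrative
WHISPER_MODEL=/app/whisper/models/large-v3
```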

You can simply enter the container using the following command:

docker exec -it wordcab-transcribe /bin/bash

This is useful for checking that everything is working as expected.
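A couple of quick checks inside the container can confirm the environment is healthy; for instance, assuming the image is CUDA-based, nvidia-smi should be available:

```bash
# Verify the GPU is visible from inside the container
nvidia-smi
```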

Run the API behind a reverse proxy

You can run the API behind a reverse proxy like Nginx. We have included an nginx.conf file to help you get started.

# Create a docker network and connect the api container to it
docker network create transcribe
docker network connect transcribe wordcab-transcribe

# Replace /absolute/path/to/nginx.conf with the absolute path to the nginx.conf
# file on your machine (e.g. /home/user/wordcab-transcribe/nginx.conf).
docker run -d \
    --name nginx \
    --network transcribe \
    -p 80:80 \
    -v /absolute/path/to/nginx.conf:/etc/nginx/nginx.conf:ro \
    nginx

# Check everything is working as expected
docker logs nginx
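The bundled nginx.conf is the authoritative reference, but as a rough sketch, the core of such a configuration is a server block that proxies requests to the API container by name over the shared Docker network (the directives below are standard nginx; the rest is an assumption for illustration):

```nginx
# Minimal reverse-proxy sketch — see the repo's nginx.conf for the real configuration
events {}

http {
    server {
        listen 80;

        location / {
            # "wordcab-transcribe" resolves via the "transcribe" Docker network
            proxy_pass http://wordcab-transcribe:5001;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```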

⏱ Profile the API

You can profile the process executions using `py-spy` as a profiler.

```bash
# Launch the container with the cap-add=SYS_PTRACE option
docker run -d --name wordcab-transcribe \
    --gpus all \
    --shm-size 1g \
    --restart unless-stopped \
    --cap-add=SYS_PTRACE \
    -p 5001:5001 \
    -v ~/.cache:/root/.cache \
    wordcab-transcribe:latest

# Enter the container
docker exec -it wordcab-transcribe /bin/bash

# Install py-spy
pip install py-spy

# Find the PID of the process to profile
top  # 28 for example

# Run the profiler
py-spy record --pid 28 --format speedscope -o profile.speedscope.json

# Launch any task on the API to generate some profiling data

# Exit the container and copy the generated file to your local machine
exit
docker cp wordcab-transcribe:/app/profile.speedscope.json profile.speedscope.json

# Go to https://www.speedscope.app/ and upload the file to visualize the profile
```

Test the API

Once the container is running, you can test the API.

The API documentation is available at http://localhost:5001/docs.
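For a quick command-line check, a curl call equivalent to the Python example below might look like this (the field names mirror the multipart form fields used in that example):

```bash
# Transcribe a local audio file via multipart upload
curl -X POST "http://localhost:5001/api/v1/audio" \
    -F "file=@/path/to/audio/file.wav" \
    -F "diarization=true" \
    -F "source_lang=en"
```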

import json
import os

import requests

filepath = "/path/to/audio/file.wav"  # or any other format ffmpeg can convert
data = {
  "num_speakers": -1,  # # Leave at -1 to guess the number of speakers
  "diarization": True,  # Longer processing time but speaker segment attribution
  "multi_channel": False,  # Only for stereo audio files with one speaker per channel
  "source_lang": "en",  # optional, default is "en"
  "timestamps": "s",  # optional, default is "s". Can be "s", "ms" or "hms".
  "word_timestamps": False,  # optional, default is False
}

with open(filepath, "rb") as f:
    files = {"file": f}
    response = requests.post(
        "http://localhost:5001/api/v1/audio",
        files=files,
        data=data,
    )

r_json = response.json()

filename = os.path.splitext(filepath)[0]  # strip the extension, keep the path
with open(f"{filename}.json", "w", encoding="utf-8") as f:
    json.dump(r_json, f, indent=4, ensure_ascii=False)
You can also transcribe a YouTube video by passing its URL to the /api/v1/youtube endpoint:

import json
import requests

headers = {"accept": "application/json", "Content-Type": "application/json"}
params = {"url": "https://youtu.be/JZ696sbfPHs"}
data = {
  "diarization": True,  # Longer processing time but speaker segment attribution
  "source_lang": "en",  # optional, default is "en"
  "timestamps": "s",  # optional, default is "s". Can be "s", "ms" or "hms".
  "word_timestamps": False,  # optional, default is False
}

response = requests.post(
  "http://localhost:5001/api/v1/youtube",
  headers=headers,
  params=params,
  data=json.dumps(data),
)

r_json = response.json()

with open("youtube_video_output.json", "w", encoding="utf-8") as f:
  json.dump(r_json, f, indent=4, ensure_ascii=False)

Running Local Models

You can link a local folder path to use a custom model. If you do so, you should mount the folder in the docker run command as a volume, or include the model directory in your Dockerfile to bake it into the image.

Note that for the default tensorrt-llm whisper engine, the simplest way to get a converted model is to use hatch to start the server locally once. Specify the WHISPER_MODEL and ALIGN_MODEL in .env, then run hatch run runtime:launch in your terminal. This will download and convert these models.
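For example, the relevant .env entries might look like this (the model names are illustrative, matching the directories used in the Dockerfile lines below):

```bash
# .env — models to download and convert on first launch
WHISPER_MODEL=large-v3
ALIGN_MODEL=tiny
```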

You'll find the converted models in cloned_wordcab_transcribe_repo/src/wordcab_transcribe/whisper_models. In your Dockerfile, copy them to the /app/src/wordcab_transcribe/whisper_models directory.

Example Dockerfile lines:

# WHISPER_MODEL
COPY cloned_wordcab_transcribe_repo/src/wordcab_transcribe/whisper_models/large-v3 /app/src/wordcab_transcribe/whisper_models/large-v3

# ALIGN_MODEL
COPY cloned_wordcab_transcribe_repo/src/wordcab_transcribe/whisper_models/tiny /app/src/wordcab_transcribe/whisper_models/tiny

🚀 Contributing

Getting started

  1. Ensure you have Hatch installed (with pipx, for example):

pipx install hatch

  2. Clone the repo:

git clone https://github.com/Wordcab/wordcab-transcribe.git
cd wordcab-transcribe

  3. Install dependencies and start coding:

hatch env create

  4. Run tests and quality checks:
# Quality checks without modifying the code
hatch run quality:check

# Quality checks and auto-formatting
hatch run quality:format

# Run tests with coverage
hatch run tests:run

Working workflow

  1. Create an issue for the feature or bug you want to work on.
  2. Create a branch using the left panel on GitHub.
  3. Run git fetch and git checkout the branch locally.
  4. Make changes and commit.
  5. Push the branch to GitHub.
  6. Create a pull request and ask for review.
  7. Merge the pull request when it's approved and CI passes.
  8. Delete the branch.
  9. Update your local repo with git fetch and git pull.