Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants
Supported Models
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model
- Plain C/C++ implementation without dependencies
- Runs on the CPU
faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.
This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
Fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization.
- ⚡️ Batched inference for 70x realtime transcription using whisper large-v2
- 🪶 faster-whisper backend, requires <8GB gpu memory for large-v2 with beam_size=5
- 🎯 Accurate word-level timestamps using wav2vec2 alignment
- 👯 Multispeaker ASR using speaker diarization from pyannote-audio (speaker ID labels)
- 🗣️ VAD preprocessing, reduces hallucination & batching with no WER degradation.
Stabilizing Timestamps for Whisper: This library modifies Whisper to produce more reliable timestamps and extends its functionality.
Hugging Face implementation of Whisper. Any speech recognition pretrained model from the Hugging Face hub can be used as well.
OpenAI Whisper via their API
- Web UI
- Command Line Interface
- Python package
- Supports different subtitle formats, thanks to tkarabela/pysubs2
- Supports audio and video files
Quoted from the official openai/whisper installation instructions:
It requires the command-line tool `ffmpeg` to be installed on your system, which is available from most package managers:

```
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
```
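A quick way to verify that `ffmpeg` is reachable is to check whether it is on your `PATH`; a small stdlib sketch (the helper name `has_command` is my own):

```python
import shutil

def has_command(name: str) -> bool:
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None

if not has_command("ffmpeg"):
    print("ffmpeg not found on PATH; install it with your package manager.")
```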
You may need `rust` installed as well, in case tokenizers does not provide a pre-built wheel for your platform. If you see installation errors during the `pip install` command above, please follow the Getting started page to install the Rust development environment. Additionally, you may need to configure the `PATH` environment variable, e.g. `export PATH="$HOME/.cargo/bin:$PATH"`. If the installation fails with `No module named 'setuptools_rust'`, you need to install `setuptools_rust`, e.g. by running `pip install setuptools-rust`.
Install the `subsai` package:

```
pip install git+https://github.com/abdeladim-s/subsai
```
To use the Web UI, run the following command in the terminal:

```
subsai-webui
```

A web page will open in your default browser; otherwise, navigate to the links printed by the command.
You can also run the Web-UI using Docker.
```
usage: subsai [-h] [--version] [-m MODEL] [-mc MODEL_CONFIGS] [-f FORMAT] [-df DESTINATION_FOLDER] [-tm TRANSLATION_MODEL]
              [-tc TRANSLATION_CONFIGS] [-tsl TRANSLATION_SOURCE_LANG] [-ttl TRANSLATION_TARGET_LANG]
              media_file [media_file ...]

positional arguments:
  media_file            The path of the media file, a list of files, or a text file containing paths for batch processing.

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -m MODEL, --model MODEL
                        The transcription AI models. Available models: ['openai/whisper', 'linto-ai/whisper-timestamped']
  -mc MODEL_CONFIGS, --model-configs MODEL_CONFIGS
                        JSON configuration (path to a json file or a direct string)
  -f FORMAT, --format FORMAT, --subtitles-format FORMAT
                        Output subtitles format, available formats ['.srt', '.ass', '.ssa', '.sub', '.json', '.txt', '.vtt']
  -df DESTINATION_FOLDER, --destination-folder DESTINATION_FOLDER
                        The directory where the subtitles will be stored, defaults to the same folder where the media file(s) is stored.
  -tm TRANSLATION_MODEL, --translation-model TRANSLATION_MODEL
                        Translate subtitles using AI models, available models: ['facebook/m2m100_418M', 'facebook/m2m100_1.2B',
                        'facebook/mbart-large-50-many-to-many-mmt']
  -tc TRANSLATION_CONFIGS, --translation-configs TRANSLATION_CONFIGS
                        JSON configuration (path to a json file or a direct string)
  -tsl TRANSLATION_SOURCE_LANG, --translation-source-lang TRANSLATION_SOURCE_LANG
                        Source language of the subtitles
  -ttl TRANSLATION_TARGET_LANG, --translation-target-lang TRANSLATION_TARGET_LANG
                        Target language of the subtitles
```
Example of simple usage:

```
subsai ./assets/test1.mp4 --model openai/whisper --model-configs '{"model_type": "small"}' --format srt
```
Note: for Windows CMD, you will need to escape the quotes as follows:

```
subsai ./assets/test1.mp4 --model openai/whisper --model-configs "{\"model_type\": \"small\"}" --format srt
```
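To sidestep shell-specific quoting altogether, the `--model-configs` string can be built with `json.dumps` (a stdlib sketch; invoking subsai itself is left out):

```python
import json
import shlex

# json.dumps always produces valid JSON regardless of the host shell.
configs = json.dumps({"model_type": "small"})
print(configs)  # {"model_type": "small"}

# shlex.quote makes the string safe to paste into a POSIX shell.
cmd = (f"subsai ./assets/test1.mp4 --model openai/whisper "
       f"--model-configs {shlex.quote(configs)} --format srt")
print(cmd)
```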
You can also provide a simple text file for batch processing (every line should contain the absolute path to a single media file):

```
subsai media.txt --model openai/whisper --format srt
```
```python
from subsai import SubsAI

file = './assets/test1.mp4'
subs_ai = SubsAI()
model = subs_ai.create_model('openai/whisper', {'model_type': 'base'})
subs = subs_ai.transcribe(file, model)
subs.save('test1.srt')
```
For more advanced usage, read the documentation.
Simple examples can be found in the examples folder
VAD example: process long audio files using silero-vad.
Translation example: translate an already existing subtitles file.
Make sure that you have Docker installed.
Prebuilt image:

```
docker pull absadiki/subsai:main
docker run --gpus=all -p 8501:8501 -v /path/to/your/media_files/folder:/media_files absadiki/subsai:main
```
Build the image locally:

```
# cd to the repository, then:
docker compose build
docker compose run -p 8501:8501 -v /path/to/your/media_files/folder:/media_files subsai-webui  # subsai-webui-cpu for CPU only
```
You can access your media files through the mounted `media_files` folder.
You can switch the Web UI theme from `settings > Theme > Light`.

If you find a bug, have a suggestion, or any feedback, please open an issue for discussion.
This project is licensed under the GNU General Public License version 3 or later. You can modify or redistribute it under the conditions of this license (see LICENSE for more information).