An ASR worker that uses faster-whisper as the backend, to be used for transcribing AV material from B&G.
This is still a WIP, so it is subject to change.
There are 2 ways in which the whisper-asr-worker can be tested (ON THE CPU):
**Option 1: Docker**

1. Create an `.env.override` file in your local repo folder. In `.env.override`, change `W_DEVICE` from `cuda` to `cpu` (see the example after these instructions).
2. Build the image:

   ```
   docker build . -t whisper-asr-worker
   ```

3. Run the worker using the provided `docker-compose.yml`:

   ```
   docker compose up
   ```
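For reference, a minimal `.env.override` for a CPU test run might look like this (a sketch assuming the standard `KEY=VALUE` dotenv format; add any other overrides you need alongside it):

```
W_DEVICE=cpu
```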
**Option 2: local run (from the terminal)**

All commands should be run within WSL if on Windows, or within your terminal if on Linux.
1. Install Poetry, then use it to install the dependencies required to run the worker (they are defined in `pyproject.toml`, and Poetry generates a `poetry.lock` based on it).
2. Create an `.env.override` file in your local repo folder. In `.env.override`, change `W_DEVICE` from `cuda` to `cpu` (see the example above).
3. Install `ffmpeg`. You can run this command, for example:

   ```
   apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
   ```

4. Navigate to `scripts`, then execute the following command:

   ```
   ./run.sh
   ```
To run the worker with a CUDA-compatible GPU instead of the CPU, keep `W_DEVICE` set to `cuda` (its default value) rather than overriding it to `cpu`.
(OUTDATED, BUT POSSIBLY STILL RELEVANT) To run it using a GPU via Docker, check the instructions from the dane-example-worker. Make sure to replace `dane-example-worker` in the `docker run` command with `dane-whisper-asr-worker`.
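For orientation, such a GPU run generally relies on Docker's `--gpus` flag (which requires the NVIDIA Container Toolkit on the host). The sketch below is an assumption based on that standard flag, not the exact command from the dane-example-worker instructions:

```
docker run --gpus all dane-whisper-asr-worker
```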
The expected run of this worker (whose pipeline is defined in `asr.py`) should:

1. Download the input file via `download.py`, if it isn't already present in `/data/input/`.
2. Download the model via `model_download.py`, if it isn't already present.
3. Run `transcode.py` to convert the input file to an audio format, if it is a video (though there are plans to remove this step and instead use the audio-extraction-worker to extract the audio).
4. Run `whisper.py` to transcribe the audio and save the transcription in `/data/output/`, if one doesn't already exist.
5. Convert Whisper's output to the DAAN index format using `daan_transcript.py`.
6. (Optional) Transfer the output to an S3 bucket.
If you prefer to use your own model that is stored locally, make sure to set `MODEL_BASE_DIR` to the path where the model files can be found.
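For instance (a sketch; the path below is a placeholder for wherever your model files actually live):

```
MODEL_BASE_DIR=/path/to/your/model
```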
The pre-trained Whisper model version can be adjusted in the `.env` file by editing the `W_MODEL` parameter. Possible options are:
| Size | Parameters |
|---|---|
| `tiny` | 39 M |
| `base` | 74 M |
| `small` | 244 M |
| `medium` | 769 M |
| `large` | 1550 M |
| `large-v2` | 1550 M |
| `large-v3` | 1550 M |
We recommend version `large-v2`, as it performs better than `large-v3` in our benchmarks.

You can also load your own (custom) model by setting the `W_MODEL` parameter to an S3/HTTP URI.
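Both kinds of values go in the same parameter. A sketch (the URI below is a hypothetical placeholder, not a real model location):

```
# one of the pre-trained sizes from the table above
W_MODEL=large-v2
# or a custom model via an S3/HTTP URI
W_MODEL=s3://your-bucket/models/your-model
```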
The parameters used to configure the application can be found in the `.env` file. You will also need to create a `.env.override` file that contains secrets related to the S3 connection, which should normally not be exposed in the `.env` file. The parameters that should be updated with valid values in `.env.override` are:

- `S3_ENDPOINT_URL`
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
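Putting it together, a `.env.override` for an S3-enabled CPU run might look like this (all values are placeholders; substitute the credentials of your own S3-compatible storage):

```
W_DEVICE=cpu
S3_ENDPOINT_URL=https://s3.example.com
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
```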