Thanks for building this @McCloudS! This isn't an actual problem (and would probably be better for a wiki section of this repo if you ever add one). My hope is that this helps future travelers along this journey :)
## Language Detection Issues
For anyone else who comes across this project and is having trouble getting Whisper to detect the correct language of the audio track: keep in mind that stable-ts uses only the first 30 seconds of audio to determine the language. In my case those 30 seconds were music, so detection defaulted to English even though the video was in Japanese.
There are more elegant ways to do this, but if you want to get it working quickly, you can change the call to the model in `gen_subtitles()` in `subgen.py` to pass a hard-coded language code, like this:

```python
result = model.transcribe_stable(file_path, language="ja", task=transcribe_or_translate_str)
```
A list of all the language codes can be found at the bottom of `subgen.py`.
A Dockerfile that uses your new, modified `subgen.py` might look like this:
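As a hypothetical sketch only (the base image and pip packages below are my assumptions, not the project's actual build; start from the repo's own Dockerfile if you can and just swap in the modified script):

```dockerfile
# Sketch: base image and dependency list are assumptions, not the project's build.
FROM python:3.10-slim

WORKDIR /subgen

# stable-ts pulls in openai-whisper and torch as dependencies
RUN pip install --no-cache-dir stable-ts

# Copy your locally modified subgen.py over the stock one
COPY subgen.py /subgen/subgen.py

CMD ["python", "-u", "subgen.py"]
```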
## Running on local files without dealing with Plex/Emby/Jellyfin/etc
The author was kind enough to think about this use case and added an environment variable called `TRANSCRIBE_FOLDERS`, which just needs a path to the files to be processed. You don't need a media server at all, and you don't need to set the `PATH_MAPPING` variables.
A sample `docker-compose.yml` that gets this working might look like this:
```yaml
version: '2'
services:
  subgen:
    container_name: subgen
    tty: true
    image: subgencpu:1  # Note: this is a local image I built from the Dockerfile mentioned above.
    environment:
      - "WHISPER_MODEL=medium"
      - "WHISPER_THREADS=4"
      - "PROCADDEDMEDIA=True"
      - "PROCMEDIAONPLAY=False"
      - "NAMESUBLANG=aa"
      - "SKIPIFINTERNALSUBLANG=eng"
      - "WEBHOOKPORT=8090"
      - "CONCURRENT_TRANSCRIPTIONS=2"
      - "WORD_LEVEL_HIGHLIGHT=False"
      - "USE_PATH_MAPPING=False"
      - "TRANSCRIBE_DEVICE=cpu"
      - "TRANSCRIBE_FOLDERS=/mnt/media/"
    volumes:
      - "/home/user/media:/mnt/media/"
    ports:
      - "8090:8090"
```
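Bringing the whole thing up might then look like the following (hypothetical commands, assuming the `subgencpu:1` tag above and a Dockerfile in the current directory):

```shell
# Build the local image referenced as subgencpu:1 in the compose file
docker build -t subgencpu:1 .

# Start the container in the background
docker compose up -d

# Follow the logs to watch transcription progress
docker logs -f subgen
```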