This will transcribe your personal media on a Plex, Emby, or Jellyfin server to create subtitles (.srt) from audio/video files in the following languages: https://github.com/McCloudS/subgen#audio-languages-supported-via-openai, and transcribe or translate them into English. It can also be used as a Whisper provider in Bazarr (see the instructions below). It technically supports transcribing a foreign language into itself (e.g. Japanese > Japanese; see TRANSCRIBE_OR_TRANSLATE). It currently relies on webhooks from Jellyfin, Emby, Plex, or Tautulli. It uses stable-ts and faster-whisper, which can run on both NVIDIA GPUs and CPUs.
Honestly, I built this for me, but saw the utility in other people maybe using it. This works well for my use case. Since having children, I'm either deaf or wanting to have everything quiet. We watch EVERYTHING with subtitles now, and I feel like I can't even understand the show without them. I use Bazarr to auto-download subtitles, and gap-fill with Plex's built-in capability. This is for everything else. Some shows just won't have subtitles available for one reason or another, or in some cases on my H265 media, they are wildly out of sync.
You can now configure all environment variables via http://subgen:9000/ (fill in your appropriate IP and port). You can still use Docker variables or OS environment variables if you prefer.
Install python3 and ffmpeg, download launcher.py from this repository, and install the Python dependencies:
pip3 install numpy stable-ts fastapi requests faster-whisper uvicorn python-multipart python-ffmpeg whisper transformers optimum accelerate watchdog
Then run it:
python3 launcher.py -u -i -s
You need to have matching paths relative to your Plex server/folders, or use USE_PATH_MAPPING. Paths are not needed if you are only using Bazarr. To use an NVIDIA GPU, you will need the appropriate NVIDIA drivers and CUDA 12.2.0 installed: https://developer.nvidia.com/cuda-12-2-0-download-archive?target_os=Windows&target_arch=x86_64
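To illustrate what USE_PATH_MAPPING does, here is a minimal Python sketch of a single-prefix path replacement, using the PATH_MAPPING_FROM/PATH_MAPPING_TO examples from the environment variable table ('/tv' on the Plex server, '/Volumes/TV' on the machine running subgen). The function name map_path is hypothetical and this is not subgen's actual implementation; it only shows the idea of swapping one path prefix for another.

```python
def map_path(path: str, use_mapping: bool, mapping_from: str, mapping_to: str) -> str:
    """Hypothetical helper: swap the media server's path prefix for the local one.

    Mirrors the idea behind USE_PATH_MAPPING / PATH_MAPPING_FROM / PATH_MAPPING_TO.
    Only one mapping is handled, matching the single replacement the README describes.
    """
    if not use_mapping:
        return path
    prefix = mapping_from.rstrip("/")
    # Match whole path components so '/tv' does not also rewrite '/tvshows'.
    if path == prefix or path.startswith(prefix + "/"):
        return mapping_to.rstrip("/") + path[len(prefix):]
    return path


if __name__ == "__main__":
    # Plex reports /tv/..., but subgen runs on a Mac Mini that sees /Volumes/TV/...
    print(map_path("/tv/Show/S01E01.mkv", True, "/tv", "/Volumes/TV"))
    # prints /Volumes/TV/Show/S01E01.mkv
```

The component-boundary check is a choice made in this sketch; a plain string replace would also rewrite paths that merely contain '/tv' somewhere inside them.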
launcher.py can launch subgen for you, automate the setup, and take the following options:
Using -s for Bazarr setup:
The Dockerfile is in the repo along with an example docker-compose file, and the image is also posted on Docker Hub (mccloud/subgen).
If using Subgen without Bazarr, you MUST mount your media volumes in subgen the same way Plex (or your media server) sees them. For example, if Plex uses "/Share/media/TV:/tv" you must have that identical volume in subgen.
"${APPDATA}/subgen/models:/subgen/models" is just for storage of the language models. This isn't necessary, but if you don't use it, you will have to redownload the models on every new image pull.
"${APPDATA}/subgen/subgen.py:/subgen/subgen.py" is optional; mount it if you want to control the version of subgen.py yourself. launcher.py can still be used to download a newer version.
If you want to use a GPU, you need to map it into the container accordingly.
While Unraid doesn't have an app or template for quick install, with minor manual work, you can install it. See https://github.com/McCloudS/subgen/issues/37 for pictures and steps.
Create a webhook in Plex that will call back to your subgen address, e.g. http://192.168.1.111:9000/plex (see https://support.plex.tv/articles/115002267687-webhooks/). You will also need a Plex token (set as PLEXTOKEN) to use it. Remember, Plex and Subgen need to be able to see the exact same files at the exact same paths; otherwise you need USE_PATH_MAPPING.
All you need to do is create a webhook in Emby pointing to your subgen instance, e.g. http://192.168.1.154:9000/emby, set Request content type to multipart/form-data, and configure your desired events (usually New Media Added, Start, and Unpause). See https://github.com/McCloudS/subgen/discussions/115#discussioncomment-10569277 for screenshot examples.
Emby was really nice and provides good information in its webhook payloads, so we don't need to add an API token or server URL to query for more information.
Remember, Emby and Subgen need to be able to see the exact same files at the exact same paths; otherwise you need USE_PATH_MAPPING.
You only need to configure the Whisper Provider in Bazarr.
The Docker Endpoint is the IP address and port of your subgen container (e.g. http://192.168.1.111:9000). See https://wiki.bazarr.media/Additional-Configuration/Whisper-Provider/ for more info. I recommend not enabling this alongside the other webhooks, or you will likely generate duplicate subtitles. If you are using Bazarr, path mapping isn't necessary, as Bazarr sends the file over HTTP.
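If you want to sanity-check the Docker Endpoint outside Bazarr, the sketch below builds the kind of request a Whisper-provider client is generally understood to send. It assumes subgen follows the whisper-asr-webservice convention: a POST to {endpoint}/asr with task, language, output and encode query parameters and the audio in a multipart field named audio_file. Those parameter names, the assumed meaning of encode (true asks the server to decode the uploaded file itself; Bazarr is understood to send pre-extracted audio with encode=false), and the helper names build_asr_url and send_audio are all assumptions for illustration; the Bazarr wiki linked above is the authoritative reference.

```python
from urllib.parse import urlencode


def build_asr_url(endpoint: str, task: str = "transcribe", language: str = "en",
                  encode: bool = True) -> str:
    """Hypothetical helper: build the /asr URL a Whisper-provider client might call."""
    # Assumed query parameters (whisper-asr-webservice style); not verified against subgen.
    params = {
        "task": task,  # "transcribe" or "translate"
        "language": language,
        "output": "srt",
        "encode": "true" if encode else "false",
    }
    return f"{endpoint.rstrip('/')}/asr?{urlencode(params)}"


def send_audio(endpoint: str, audio_path: str, task: str = "transcribe",
               language: str = "en") -> str:
    """Hypothetical helper: POST a media file and return the response body (SRT text)."""
    import requests  # listed in subgen's pip dependencies; imported lazily here

    with open(audio_path, "rb") as audio:
        # 'audio_file' is the assumed multipart field name.
        response = requests.post(
            build_asr_url(endpoint, task, language, encode=True),
            files={"audio_file": audio},
            timeout=3600,  # transcription can take a long time on CPU
        )
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    print(build_asr_url("http://192.168.1.111:9000"))
    # prints http://192.168.1.111:9000/asr?task=transcribe&language=en&output=srt&encode=true
```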
Create the webhooks in Tautulli with the following settings:
Webhook URL: http://yourdockerip:9000/tautulli
Webhook Method: POST
Triggers: Whatever you want, but you'll likely want "Playback Start" and "Recently Added"
Data: Under Playback Start, the JSON Header will be:
{ "source":"Tautulli" }
Data:
{
"event":"played",
"file":"{file}",
"filename":"{filename}",
"mediatype":"{media_type}"
}
Similarly, under Recently Added, the Header is:
{ "source":"Tautulli" }
Data:
{
"event":"added",
"file":"{file}",
"filename":"{filename}",
"mediatype":"{media_type}"
}
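If you want to check the /tautulli endpoint before wiring up Tautulli, you can send it a similar request yourself. This minimal Python sketch builds (but does not send) a POST carrying the same source header and JSON fields configured above. The helper name and the sample file path are hypothetical, and it assumes Tautulli delivers the Data block as a JSON body with Content-Type: application/json.

```python
import json
import urllib.request


def build_tautulli_request(base_url: str, event: str, file: str, filename: str,
                           mediatype: str) -> urllib.request.Request:
    """Hypothetical helper: mimic the Tautulli webhook configured above."""
    body = {"event": event, "file": file, "filename": filename, "mediatype": mediatype}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tautulli",
        data=json.dumps(body).encode("utf-8"),
        # Same header as the Tautulli "JSON Header" setting, plus a JSON content type.
        headers={"source": "Tautulli", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_tautulli_request(
        "http://yourdockerip:9000",
        event="added",  # "played" for Playback Start, "added" for Recently Added
        file="/tv/Show/Season 01/Show - S01E01.mkv",
        filename="Show - S01E01.mkv",
        mediatype="episode",
    )
    print(req.get_method(), req.full_url)
    # prints POST http://yourdockerip:9000/tautulli
    # To actually send it: urllib.request.urlopen(req)
```

The file path in the body must be one subgen can see, so the same path rules (or USE_PATH_MAPPING) apply as for the real Tautulli webhook.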
First, install the Jellyfin webhooks plugin. Then click "Add Generic Destination", name it anything you want, and set the webhook URL to your subgen info (e.g. http://192.168.1.154:9000/jellyfin). Next, check Item Added, Playback Start, and Send All Properties. Last, click "Add Request Header" and add Key: Content-Type, Value: application/json.
Click Save and you should be all set!
Remember, Jellyfin and Subgen need to be able to see the exact same files at the exact same paths; otherwise you need USE_PATH_MAPPING.
You can define the port via environment variables (WEBHOOKPORT), but the endpoints are static.
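Because only the port is configurable, the full webhook URLs follow directly from your host and WEBHOOKPORT. A small sketch; the helper name is hypothetical, and the paths are the ones used in the Plex, Emby, Tautulli, and Jellyfin setup steps of this README.

```python
import os
from typing import Dict, Optional

# Static endpoint paths used by the integrations described in this README.
ENDPOINTS = {"plex": "/plex", "emby": "/emby", "jellyfin": "/jellyfin", "tautulli": "/tautulli"}


def webhook_urls(host: str, port: Optional[int] = None) -> Dict[str, str]:
    """Hypothetical helper: build each webhook URL from a host and port.

    Falls back to the WEBHOOKPORT environment variable (default 9000) when no
    port is given. Bazarr's Docker Endpoint is just the base http://host:port.
    """
    if port is None:
        port = int(os.environ.get("WEBHOOKPORT", "9000"))
    urls = {name: f"http://{host}:{port}{path}" for name, path in ENDPOINTS.items()}
    urls["bazarr"] = f"http://{host}:{port}"
    return urls


if __name__ == "__main__":
    for name, url in webhook_urls("192.168.1.111", 9000).items():
        print(f"{name:9} {url}")
```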
The following environment variables are available in Docker. They will default to the values listed below.

| Variable | Default Value | Description |
|---|---|---|
| TRANSCRIBE_DEVICE | 'cpu' | Can transcribe via GPU (CUDA only) or CPU. Accepts "cpu", "gpu", or "cuda". |
| WHISPER_MODEL | 'medium' | Can be: 'tiny', 'tiny.en', 'base', 'base.en', 'small', 'small.en', 'medium', 'medium.en', 'large-v1', 'large-v2', 'large-v3', 'large', 'distil-large-v2', 'distil-large-v3', 'distil-medium.en', 'distil-small.en' |
| CONCURRENT_TRANSCRIPTIONS | 2 | Number of files it will transcribe in parallel |
| WHISPER_THREADS | 4 | Number of threads to use during computation |
| MODEL_PATH | './models' | Where the WHISPER_MODEL will be stored. Defaults to a 'models' folder in the directory where you execute the script. |
| PROCADDEDMEDIA | True | Will generate subtitles for all added media regardless of existing external/embedded subtitles (subject to SKIPIFINTERNALSUBLANG) |
| PROCMEDIAONPLAY | True | Will generate subtitles for all played media regardless of existing external/embedded subtitles (subject to SKIPIFINTERNALSUBLANG) |
| NAMESUBLANG | 'aa' | Lets you pick what the subtitle will be named. Instead of using EN, I'm using AA, so it doesn't mix with existing external EN subs, and AA will populate higher on the list in Plex. |
| SKIPIFINTERNALSUBLANG | 'eng' | Will not generate a subtitle if the file has an internal sub matching the 3-letter code of this variable (see https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) |
| WORD_LEVEL_HIGHLIGHT | False | Highlights each word as it's spoken in the subtitle. See the example video at https://github.com/jianfch/stable-ts |
| PLEXSERVER | 'http://plex:32400' | Set this to your local Plex server address/port |
| PLEXTOKEN | 'token here' | Set this to your Plex token, found via https://support.plex.tv/articles/204059436-finding-an-authentication-token-x-plex-token/ |
| JELLYFINSERVER | 'http://jellyfin:8096' | Set to your Jellyfin server address/port |
| JELLYFINTOKEN | 'token here' | Generate a token inside the Jellyfin interface |
| WEBHOOKPORT | 9000 | Change this if you need a different port for your webhook |
| USE_PATH_MAPPING | False | Similar to Sonarr and Radarr path mapping, this will attempt to replace paths on file systems that don't have identical paths. Currently supports only one path replacement. See PATH_MAPPING_FROM and PATH_MAPPING_TO below for an example. |
| PATH_MAPPING_FROM | '/tv' | The path of my media relative to my Plex server |
| PATH_MAPPING_TO | '/Volumes/TV' | The path of that same folder relative to my Mac Mini that will run the script |
| TRANSCRIBE_FOLDERS | '' | Takes a pipe (\|) separated list (for example: /tv\|/movies\|/familyvideos), iterates through those folders, and queues their files for subtitle generation if they don't have internal subtitles |
| TRANSCRIBE_OR_TRANSLATE | 'transcribe' | Takes either 'transcribe' or 'translate'. Transcribe will transcribe the audio in the same language as the input. Translate will transcribe and translate into English. |
| COMPUTE_TYPE | 'auto' | Set the compute type using the following information: https://github.com/OpenNMT/CTranslate2/blob/master/docs/quantization.md |
| DEBUG | True | Provides debug data that can help troubleshoot path mapping and other issues. Fun fact: if this is set to True, any modifications to the script will auto-reload it (if it isn't actively transcribing). Useful for making small tweaks without re-downloading the whole file. |
| FORCE_DETECTED_LANGUAGE_TO | '' | Forces the model to use a given language instead of the detected one; takes a 2-letter language code. For example, if your audio is French but keeps being detected as English, set it to 'fr' |
| CLEAR_VRAM_ON_COMPLETE | True | Deletes the model and runs garbage collection when the queue is empty. Good if you need the VRAM for something else. |
| UPDATE | False | Will pull the latest subgen.py from the repository if True. False will use the original subgen.py built into the Docker image. Standalone users can use this with launcher.py to get updates. |
| APPEND | False | Will add the following at the end of a subtitle: "Transcribed by whisperAI with faster-whisper ({whisper_model}) on {datetime.now()}" |
| MONITOR | False | Will monitor TRANSCRIBE_FOLDERS for real-time changes to see if we need to generate subtitles |
| USE_MODEL_PROMPT | False | When set to True, will use the default prompt stored in greetings_translations, "Hello, welcome to my lecture.", to try to force punctuation in transcriptions that would otherwise lack it. The automatic prompt only works with ASR, but a prompt can still be set manually, e.g. USE_MODEL_PROMPT=True and CUSTOM_MODEL_PROMPT=Hello, welcome to my lecture. |
| CUSTOM_MODEL_PROMPT | '' | If USE_MODEL_PROMPT is True, you can override the default prompt (see https://medium.com/axinc-ai/prompt-engineering-in-whisper-6bb18003562d for great examples). |
| LRC_FOR_AUDIO_FILES | True | Will generate LRC (instead of SRT) files for filetypes: '.mp3', '.flac', '.wav', '.alac', '.ape', '.ogg', '.wma', '.m4a', '.m4b', '.aac', '.aiff' |
| CUSTOM_REGROUP | 'cm_sl=84_sl=42++++++1' | Attempts to regroup some of the segments to make a cleaner-looking subtitle. See https://github.com/McCloudS/subgen/issues/68 for discussion. Set to blank to use the stable-ts default regroup algorithm of `cm_sp=,* /,_sg=.5_mg=.3+3_sp=.* /。/?/?` |
| DETECT_LANGUAGE_LENGTH | 30 | Detect the language on the first x seconds of the audio. |
| SKIPIFEXTERNALSUB | False | Skip subtitle generation if an external subtitle with the same language code as NAMESUBLANG is present. Used for the case of not regenerating subtitles if I already have Movie (2002).NAMESUBLANG.srt from a non-subgen source. |
| SUBGEN_KWARGS | '{}' | Takes a kwargs Python dictionary of options you would like to add/override. For advanced users. An example would be {'vad': 'True','prompt_reset_on_temperature': '0.35'} |
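As a rough illustration of how these variables are interpreted, here is a minimal Python sketch that reads a few of them with the defaults from the table. It treats True/False strings as booleans, maps "gpu" to "cuda", splits TRANSCRIBE_FOLDERS on the pipe character, and parses SUBGEN_KWARGS as a Python dict literal. The load_config and env_bool names and the exact boolean parsing rules are assumptions for illustration, not subgen's actual code.

```python
import ast
import os


def env_bool(name: str, default: bool) -> bool:
    """Read a True/False style variable; an unset variable falls back to the default."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in ("true", "1", "yes", "on")


def load_config() -> dict:
    """Hypothetical loader for a handful of the variables in the table above."""
    device = os.getenv("TRANSCRIBE_DEVICE", "cpu").strip().lower()
    if device == "gpu":
        device = "cuda"  # assumption: "gpu" means CUDA, the only GPU backend listed
    folders = os.getenv("TRANSCRIBE_FOLDERS", "")
    return {
        "transcribe_device": device,
        "whisper_model": os.getenv("WHISPER_MODEL", "medium"),
        "concurrent_transcriptions": int(os.getenv("CONCURRENT_TRANSCRIPTIONS", "2")),
        "webhook_port": int(os.getenv("WEBHOOKPORT", "9000")),
        "use_path_mapping": env_bool("USE_PATH_MAPPING", False),
        "procaddedmedia": env_bool("PROCADDEDMEDIA", True),
        # Pipe-separated list such as "/tv|/movies|/familyvideos"; blank entries are dropped.
        "transcribe_folders": [f for f in folders.split("|") if f],
        # SUBGEN_KWARGS is a Python dict literal; literal_eval parses it without executing code.
        "subgen_kwargs": ast.literal_eval(os.getenv("SUBGEN_KWARGS", "{}")),
    }


if __name__ == "__main__":
    print(load_config())
```

Using ast.literal_eval rather than eval keeps the SUBGEN_KWARGS example string parseable while refusing arbitrary expressions.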
mccloud/subgen:latest supports GPU or CPU.
mccloud/subgen:cpu is for CPU only (slightly smaller image).
What's next? Fix the documentation and make it prettier!
Audio languages supported (via OpenAI): Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.