First off, thanks for this great tool @McCloudS ! Came across this project over the weekend, and it was super easy to wire into my setup at home. I'm using the bazarr + plex + tautulli bits for integrations.
This PR is a patch for the /detect-language POST endpoint's response.
It prevents Bazarr from throttling whisperai (subgen) provider for 10 minutes with LanguageReverseError
CHANGED:
"language_code" -> two letter language code (changed)
UNCHANGED:
"detected_language" -> full language name, e.g. "english" or "nynorsk" (unchanged)
Background:
I have a huge queue of items in bazarr, and I've been using subgen to generate all of the subtitles over the past few days. in Bazarr, I kept getting intermittent LanguageReverseErrors when using subgen as the whisperai provider. Every time it hit that, Bazarr would throttle the whisperai provider for 10 minutes.
Since whisperai is my only real provider in use, all the remaining items in the queue would blow through and the search task would complete as fast as bazarr could read the queue and say it finished processing the item (with no providers left available, the search for the queued item happens, returning 0 results without error).
Each night the task would run it would only process a few to a few dozen items before crashing out. With a "Wanted" list of > 18000 items, the past few days hadn't made much of dent at all in that list.
I initially thought it was something to do with the nynorsk language in bazarr since that was the language in the log that preceded every LanguageReverseError stack trace in the bazarr logs. Eventually I saw a couple others (malay, khmer) as well, and that got me looking into it.
For many languages with simple names, that worked: "english", "french", "german", etc...
For languages that didn't map exactly, the final Language.fromname(name) would raise the LanguageReverseError that throttle the provider for 10 min
I patched my local Bazarr yesterday with a workaround to do this reverse lookup and that fixed the throttling issue to get my queue processing all day yesterday. This PR should fix the root of the issue I've been experiencing without needing that workaround in bazarr.
First off, thanks for this great tool @McCloudS ! Came across this project over the weekend, and it was super easy to wire into my setup at home. I'm using the bazarr + plex + tautulli bits for integrations.
This PR is a patch for the
/detect-language
POST endpoint's response.It prevents Bazarr from throttling whisperai (subgen) provider for 10 minutes with
LanguageReverseError
CHANGED:
UNCHANGED:
Background: I have a huge queue of items in bazarr, and I've been using subgen to generate all of the subtitles over the past few days. in Bazarr, I kept getting intermittent
LanguageReverseError
s when using subgen as the whisperai provider. Every time it hit that, Bazarr would throttle thewhisperai
provider for 10 minutes.Since
whisperai
is my only real provider in use, all the remaining items in the queue would blow through and the search task would complete as fast as bazarr could read the queue and say it finished processing the item (with no providers left available, the search for the queued item happens, returning 0 results without error).Each night the task would run it would only process a few to a few dozen items before crashing out. With a "Wanted" list of > 18000 items, the past few days hadn't made much of dent at all in that list.
I initially thought it was something to do with the
nynorsk
language in bazarr since that was the language in the log that preceded everyLanguageReverseError
stack trace in the bazarr logs. Eventually I saw a couple others (malay, khmer) as well, and that got me looking into it.Here's what I traced:
Bazarr uses both the
"language_code"
and the"detected_language"
from the response: https://github.com/morpheus65535/bazarr/blob/5429749e72bcbcd960e63704bfac522bd87cc244/libs/subliminal_patch/providers/whisperai.py#L259C1-L259C94It first tries to match the language from the
"language_code"
, expecting a two-letter code: https://github.com/morpheus65535/bazarr/blob/5429749e72bcbcd960e63704bfac522bd87cc244/libs/subliminal_patch/providers/whisperai.py#L160-L161When that fails, it tries to match the language by name: https://github.com/morpheus65535/bazarr/blob/5429749e72bcbcd960e63704bfac522bd87cc244/libs/subliminal_patch/providers/whisperai.py#L162-L163
For many languages with simple names, that worked:
"english"
,"french"
,"german"
, etc... For languages that didn't map exactly, the finalLanguage.fromname(name)
would raise theLanguageReverseError
that throttle the provider for 10 minFor the
"nynorsk"
example, the name that it was trying to match was not "nynorsk" but instead something like"Norwegian Nynorsk"
: https://github.com/morpheus65535/bazarr/blob/5429749e72bcbcd960e63704bfac522bd87cc244/libs/babelfish/data/iso-639-3.tab#L4748I patched my local Bazarr yesterday with a workaround to do this reverse lookup and that fixed the throttling issue to get my queue processing all day yesterday. This PR should fix the root of the issue I've been experiencing without needing that workaround in bazarr.