ajbouh closed this 5 months ago
This API doesn't quite fit the way translators already work.
Today, translators can work at either the audio level or the text level. Text-level translation gives a more stable experience when we can support it, but it also introduces multiple points of failure. Audio-level translation has a different failure mode: the transcription model may show something different from what the translation model produces.
Languages we translate have more than one "name", and not every model agrees on which one to use. For example, Chinese can be referred to as zh, zho, cmn (Mandarin), or yue (Cantonese). Of these, Whisper uses "zh" and SeamlessM4T uses "cmn". I've already implemented some fuzzy matching for this.
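A minimal sketch of the kind of matching described above, assuming a hand-maintained table of equivalent codes. It does not reproduce the existing fuzzy-matching implementation. The names ALIAS_GROUPS, aliases and match_model_code are hypothetical. Treating "zh"/"zho" as interchangeable with "cmn" while keeping "yue" separate is also an assumption. Only "zh" for Whisper and "cmn" for SeamlessM4T come from this thread. The English entries are added for illustration.

```python
from typing import Iterable, Optional

# Hypothetical equivalence classes of language codes (an assumption, not
# an authoritative mapping). "yue" (Cantonese) is deliberately kept apart
# from the Mandarin group.
ALIAS_GROUPS: list[frozenset[str]] = [
    frozenset({"zh", "zho", "cmn"}),
    frozenset({"yue"}),
    frozenset({"en", "eng"}),
]


def aliases(code: str) -> frozenset[str]:
    """Return every code treated as equivalent to `code`, including itself."""
    code = code.strip().lower()
    for group in ALIAS_GROUPS:
        if code in group:
            return group
    return frozenset({code})


def match_model_code(requested: str, supported: Iterable[str]) -> Optional[str]:
    """Return the spelling a model uses for `requested`, or None if unsupported."""
    wanted = aliases(requested)
    for code in supported:
        if code.lower() in wanted:
            return code
    return None


# Illustrative subsets of each model's codes; real models support many more.
WHISPER_CODES = ["en", "zh"]
SEAMLESS_CODES = ["eng", "cmn"]
```

With this table, a request for "zh" resolves to "cmn" for SeamlessM4T and "cmn" resolves back to "zh" for Whisper. "yue" returns None for the Whisper subset above instead of being silently mapped to Mandarin.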
What name would be used to add the translator by voice?
I don't know. Internally we use this bit of logic to find the "3 letter" version of a language name:
https://github.com/ajbouh/substrate/blob/future/services/asr-seamlessm4t/app.py#L121-L145
Maybe the answer is "anything that pycountry.languages.lookup can resolve"?
Here's a link to the pycountry package we're using: https://pypi.org/project/pycountry/
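A rough sketch of the "anything pycountry.languages.lookup can resolve" idea, applied to a name heard in a voice command. This is not the logic in the linked app.py. The function name to_alpha3 is hypothetical. The lookup parameter is an assumption added so the function can be tested without the pycountry dependency. By default it uses pycountry.languages.lookup, which raises LookupError when nothing matches.

```python
from typing import Any, Callable, Optional


def to_alpha3(spoken_name: str,
              lookup: Optional[Callable[[str], Any]] = None) -> Optional[str]:
    """Resolve a free-form language name or code to a 3-letter (ISO 639-3)
    code, or return None if it can't be resolved. Hypothetical helper."""
    if lookup is None:
        # Third-party dependency: pip install pycountry
        import pycountry
        lookup = pycountry.languages.lookup
    try:
        # lookup() tries the codes and names it knows about and raises
        # LookupError when none of them match.
        return lookup(spoken_name.strip()).alpha_3
    except LookupError:
        return None
```

With pycountry installed, to_alpha3("zh") should give "zho". That is still not the "cmn" SeamlessM4T expects, so a step like the alias matching discussed earlier in the thread would still be needed after this lookup.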
replaced with #67