Small issue regarding error message ISO 639-2 codes and `Echogarden.recognize`

echogarden-project / echogarden

Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.

GNU General Public License v3.0


Transcode with command-line ffmpeg.. 5.3ms
Crop using voice activity detection.. 3.5ms
Prepare for recognition.. 0.3ms
Language specified: Spanish (spa)
Load whisper module.. 0.3ms
The language Spanish is not supported by the Whisper engine.

Transcode with command-line ffmpeg.. 5.5ms
Crop using voice activity detection.. 10.2ms
Prepare for recognition.. 2.0ms
Language specified: Spanish (es)
Load whisper module.. 13.5ms
Load tokenizer data.. 72.6ms
Create encoder inference session for model 'small'.. 732.2ms
(--etcetera--)

Thanks,

I believe two-letter (ISO 639-1) language codes are used throughout all synthesis and recognition operations.

The error message makes it look like the language itself isn't supported. I should change the check to first test whether the language code format is supported, though adding support for three-letter codes right away might be a more thorough solution.
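One way to make the error clearer is to classify the identifier's format before checking engine support, so `spa` produces a "wrong code format" message instead of "not supported". The sketch below is hypothetical: `detectLanguageFormat`, `checkLanguage`, and their messages are illustrative names, not part of Echogarden's API, and the regular expressions only approximate ISO 639 / BCP 47 shapes.

```typescript
// Hypothetical sketch (not Echogarden's API): classify the format of a
// language identifier before checking whether an engine supports it.
type LanguageFormat = 'iso639-1' | 'iso639-1-with-region' | 'iso639-2' | 'invalid'

function detectLanguageFormat(language: string): LanguageFormat {
    const value = language.trim().toLowerCase()

    // Two-letter code, e.g. 'es'
    if (/^[a-z]{2}$/.test(value)) return 'iso639-1'

    // Two-letter code plus a region subtag, e.g. 'pt-br' or 'es-419'
    if (/^[a-z]{2}-(?:[a-z]{2}|\d{3})$/.test(value)) return 'iso639-1-with-region'

    // Three-letter code, e.g. 'spa'
    if (/^[a-z]{3}$/.test(value)) return 'iso639-2'

    return 'invalid'
}

// Hypothetical: returns an error message, or undefined when the language is accepted.
// The format check runs first, so an unsupported format is never reported
// as an unsupported language.
function checkLanguage(language: string, supportedLanguages: string[], engineName: string): string | undefined {
    const format = detectLanguageFormat(language)

    if (format === 'iso639-2') {
        return `'${language}' looks like a three-letter ISO 639-2 code. Only two-letter ISO 639-1 codes (such as 'es') are currently accepted.`
    }

    if (format === 'invalid') {
        return `'${language}' is not a recognized language code format.`
    }

    // Compare only the base language, so 'pt-br' is checked as 'pt'
    const baseLanguage = language.trim().toLowerCase().split('-')[0] ?? ''

    if (!supportedLanguages.includes(baseLanguage)) {
        return `The language '${language}' is not supported by the ${engineName} engine.`
    }

    return undefined
}
```

For example, `checkLanguage('spa', ['en', 'es'], 'Whisper')` reports the code format, while `checkLanguage('es', ['en', 'es'], 'Whisper')` returns `undefined`.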

Currently, the error message does correctly resolve `spa` to Spanish, because of this function:

export function languageCodeToName(languageCode: string) {
    // English display names for language codes, e.g. 'es' -> 'Spanish'
    const languageNames = new Intl.DisplayNames(['en'], { type: 'language' })

    let translatedLanguageName: string | undefined

    try {
        translatedLanguageName = languageNames.of(languageCode)
    } catch (e) {
        // of() throws a RangeError for malformed codes; fall through to 'Unknown'
    }

    return translatedLanguageName || 'Unknown'
}

This translation makes it look like the code understands which language was given, but it's currently used only for the error message itself.

Adding support for full language names like `french` has also been on my task list for a while.
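A possible approach to full names is to run `Intl.DisplayNames` (already used by `languageCodeToName`) in reverse: build a lookup from each known code's English name back to the code. This is a minimal sketch; `candidateCodes`, `buildNameToCodeMap`, and `languageNameToCode` are hypothetical names, and the code list is an illustrative subset rather than the set of languages Echogarden actually supports.

```typescript
// Hypothetical sketch: map full English language names (e.g. 'french')
// back to two-letter codes by reversing Intl.DisplayNames output.
// Illustrative subset only, not Echogarden's supported-language list.
const candidateCodes = ['en', 'es', 'fr', 'de', 'it', 'pt', 'ja', 'zh']

function buildNameToCodeMap(codes: string[]): Map<string, string> {
    const displayNames = new Intl.DisplayNames(['en'], { type: 'language' })
    const nameToCode = new Map<string, string>()

    for (const code of codes) {
        const name = displayNames.of(code)

        // Key by lowercase name so lookups are case-insensitive
        if (name) {
            nameToCode.set(name.toLowerCase(), code)
        }
    }

    return nameToCode
}

const nameToCode = buildNameToCodeMap(candidateCodes)

// Returns the two-letter code for a full name, or undefined if unknown
function languageNameToCode(name: string): string | undefined {
    return nameToCode.get(name.trim().toLowerCase())
}
```

Building the map from the same display names the error message uses keeps the two consistent: any name the message can print is also a name the lookup accepts.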

I'll need a way to normalize all language codes and names to two-letter ISO 639-1 codes, along with their regional extensions such as `pt-br`.
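The code side of that normalization could look roughly like the following sketch, assuming a lookup table from three-letter ISO 639-2 codes to their ISO 639-1 equivalents. The table here is a small hypothetical subset (a complete one would cover every ISO 639-2 code that has a two-letter equivalent), `normalizeLanguageCode` is an illustrative name, and region subtags are passed through lowercased without validation.

```typescript
// Hypothetical sketch: normalize language codes to two-letter ISO 639-1,
// keeping any region extension such as 'pt-br'.
// Illustrative subset of ISO 639-2 -> ISO 639-1 mappings only.
const iso639_2To639_1: Record<string, string> = {
    spa: 'es',
    eng: 'en',
    fra: 'fr', fre: 'fr', // terminological (T) and bibliographic (B) forms
    deu: 'de', ger: 'de',
    por: 'pt',
    zho: 'zh', chi: 'zh',
}

function normalizeLanguageCode(input: string): string | undefined {
    // Accept both '-' and '_' as subtag separators, e.g. 'pt_BR'
    const parts = input.trim().toLowerCase().split(/[-_]/)
    const base = parts[0] ?? ''
    const rest = parts.slice(1)

    let language: string | undefined

    if (/^[a-z]{2}$/.test(base)) {
        // Already a two-letter code
        language = base
    } else if (/^[a-z]{3}$/.test(base)) {
        // Three-letter code: look up its two-letter equivalent
        language = iso639_2To639_1[base]
    }

    if (!language) {
        return undefined // unknown code or unsupported format
    }

    // Reattach region or other subtags unchanged (lowercased)
    return [language, ...rest].join('-')
}
```

With this, `spa` and `es` both normalize to `es`, and `POR-BR` becomes `pt-br`, so the engine support check only ever sees two-letter base codes.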


Small issue regarding error message ISO 639-2 codes and `Echogarden.recognize` #49