echogarden-project / echogarden

Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
GNU General Public License v3.0

Small issue regarding error message ISO 639-2 codes and `Echogarden.recognize` #49

Open rbozan opened 5 months ago

rbozan commented 5 months ago

There's a small issue with the error message shown when an ISO 639-2 code is passed to `Echogarden.recognize`, like this:

  const result = await Echogarden.recognize(input, {
    whisper: {
      model: 'small'
    },
    language: 'spa'
  });

This returns:

Transcode with command-line ffmpeg.. 5.3ms
Crop using voice activity detection.. 3.5ms
Prepare for recognition.. 0.3ms
Language specified: Spanish (spa)
Load whisper module.. 0.3ms
The language Spanish is not supported by the Whisper engine.

Supplying 'es', however, works fine:

Transcode with command-line ffmpeg.. 5.5ms
Crop using voice activity detection.. 10.2ms
Prepare for recognition.. 2.0ms
Language specified: Spanish (es)
Load whisper module.. 13.5ms
Load tokenizer data.. 72.6ms
Create encoder inference session for model 'small'.. 732.2ms
(--etcetera--)

So I have to supply ISO 639-1 language codes, not ISO 639-2. But the message indicates that Spanish is not supported at all.

rotemdan commented 5 months ago

Thanks,

Two-letter language codes are used throughout all synthesis and recognition operations, I believe.

The error message makes it look like the language isn't supported. I should change it to first check whether the language code format is supported, though adding support for three-letter codes right away could be a more thorough solution.
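A rough sketch of the "check the format first" idea, so that a code like `spa` produces a format error rather than a "not supported" error. The function name `checkLanguage` and the `supportedLanguages` set are hypothetical and only for illustration; they are not taken from the Echogarden codebase, and the real list of languages supported by the engine is not shown in this thread.

```typescript
// Hypothetical, incomplete set of languages an engine might support.
const supportedLanguages = new Set<string>(['en', 'es', 'fr', 'de', 'pt'])

// Hypothetical helper: returns an error message, or undefined when the code is acceptable.
function checkLanguage(languageCode: string): string | undefined {
    const code = languageCode.trim().toLowerCase()

    // Step 1: format check. Accepts 'es' or 'pt-br' style codes only.
    if (!/^[a-z]{2}(-[a-z0-9]+)*$/.test(code)) {
        return `The language code '${languageCode}' is not in a supported format. ` +
            `Use a two-letter ISO 639-1 code such as 'es'.`
    }

    // Step 2: support check, applied only to well-formed codes.
    const primary = code.split('-')[0]

    if (!supportedLanguages.has(primary)) {
        return `The language '${code}' is not supported by this engine.`
    }

    return undefined
}
```

With this split, `checkLanguage('spa')` reports a format problem, while a well-formed but unknown code reports a genuinely unsupported language.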

Currently, the error message itself does correctly resolve `spa` to Spanish, because of this method:

export function languageCodeToName(languageCode: string) {
    const languageNames = new Intl.DisplayNames(['en'], { type: 'language' })

    let translatedLanguageName: string | undefined

    try {
        translatedLanguageName = languageNames.of(languageCode)
    } catch (e) {
        // `of` throws a RangeError for malformed codes; fall through to 'Unknown' below
    }

    return translatedLanguageName || 'Unknown'
}

This translation makes it look like it understands what the language is, but it's currently only used for the error message itself.

Adding support for full language names like `french` has also been on my task list for a while.

I'll need to find some way to normalize all language codes or names to the two-letter ISO 639-1 ones, and their extensions, like `pt-br`.
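As a minimal sketch of what such a normalization step might look like (not how Echogarden actually does it): the function name `normalizeLanguageCode` and both lookup tables below are hypothetical and deliberately incomplete. A real version would need the full ISO 639 data set and a fuller list of language names.

```typescript
// Hypothetical, incomplete table of three-letter codes mapped to ISO 639-1.
const threeLetterToTwoLetter = new Map<string, string>([
    ['spa', 'es'],
    ['eng', 'en'],
    ['fra', 'fr'],
    ['fre', 'fr'], // ISO 639-2 bibliographic variant
    ['deu', 'de'],
    ['ger', 'de'], // ISO 639-2 bibliographic variant
    ['por', 'pt'],
])

// Hypothetical, incomplete table of English language names mapped to ISO 639-1.
const englishNameToTwoLetter = new Map<string, string>([
    ['spanish', 'es'],
    ['english', 'en'],
    ['french', 'fr'],
    ['german', 'de'],
    ['portuguese', 'pt'],
])

// Returns a lowercase code such as 'es' or 'pt-br',
// or undefined when the input is not recognized.
function normalizeLanguageCode(input: string): string | undefined {
    const cleaned = input.trim().toLowerCase().replace(/_/g, '-')

    if (cleaned.length === 0) {
        return undefined
    }

    const [primary, ...subtags] = cleaned.split('-')

    let twoLetter: string | undefined

    if (/^[a-z]{2}$/.test(primary)) {
        // Already two letters. Note: this does not verify that the code is assigned.
        twoLetter = primary
    } else if (/^[a-z]{3}$/.test(primary)) {
        twoLetter = threeLetterToTwoLetter.get(primary)
    } else if (subtags.length === 0) {
        // Treat anything else without subtags as a possible English language name.
        twoLetter = englishNameToTwoLetter.get(primary)
    }

    if (twoLetter === undefined) {
        return undefined
    }

    // Keep any extension subtags, e.g. 'por-BR' becomes 'pt-br'.
    return [twoLetter, ...subtags].join('-')
}
```

Under these assumptions, `normalizeLanguageCode('spa')`, `normalizeLanguageCode('Spanish')` and `normalizeLanguageCode('es')` would all yield `'es'`, and the engine-specific support check would then only ever see two-letter codes.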