Preprocessing Details & Validation (Recall) Results from AVSpeech Subset

RicherMans / SpokenLanguageClassifiers

Pretrained spoken language classifiers from audio.

MIT License


Preprocessing Details & Validation (Recall) Results from AVSpeech Subset #2

Open MarkuzK opened 3 years ago

MarkuzK commented 3 years ago

Hi, great effort ... I was exploring something very similar and found your repo in the process. I ran a quick validation on a small AVSpeech subset (2,930 manually validated, English-only, single-speaker videos) and got 66.9% recall with CNN10: 1,961 of the videos were correctly classified as English, and the remaining 969 were missed.
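For reference, since every clip in the subset is English, recall here is simply the number of clips predicted as English divided by the total number of clips. A minimal sketch of that arithmetic (the function and variable names are illustrative and not taken from the repo):

```python
def recall(true_positives: int, total_positives: int) -> float:
    """Fraction of actual positives that were predicted as positive."""
    if total_positives <= 0:
        raise ValueError("total_positives must be greater than zero")
    return true_positives / total_positives


# Counts reported above for the AVSpeech subset (all clips are English).
n_videos = 2930    # manually validated English-only, single-speaker videos
n_correct = 1961   # videos CNN10 classified as English

n_missed = n_videos - n_correct
r = recall(n_correct, n_videos)

print(f"missed: {n_missed}, recall: {r:.1%}")
```

Because the subset contains no non-English clips, it says nothing about precision, so false positives on other languages would need a separate check.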

Did you do any pre-processing of the audio data prior to training your models?

Cheers,

Markus

RicherMans commented 3 years ago

Hey MarkuzK, I am actually not very familiar with the common metrics and typical performance for this task ^^, so I don't know whether 66.9% recall is reasonable performance. I just trained these models the same way as my sound event detection setups, so I didn't do any significant preprocessing of the audio. I guess a lot could be done to improve the models' performance, but I trained them for an "any case" scenario where I might one day need a language detector :).