Open MarkuzK opened 3 years ago
Hey MarkuzK, I am actually not very familiar with the common metrics or typical performance for this task ^^, so I don't know whether 66.9% recall is a reasonable result. I just ran these models with a setup similar to my sound event detection experiments, so I didn't do any significant preprocessing of the audio. I guess a lot could be done to improve the models' performance, but I trained them as a "just in case" thing, for the day I might need a language detector :).
Hi, great effort ... I was exploring something very similar and found your repo in the process. I ran a quick validation on a small AVSpeech subset (2,930 manually validated, English-only, single-speaker videos) and got 66.9% recall with CNN10: 1,961 of the videos were correctly classified as English, and the remaining 969 were missed.
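For reference, this is roughly how the recall figure comes out of those counts. Since every video in the subset is English, recall is just the number of correct "English" predictions divided by the subset size. A minimal sketch; the function and variable names are only illustrative and not taken from the repo:

```python
def recall(true_positives: int, total_positives: int) -> float:
    """Fraction of actual positives that the model labelled as positive."""
    if total_positives <= 0:
        raise ValueError("total_positives must be greater than zero")
    return true_positives / total_positives


# Counts from the AVSpeech subset described above (all videos are English).
total_english = 2930   # manually validated English-only videos
correct = 1961         # videos the model classified as English
missed = total_english - correct

r = recall(correct, total_english)
print(f"recall = {r:.1%}, missed = {missed}")
```

Because the subset contains no non-English videos, it says nothing about precision or false positives, only about how often English is recognised.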
Did you do any pre-processing of the audio data prior to training your models?
Cheers,
Markus