Caucasus-Rosetta / Lingua-Corpus

Multilingual and monolingual corpora focused on Caucasus languages, for Natural Language Processing (NLP)
Apache License 2.0

[Common Voice] averageClipDuration is not accurate #100

Closed danielinux7 closed 2 years ago

danielinux7 commented 2 years ago

Discussion

The figure on the website's home card is a rough estimate computed as totalClips * averageClipDuration. For Abkhazian specifically, the average clip duration dropped across releases from 6.41 s to 5.127 s, which greatly affects the total hours calculation.
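To make the size of the effect concrete, here is a minimal sketch of that estimate. The function name `estimated_hours` and the clip count of 50,000 are hypothetical and chosen only for illustration; it also assumes averageClipDuration is expressed in seconds, as the figures above suggest. Only the two durations come from this issue.

```python
def estimated_hours(total_clips: int, average_clip_duration_s: float) -> float:
    """Rough total hours: clip count times average clip duration (seconds), converted to hours."""
    return total_clips * average_clip_duration_s / 3600.0


# Hypothetical clip count, used only for illustration (not a real Abkhazian figure).
total_clips = 50_000

stale = estimated_hours(total_clips, 6.41)     # older average clip duration from this issue
current = estimated_hours(total_clips, 5.127)  # newer average clip duration from this issue

# The relative overestimate depends only on the two averages, not on the clip count.
overestimate_pct = (stale - current) / current * 100
print(f"stale: {stale:.1f} h, current: {current:.1f} h, overestimate: {overestimate_pct:.1f}%")
```

Because the clip count cancels out, an estimate still based on the 6.41 s average comes out roughly 25% higher than one based on 5.127 s, whatever the real number of clips is.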

Difficulties

We use these numbers in our campaign, and we also monitor events with them to see how far we have come, so the numbers shouldn't be way off. I posted about this on the Common Voice Discourse.

Solutions

I think I will open an issue for this in the Common Voice GitHub repository: averageClipDuration should be updated more often so that it gives realistic numbers.