Open nefastosaturo opened 3 years ago
I've checked nannioz and Stefano: I'm updating the issue, and the results are below.
nannioz-20091103-qfc - ok
nannioz-20091103-raj - ok
nannioz-20091103-vkr - ok
nannioz-20091103-zhz - ok
Stefano-20150131-pus - ok
OK, so unless we find other strange samples, we are done here. I'm leaving the issue open for future checks.
anonymous-20080725-dey - NO - EMPTY AUDIO
anonymous-20110605-kpd - OK - low-volume but understandable audio
anonymous-20170303-mwy - OK - low-volume but understandable audio
dario-20110426-yhj - OK
Karm-20131225-irq - OK
EDIT:
So, with some audio analysis we found some bad speakers, but all the other speakers still need a manual check.
If you want to help, please:
A valid recording must contain speech and must be understandable, even if the volume is very low. For example,
Vistaus-20080718-mrm
is not a valid one.
DONE!
I've found some bad samples in this dataset, so I searched for audio files with an average RMS below 0.025 and found these speakers that need to be checked:
Also, there is one speaker who is not Italian, and I'll remove it:
Vistaus-20080718-mrm
So, I'm asking whether each of you can choose two speakers, listen to their recordings, and report anything that is VERY wrong (e.g. we can keep very low-volume but understandable recordings).
You'll find all the recordings here: http://www.repository.voxforge1.org/downloads/it/Trunk/Audio/Main/16kHz_16bit/
A CSV containing all the samples with their RMS is attached: voxforge_bad_samples.zip
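For anyone who wants to reproduce the RMS check locally, here is a minimal sketch of how such a scan could look. It is not the script that produced the attached CSV. It assumes the recordings have been extracted to 16-bit PCM WAV files and that the 0.025 threshold refers to RMS on a scale where full amplitude is 1.0; the function names are made up for illustration.

```python
import array
import math
import sys
import wave
from pathlib import Path

# Threshold quoted in the issue; the [-1, 1] amplitude scale is an assumption.
RMS_THRESHOLD = 0.025


def normalized_rms(samples):
    """RMS of signed 16-bit samples, scaled so that full scale equals 1.0."""
    if len(samples) == 0:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def wav_rms(path):
    """Read a 16-bit PCM WAV file and return its normalized RMS."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit samples")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return normalized_rms(samples)


def low_rms_files(root, threshold=RMS_THRESHOLD):
    """Return (path, rms) pairs for WAV files under root with RMS below threshold."""
    results = []
    for path in sorted(Path(root).rglob("*.wav")):
        rms = wav_rms(path)
        if rms < threshold:
            results.append((path, rms))
    return results
```

Calling `low_rms_files` on the folder where the archives were unpacked gives the list of candidate files to listen to. A low RMS only flags a file for manual review: as noted above, quiet but understandable recordings are still valid.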