bact closed this issue 3 years ago.
As mentioned on Matrix, I'm repeating it here as well: https://matrix.to/#/!WpvTgzNfcLIgzJZdSE:mozilla.org/$Ihtxvr7o0nTdSBKnlFr8k1OJawBWK14PVjJC7z1CKsg?via=mozilla.org&via=matrix.org&via=tchncs.de
It looks like the minimum recording duration is 1500 ms (500 ms for the benchmark sentences), last changed here: https://github.com/common-voice/common-voice/commit/f00f8f423ca1c03bae2a8931f492cb45584f64dd#diff-b0fc0339b967e566feaf9f768dfcc72a5f9fe47fc345a315f590088726033a35R49. I would suggest we first figure out whether that value could generally be decreased, before only allowing longer sentences. Maybe jz (phire) remembers why it's set at that value.
Thanks, Michael. We will discuss among Thai contributors which minimum character length will work.
Note: the current MIN_RECORDING_MS is now 1000 (1 second), see https://github.com/common-voice/common-voice/blob/1d6a861a234e5cd8cd075031b95095ba0ed9428b/web/src/components/pages/contribution/speak/speak.tsx#L50
As the new minimum duration is 1 second and it looks like we no longer have a recording problem with that, I will close this issue for now.
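For context, here is a minimal sketch of what a minimum-duration check like MIN_RECORDING_MS amounts to. It is not the actual code in speak.tsx: the function name isTooShort and the strict `<` comparison (so a clip of exactly 1000 ms passes) are assumptions for illustration. The example durations use the 8-12 characters per second reading rate reported in this issue.

```typescript
// Hypothetical sketch only; the real check lives in speak.tsx and may differ.
const MIN_RECORDING_MS = 1000; // current value noted above (1 second)

// Returns true if a clip of the given duration would be rejected as too short.
// Assumption: the comparison is strict, so exactly MIN_RECORDING_MS is accepted.
function isTooShort(durationMs: number, minMs: number = MIN_RECORDING_MS): boolean {
  return durationMs < minMs;
}

// A 2-character sentence (the current Thai MIN_LENGTH) read at 8-12 chars/s
// lasts roughly 170-250 ms, well under the limit.
console.log(isTooShort(250)); // true
console.log(isTooShort(1200)); // false
```

This is why a volunteer reading a very short Thai sentence at normal speed gets rejected and ends up stretching the pronunciation instead.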
Common Voice has a constraint that rejects recordings that are too short.
To work around this, a volunteer may stretch their pronunciation (as reported by @veer66).
This practice may affect the naturalness of the speech.
To avoid the issues that stretched pronunciation may cause, we propose increasing MIN_LENGTH (currently 2 characters): https://github.com/common-voice/sentence-collector/blob/2a4117ea2428da2f43fb644bd973ea3ac9ad486a/server/lib/validation/languages/th.js#L24
From informal tests with three volunteers, we read about 8-12 characters per second at our normal speed.
So, assuming the minimum duration is 1 second, we propose increasing MIN_LENGTH to a number in that range.
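The arithmetic behind this proposal can be sketched as follows: the smallest length that still fills the minimum duration is ceil(duration in seconds × reading rate). This is a hypothetical TypeScript sketch, not the sentence-collector code. minLengthForDuration, expectedDurationMs and isLongEnough are made-up names, and counting length as JavaScript string length is an assumption (the real th.js rule may count differently, e.g. it may or may not count vowel and tone marks separately).

```typescript
// Hypothetical sketch of the MIN_LENGTH arithmetic; not the sentence-collector code.
const MIN_DURATION_MS = 1000; // assumed minimum recording duration (1 second)

// Smallest sentence length (in characters) whose expected reading time
// reaches minDurationMs at the given reading rate.
function minLengthForDuration(minDurationMs: number, charsPerSecond: number): number {
  return Math.ceil((minDurationMs / 1000) * charsPerSecond);
}

// Expected reading time in milliseconds for a sentence of `length` characters.
function expectedDurationMs(length: number, charsPerSecond: number): number {
  return (length / charsPerSecond) * 1000;
}

// Hypothetical length check. String.length counts UTF-16 code units; all Thai
// characters are in the BMP (U+0E00-U+0E7F), so this equals the code point
// count, with vowel and tone marks counted as separate characters.
function isLongEnough(sentence: string, minLength: number): boolean {
  return sentence.length >= minLength;
}

for (const rate of [8, 10, 12]) {
  console.log(`${rate} chars/s -> MIN_LENGTH ${minLengthForDuration(MIN_DURATION_MS, rate)}`);
}
// 8 chars/s -> MIN_LENGTH 8
// 10 chars/s -> MIN_LENGTH 10
// 12 chars/s -> MIN_LENGTH 12

console.log(isLongEnough("สวัสดี", 8)); // false (6 characters)
```

Note that the fastest reader sets the safe bound: with MIN_LENGTH = 8, someone reading 12 characters per second finishes an 8-character sentence in about 667 ms, still under 1 second, whereas MIN_LENGTH = 12 keeps every reader in the 8-12 range at or above 1 second.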