common-voice / sentence-collector

Tool to collect and review sentences for Common Voice
https://commonvoice.mozilla.org/sentence-collector/
Mozilla Public License 2.0

Fix to eo.js and tok.js validation scripts #616

Closed (janPensa closed this 2 years ago)

janPensa commented 2 years ago

Somehow the character classes [BbCcDdFfGgHhQqRrVvXxYyZz\u00C0-\u02BF\u1E00-\u1EFF\uF1900-\uF19FF] and [qQwWxXyYÀ-ćĊ-ěĞ-ģĞ-ģĦ-ijĶ-śŞ-ūŮ-\u02AF\u1E00-\u1EFFα-ωΑ-ΩЀ-ӿ] also match all regular Latin letters, which makes the Sentence Collector reject every submission.
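One likely cause for the first class, assuming it is used as a plain JavaScript regex without the `u` flag: `\u` consumes exactly four hex digits, so `\uF1900-\uF19FF` is read as U+F190, then the range from "0" (U+0030) to U+F19F, then a literal "F". That range covers every ASCII letter. The sketch below only illustrates this parsing behavior; the `braced` variant is a hypothetical alternative, not the change made in this PR, and it does not explain the second class, which contains no five-digit escape.

```javascript
// Without the `u` flag, `\uF1900-\uF19FF` parses as:
//   \uF190   one character, U+F190
//   0-\uF19F a range from "0" (U+0030) up to U+F19F, which includes a-z and A-Z
//   F        a literal "F"
const original = /[BbCcDdFfGgHhQqRrVvXxYyZz\u00C0-\u02BF\u1E00-\u1EFF\uF1900-\uF19FF]/;

// Hypothetical alternative: braces plus the `u` flag address code points above U+FFFF.
const braced = /[BbCcDdFfGgHhQqRrVvXxYyZz\u00C0-\u02BF\u1E00-\u1EFF\u{F1900}-\u{F19FF}]/u;

console.log(original.test("a")); // true: "a" falls inside the accidental range
console.log(braced.test("a"));   // false
console.log(braced.test("b"));   // true: "b" is listed explicitly
console.log(braced.test(String.fromCodePoint(0xF1900))); // true
```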

I changed them to [BbCcDdFfGgHhQqRrVvXxYyZzÀ-ʯḀ-ỿ] and [qQwWxXyYÀ-ćĊ-ěĞ-ģĞ-ģĦ-ijĶ-śŞ-ūŮ-ʯḀ-ỿα-ωΑ-ΩЀ-ӿ], which should work well. (At least they do in Notepad++, which I found behaves the same way.)
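Since Notepad++ does not use the JavaScript regex engine, a quick check in Node can confirm the behavior there as well. This is a minimal sketch for the first replacement class only; the helper name `hasInvalidLetter` and the sample strings are made up for illustration and are not taken from tok.js.

```javascript
// First replacement class from this PR, written with literal characters
// (À is U+00C0, ʯ is U+02AF, Ḁ is U+1E00, ỿ is U+1EFF).
const replacement = /[BbCcDdFfGgHhQqRrVvXxYyZzÀ-ʯḀ-ỿ]/;

// Hypothetical helper: true if the text contains a letter the class is meant to catch.
const hasInvalidLetter = (text) => replacement.test(text);

console.log(hasInvalidLetter("toki pona li pona")); // false: only allowed letters
console.log(hasInvalidLetter("bona"));              // true: contains "b"
console.log(hasInvalidLetter("caf\u00E9"));         // true: contains "c", "f" and U+00E9
```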

MichaelKohler commented 2 years ago

@janPensa tested on https://commonvoice.allizom.org/sentence-collector, looks good to me. Would you agree?

janPensa commented 2 years ago

@MichaelKohler It seems like https://commonvoice.allizom.org/sentence-collector uses the old validation script. It rejects sentences longer than 14 words, and it doesn't reject invalid words that violate the phonotactics.

MichaelKohler commented 2 years ago

Yeah, looks like something is off with that deployment. I'll deploy to production now then.

MichaelKohler commented 2 years ago

:tada: This PR is included in version 2.17.3 :tada:

The release is available as a GitHub release.

Your semantic-release bot :package::rocket:

janPensa commented 2 years ago

Okay. I'll wait a bit and test on commonvoice.mozilla.org

janPensa commented 2 years ago

@MichaelKohler I did a few different tests. Looks like everything works as intended now!

MichaelKohler commented 2 years ago

@janPensa should be deployed now :)

MichaelKohler commented 2 years ago

@janPensa hah, you were faster than me. Thanks for the verification and the hotfix!