So, we can pre-process Wiki sentences with replacements. One common use is expanding abbreviations and acronyms. Of course, such expansions are required for the sentences to be useful in Common Voice.
In my own pre-processing scripts (for language models, for example), I convert numbers to text, run spellchecking, replace common misspellings, etc.
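A minimal sketch of what such a replacement pass might look like. It is illustrative only and not the actual scripts: the tables `ABBREVIATIONS` and `MISSPELLINGS` and the helpers `number_to_words` and `preprocess` are hypothetical names, the entries are made-up examples, and the number handling only covers English 0 to 99.

```python
import re

# Hypothetical replacement tables; real entries depend on the language and corpus.
ABBREVIATIONS = {
    "e.g.": "for example",
    "approx.": "approximately",
    "Dr.": "Doctor",
}
MISSPELLINGS = {
    "recieve": "receive",
    "teh": "the",
}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Standalone one- or two-digit integers only: not part of a longer number,
# a word, or a decimal such as 3.5.
SMALL_NUMBER = re.compile(r"(?<![\w.,])\d{1,2}(?!\w|[.,]\d)")


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99 in English."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def preprocess(sentence: str) -> str:
    # Expand abbreviations as literal strings (they contain dots).
    for abbr, full in ABBREVIATIONS.items():
        sentence = sentence.replace(abbr, full)
    # Replace known misspellings, matching whole words only.
    for wrong, right in MISSPELLINGS.items():
        sentence = re.sub(rf"\b{re.escape(wrong)}\b", right, sentence)
    # Spell out small standalone numbers; everything else is left untouched.
    return SMALL_NUMBER.sub(lambda m: number_to_words(int(m.group())), sentence)


if __name__ == "__main__":
    print(preprocess("Dr. Smith will recieve approx. 42 letters in 2021."))
```

Every one of these steps alters the source sentence, so keeping the original next to the processed version makes it easy to review exactly what was changed.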
On the other hand, these replacements change the original text. Where should we stop - legally?