Not deleting utterances that have words with digits in them.

clean_utterance() in clean_json.py removes any utterance that contains a token with a digit in it. This behaviour breaks training for languages where digits are part of the orthography. For instance, the romanization of Chatino uses digits to indicate tone, so every utterance in a Chatino corpus is removed.
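To make the failure concrete, here is a minimal sketch of a digit filter of this kind. It is not the actual code from clean_json.py: the function names `has_digit_token` and `filter_utterances` are hypothetical, and the tone-marked words are made-up placeholders in the style of a tone-numbered romanization, not real Chatino.

```python
from typing import List


def has_digit_token(utterance: str) -> bool:
    """Return True if any whitespace-separated token contains a digit."""
    return any(
        any(ch.isdigit() for ch in token)
        for token in utterance.split()
    )


def filter_utterances(utterances: List[str]) -> List[str]:
    """Keep only utterances in which no token contains a digit.

    Hypothetical stand-in for the filtering behaviour described above.
    """
    return [u for u in utterances if not has_digit_token(u)]


# Placeholder corpus where every word carries a tone number.
tone_corpus = ["kwa4 nten14", "ti2 xa3"]
print(filter_utterances(tone_corpus))
```

With a corpus like this, every utterance contains a digit, so the filter returns an empty list and nothing is left to train on.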

Why was this behaviour there in the first place? Perhaps the idea is that some transcriptions contain digits whose pronunciation in the language is unknown. If that's the case, I think the check should be done somewhere else, such as where the G2P rules or the pronunciation lexicon are used, since pronunciations may well be supplied for digits.
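As a rough sketch of what moving the check could look like, the snippet below flags digit-bearing tokens only when no pronunciation is available for them. Everything here is an assumption for illustration: the function name `digit_tokens_without_pronunciation` is hypothetical, and the lexicon is modelled as a plain set of known words rather than whatever structure elpis actually uses.

```python
from typing import List, Set


def digit_tokens_without_pronunciation(utterance: str, lexicon: Set[str]) -> List[str]:
    """Return digit-bearing tokens that have no entry in the lexicon.

    `lexicon` is assumed to be the set of words for which a pronunciation
    exists (for example, from G2P rules or a pronunciation dictionary).
    """
    return [
        token
        for token in utterance.split()
        if any(ch.isdigit() for ch in token) and token not in lexicon
    ]


# Placeholder lexicon: tone-marked words and the digit "7" have pronunciations.
lexicon = {"kwa4", "nten14", "7"}

print(digit_tokens_without_pronunciation("kwa4 nten14", lexicon))
print(digit_tokens_without_pronunciation("kwa4 7 42", lexicon))
```

Under this approach an utterance would only be a candidate for removal when the returned list is non-empty, so tone-numbered orthographies and digits with supplied pronunciations pass through untouched.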

I suppose this PR is a work in progress: if there was a good reason for the digit check in the first place, then merging this could break something else.
