bootphon / wordseg

A Python toolbox for text based word segmentation
https://docs.cognitive-ml.fr/wordseg
GNU General Public License v3.0

Syllabification #36

Closed: GladB closed this issue 6 years ago

GladB commented 6 years ago

Whenever the syllabified utterance is inconsistent with the input, the code fails with this error:

fatal error: line 174: syllabified utterance differs from the input one, the onsets and/or vowels may be invalid : [syllabified] != [original]

Perhaps it would be better to report these errors in a log file and keep syllabifying the rest of the text, discarding only the problematic utterances. The number of errors could then be displayed at the end of the computation. Since an error usually affects a single word of the utterance, the log could use the following format for each error:

line line_nb : word word_nb : "syllabified version" : "original version"

For example:

line 1 : word 2 : "ɐn ʃuː" : "ɐn ɪʃuː"
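
The behaviour proposed above can be sketched roughly as follows. This is only an illustration, not the wordseg implementation: `syllabify_tolerant`, `first_differing_word`, the `syllabify_one` callable, the `/` syllable separator, and the toy syllabifier are all hypothetical names and assumptions introduced for the example.

```python
from typing import Callable, Iterable, List, Tuple


def first_differing_word(syllabified: str, original: str) -> int:
    """Return the 1-based index of the first word that differs."""
    syl_words, orig_words = syllabified.split(), original.split()
    for index, (syl, orig) in enumerate(zip(syl_words, orig_words), start=1):
        if syl != orig:
            return index
    # All shared words match, so the mismatch is a missing or extra word.
    return min(len(syl_words), len(orig_words)) + 1


def syllabify_tolerant(
    utterances: Iterable[str],
    syllabify_one: Callable[[str], str],  # hypothetical syllabifier
    separator: str = "/",  # assumed syllable separator
) -> Tuple[List[str], List[str]]:
    """Keep consistent utterances, log and discard the others."""
    kept, log = [], []
    for line_nb, original in enumerate(utterances, start=1):
        syllabified = syllabify_one(original)
        # Removing the separators should give back the input utterance.
        recovered = syllabified.replace(separator, "")
        if recovered == original:
            kept.append(syllabified)
        else:
            word_nb = first_differing_word(recovered, original)
            log.append(
                f'line {line_nb} : word {word_nb} : '
                f'"{recovered}" : "{original}"'
            )
    return kept, log


def toy_syllabify(utterance: str) -> str:
    """Toy stand-in that wrongly drops "ɪ", to trigger an error."""
    words = utterance.replace("ɪ", "").split()
    return " ".join(word + "/" for word in words)


kept, log = syllabify_tolerant(["ɐn ɪʃuː", "ɐn ʃuː"], toy_syllabify)
print("\n".join(log))
print(f"{len(log)} error(s), {len(kept)} utterance(s) kept")
```

Returning the log entries as a list leaves it to the caller to write them to a file and to report the error count at the end of the run.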

mmmaat commented 6 years ago

I added a --tolerant option that does what you want.