bootphon / wordseg

A Python toolbox for text based word segmentation
https://docs.cognitive-ml.fr/wordseg
GNU General Public License v3.0

Syllabification #3 - schwa equivalent #38

Closed GladB closed 6 years ago

GladB commented 6 years ago

Regarding clitics: many languages have a silent "filler" vowel (the schwa /ə/ in French) that could be added to consonant groups containing no vowel (such as clitics) so that they no longer raise errors. This could be a command-line option or parameter. For French, that could be:

wordseg-syll --silent ə input_file.txt onsets_file.txt vowels_file.txt

mmmaat commented 6 years ago

I implemented something along those lines, but I'm not sure it is correct: for every word with no vowel, I append the "silent filler vowel", syllabify the word, and then remove that vowel (so that the text is not modified). The result is a syllable boundary at the end of the word if the onset is valid; otherwise it fails.

With that implementation we don't need to specify a silent vowel (the program picks one among the symbols that appear in neither the vowels nor the onsets). For now the --silent option activates this behavior, but we could remove the option and make it the default. What do you think?
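
For readers following the thread, the approach described above can be sketched roughly as follows. This is an illustration only, not the code from wordseg: the function names (`pick_filler`, `syllabify`, `syllabify_with_filler`) are hypothetical, phones are assumed to be single characters, and the syllabifier is a simplified maximal-onset pass written for this example.

```python
def pick_filler(word, vowels, onsets):
    """Return a symbol that appears in neither the word, the vowels nor the onsets.

    Hypothetical helper: candidates come from the Unicode Private Use Area.
    """
    used = set(word) | set(vowels) | set("".join(onsets))
    for code in range(0xE000, 0xF900):
        if chr(code) not in used:
            return chr(code)
    raise ValueError("no free symbol available for the filler vowel")


def syllabify(word, vowels, onsets):
    """Simplified maximal-onset syllabification (one character per phone).

    Scans right to left: each syllable is a vowel, the consonants after it,
    and the longest valid onset before it. Raises ValueError when some
    consonants cannot be attached to any vowel.
    """
    syllables = []
    end = len(word)
    while end > 0:
        # Nucleus: the last vowel before `end`.
        nucleus = end - 1
        while nucleus >= 0 and word[nucleus] not in vowels:
            nucleus -= 1
        if nucleus < 0:
            raise ValueError(f"cannot syllabify {word[:end]!r}: no vowel")
        # Extend the onset leftwards for as long as it stays valid.
        start = nucleus
        while (start > 0 and word[start - 1] not in vowels
               and word[start - 1:nucleus] in onsets):
            start -= 1
        syllables.append(word[start:end])
        end = start
    return syllables[::-1]


def syllabify_with_filler(word, vowels, onsets):
    """Syllabify `word`, temporarily appending a filler vowel if it has none."""
    if any(phone in vowels for phone in word):
        return syllabify(word, vowels, onsets)
    filler = pick_filler(word, vowels, onsets)
    syllables = syllabify(word + filler, set(vowels) | {filler}, onsets)
    # Remove the filler so that the original text is left unchanged.
    syllables[-1] = syllables[-1][:-1]
    return syllables
```

Under these assumptions, a vowel-less clitic such as `"l"` comes back as a single syllable when `"l"` is a valid onset, while a consonant group whose onset is not valid still raises an error, which matches the behavior described in the comment above.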

mmmaat commented 6 years ago

This is in commit https://github.com/bootphon/wordseg/commit/0cbce8ca7bc5d52733d089f2f0150fa835127648 if you want to have a look.