Closed alecristia closed 6 years ago
This is already done in python but not exposed in bash (see https://github.com/bootphon/wordseg/blob/c77b46cf6926fc732b5f50a39698822fbc5bbe9e/wordseg/algos/ag.py#L212).
I can add an option to the bash command, for instance --generate-grammar or something like that. What do you think?
sounds terrific!
Now the grammar and category arguments are optional. Use it as:
cat prep.txt | wordseg-ag --grammar file.lt --category Colloc0
When they are not specified, a colloc0 grammar is generated automatically. I also updated the tutorial with that new syntax.
The top is always:
1 1 Sentence --> Colloc0s
1 1 Colloc0s --> Colloc0
1 1 Colloc0s --> Colloc0 Colloc0s
Colloc0 --> Phonemes
1 1 Phonemes --> Phoneme
1 1 Phonemes --> Phoneme Phonemes
followed by lines like:
1 1 Phoneme --> XX
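where XX is a possible unit. To find all possible units, do something like:
cat prepared.txt | tr ' ' '\n' | sort | uniq
(sort has to come before uniq, since uniq only collapses adjacent duplicate lines.)
There will be as many Phoneme lines as there are distinct units in the prepared corpus.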
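For illustration, here is a minimal Python sketch of the recipe described above: the fixed top of the grammar followed by one Phoneme rule per distinct unit. It is not the code wordseg itself uses. The function name build_colloc0_grammar is hypothetical, and the sketch assumes that units in the prepared text are separated by whitespace, as in the tr command above.

```python
def build_colloc0_grammar(lines):
    """Return a colloc0 grammar (as one string) for prepared utterances.

    `lines` is any iterable of strings whose units are whitespace-separated
    (an assumption of this sketch, not a documented wordseg guarantee).
    """
    # Fixed top of the grammar; the Colloc0 rule carries no "1 1" prefix.
    header = [
        "1 1 Sentence --> Colloc0s",
        "1 1 Colloc0s --> Colloc0",
        "1 1 Colloc0s --> Colloc0 Colloc0s",
        "Colloc0 --> Phonemes",
        "1 1 Phonemes --> Phoneme",
        "1 1 Phonemes --> Phoneme Phonemes",
    ]
    # Distinct units, sorted; str.split() with no argument drops empty tokens.
    units = sorted({unit for line in lines for unit in line.split()})
    # One terminal rule per distinct unit.
    rules = ["1 1 Phoneme --> " + unit for unit in units]
    return "\n".join(header + rules) + "\n"


# Toy corpus with two utterances and four distinct units (e, h, l, o).
example = ["h e l o", "l o"]
grammar = build_colloc0_grammar(example)
print(grammar, end="")
```

On this toy corpus the output is the six fixed lines followed by four Phoneme rules, one for each of e, h, l and o.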