aalto-speech / morfessor

Morfessor is a tool for unsupervised and semi-supervised morphological segmentation
http://morpho.aalto.fi
BSD 2-Clause "Simplified" License

Segmented output format #16

valentinmace closed this issue 5 years ago

valentinmace commented 5 years ago

I used morfessor-segment -L en.model test.data > test.morf

It works, but the resulting file test.morf has one word per line. My corpus has one sentence per line, and I would like the output in the same format, but I cannot find how to achieve that.

Thanks in advance

valentinmace commented 5 years ago

Answer:

morfessor-segment -L en.model -o test.morf --output-format '{analysis} ' test.data --output-newlines

You may also want to add --output-format-separator '@@ ' to clearly mark where words are segmented.
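To show what those options should produce, here is a minimal Python sketch of the formatting step only. It is not Morfessor's code: toy_segment and format_line are hypothetical names, and the suffix-stripping rule is just a stand-in for a trained model. The idea is that each word's morphs are joined with the separator, the '{analysis} ' template is applied per word, and all words from one input line stay on one output line.

```python
def toy_segment(word):
    # Hypothetical stand-in for a trained model: split off one common suffix.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)], suffix]
    return [word]


def format_line(line, segment=toy_segment,
                output_format="{analysis} ", separator="@@ "):
    # Join the morphs of each word with the separator, then apply the
    # per-word template. Every word of the input line ends up on one line.
    pieces = [
        output_format.format(analysis=separator.join(segment(word)))
        for word in line.split()
    ]
    # The template leaves a trailing space after the last word;
    # this sketch strips it for readability.
    return "".join(pieces).rstrip()


print(format_line("she walked to the meetings"))
# prints: she walk@@ ed to the meeting@@ s
```

With a real model the split points will differ, but the shape of each line should be the same: one sentence per line, with '@@ ' marking the boundaries inside a word.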

svirpioj commented 5 years ago

Yes, the output format options are documented at https://morfessor.readthedocs.io/en/latest/cmdtools.html#data-format-command-line-options, but not so clearly. Thanks for providing the answer, too!