aalto-speech / morfessor

Morfessor is a tool for unsupervised and semi-supervised morphological segmentation
http://morpho.aalto.fi
BSD 2-Clause "Simplified" License

--output-newlines squeezes multiple newlines #2

Closed (flammie closed this 7 years ago)

flammie commented 10 years ago

On the command line:

flammie@saarkaany ~/Koodit/mt-development/complexity-stats (2145) [01:21:54] 
$ cat > kolme
yksi

kolme
flammie@saarkaany ~/Koodit/mt-development/complexity-stats (2146) [01:22:08] 
$ morfessor -l europarl-v7.fi-en.fi.morfessor --output-format-separator '> <' --output-newlines --output-format '{analysis} ' -T - < kolme
INFO:morfessor.io:Loading model from 'europarl-v7.fi-en.fi.morfessor'...
INFO:morfessor.io:Done.
No training data files specified.
Segmenting test data...
INFO:morfessor.io:Reading corpus from '-'...
yksi 
kolme 
INFO:morfessor.io:Done.

Done.

There should be an empty line between yksi and kolme in the output. This matters for machine translation pipelines, where tools commonly fail when the input and output line counts don't match.
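To make the expected behaviour concrete, here is a minimal Python sketch of line-aligned segmentation. It is not Morfessor's code or API: `segment_preserving_blank_lines` and `toy_segment` are hypothetical names, and `toy_segment` is only a stand-in that splits each word in half. The point is that an empty input line produces an empty output line, so the line counts stay equal.

```python
def segment_preserving_blank_lines(lines, segment_word):
    """Yield exactly one output line per input line.

    `segment_word` is any callable mapping a word to its segmented form
    (a hypothetical stand-in here, not the Morfessor API).
    """
    for line in lines:
        words = line.split()
        # An empty input line has no words, so it yields an empty output
        # line instead of being dropped; this keeps line counts aligned.
        yield " ".join(segment_word(word) for word in words)


def toy_segment(word):
    # Hypothetical segmenter for illustration only: split the word in half
    # and join the halves with the '> <' separator from the report above.
    mid = len(word) // 2
    return word[:mid] + "> <" + word[mid:] if mid else word


source = ["yksi", "", "kolme"]
result = list(segment_preserving_blank_lines(source, toy_segment))

# The property a machine translation pipeline relies on:
assert len(result) == len(source)
```

With the three-line input from the report, the middle element of `result` is an empty string, which would be written out as the missing empty line.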

psmit commented 10 years ago

That does look like a bug. I have a temporary fix in https://github.com/phsmit/morfessor/tree/newline_fix. I'll test it later, and if it doesn't break anything, the fix will be included in the next release.

svirpioj commented 7 years ago

This has been fixed in release 2.0.2.