Helsinki-NLP / OpusTools

Spaces before punctuation marks on opus_read output #39

Open keith555 opened 1 year ago

keith555 commented 1 year ago

Apostrophes, commas, question marks, etc., are all printed with a leading space. Is this by design? I couldn't see any option to change the behaviour.

(src)="8"> She 's calling herself jolene parker .
(trg)="7"> Je ne peux pas te forcer à y croire .

miau1 commented 1 year ago

You are probably using tokenized preprocessing, which is the default. You can produce untokenized output with the -p raw option.
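
If the tokenized output has already been written to disk and re-running opus_read with -p raw is not convenient, a rough post-processing pass can glue the punctuation back on. The sketch below is not part of OpusTools; the function name detokenize and the regex rules are assumptions made for illustration. It only covers common closing punctuation and English clitics, so it will not reproduce the original text exactly in every case (French typography, for instance, legitimately keeps a space before "?" and "!"). Using -p raw remains the cleaner route.

```python
import re


def detokenize(line: str) -> str:
    """Heuristically undo whitespace tokenization (hypothetical helper, not an OpusTools API)."""
    # Drop whitespace before closing punctuation: "parker ." -> "parker."
    line = re.sub(r"\s+([.,!?;:])", r"\1", line)
    # Re-attach English clitics: "She 's" -> "She's", "do n't" -> "don't"
    line = re.sub(r"\s+(n't|'(?:s|re|ve|ll|d|m))\b", r"\1", line)
    return line


if __name__ == "__main__":
    print(detokenize("She 's calling herself jolene parker ."))
    print(detokenize("Je ne peux pas te forcer à y croire ."))
```

Applied to the two sample lines from the report, this removes the space before the final full stop and joins "She 's" into "She's".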