aalto-speech / morfessor

Morfessor is a tool for unsupervised and semi-supervised morphological segmentation
http://morpho.aalto.fi
BSD 2-Clause "Simplified" License

Command line vs. API #4

Closed kaharjan closed 8 years ago

kaharjan commented 8 years ago

I wrote Python code to segment given words. The main code is:

model = io.read_any_model('model.bin')
with open('test.txt', 'r') as InputFile:
    for line in InputFile:
        words = line.strip().split()
        morphemes = [(w, " ".join(model.viterbi_segment(w)[0])) for w in words]

Only a few words are segmented. But when I use the same model on the command line to segment the same text, most of the words are segmented:
$ morfessor-segment -l model.bin test.txt

Any idea what is wrong with my Python code? Thank you!

psmit commented 8 years ago

The most likely cause is that you are using the wrong encoding. Did you make sure that InputFile is opened with the right encoding? Internally, Morfessor always uses unicode strings (unicode in Python 2, str in Python 3).
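A minimal sketch of what the suggestion above could look like, assuming the input file is UTF-8 encoded (the thread does not say which encoding test.txt actually uses). The helper name segment_file and its segment parameter are hypothetical and only serve the illustration. io.open decodes the file to unicode text on both Python 2 and Python 3, so every word reaches the segmenter as a unicode string.

```python
import io


def segment_file(path, segment, encoding="utf-8"):
    """Return (word, segmentation) pairs for every word in a text file.

    `segment` is any callable that maps a unicode word to a list of morphs.
    The default encoding is an assumption; pass the file's real encoding.
    """
    results = []
    # io.open decodes bytes to unicode text on Python 2 and Python 3 alike.
    with io.open(path, "r", encoding=encoding) as input_file:
        for line in input_file:
            for word in line.split():
                results.append((word, " ".join(segment(word))))
    return results


# Possible wiring to Morfessor (needs the morfessor package and a model file):
#   import morfessor
#   model = morfessor.MorfessorIO().read_any_model("model.bin")
#   pairs = segment_file("test.txt", lambda w: model.viterbi_segment(w)[0])
```

Passing the segmenter in as a callable keeps the file handling separate from the model, so the decoding step can be checked on its own before a model is loaded.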

kaharjan commented 8 years ago

Thank you!!! It helps...