Where is the detailed specification of the training data format?

Hi there,

I tried to craft some simple training data like this:

design de sign, de sign
gender gen der, gen der
bilingual bi lingual, bi lingual
biography bio graphy, bio graphy

with this test list:

design
gender
bilingual
biography

and got this result:

 morfessor -t td1.txt -S model.segm -T text.txt 
Reading corpus from 'td1.txt'...
Detected utf-8 encoding
Done.
Compounds in training data: 16 types / 16 tokens
Starting batch training
Epochs: 0   Cost: 344.6809466060173
.................
Epochs: 1   Cost: 206.03260380373735
.................
Epochs: 2   Cost: 206.0326038037374
Done.
Epochs: 2
Final cost: 206.0326038037374
Training time: 0.017s
Saving segmentations to 'model.segm'...
Done.
Segmenting test data...
Reading corpus from 'text.txt'...
de sign
gen der
bi lingual
bi o graphy
Done.

Done.

The expected result is:

de sign
gen der
bi lingual
bio graphy
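
A possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: if `-t` treats the file as a plain corpus and splits each line on whitespace, then every piece of the lines (including `sign,` with its trailing comma) counts as a separate word. The sketch below only counts strings in plain Python and does not call Morfessor; `whitespace_types` and `parse_annotation` are hypothetical helper names. The second helper shows the other reading, where each line is a word followed by comma-separated alternative segmentations.

```python
# The four lines passed to `-t` in the post, copied verbatim.
TRAINING_LINES = [
    "design de sign, de sign",
    "gender gen der, gen der",
    "bilingual bi lingual, bi lingual",
    "biography bio graphy, bio graphy",
]


def whitespace_types(lines):
    """Distinct whitespace-separated strings (assumed plain-corpus reading)."""
    return {token for line in lines for token in line.split()}


def parse_annotation(line):
    """Hypothetical helper: read a line as '<word> <analysis>, <analysis>'."""
    word, _, rest = line.partition(" ")
    # Each comma-separated alternative is a list of morphs.
    analyses = [alternative.split() for alternative in rest.split(",")]
    return word, analyses


types = whitespace_types(TRAINING_LINES)
print(len(types))                           # number of distinct strings
print(parse_annotation(TRAINING_LINES[0]))  # word plus its segmentations
```

Under the whitespace reading the count is 16 distinct strings, the same number as the "16 types" in the log, which would be consistent with the segmentations on each line being read as ordinary words rather than as annotations.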

My questions are:

How can I craft the training data correctly?
Where can I find the training data specification?

-R Jarod

aalto-speech / morfessor, issue #20