aalto-speech / morfessor

Morfessor is a tool for unsupervised and semi-supervised morphological segmentation
http://morpho.aalto.fi
BSD 2-Clause "Simplified" License

The `--atom-separator` option doesn't work on Python 3 #8

Closed rspeer closed 7 years ago

rspeer commented 7 years ago

vocab-vi.txt is a list of Vietnamese terms, with syllables separated by _. I tried using Morfessor to group the syllables into words:

morfessor -t vocab-vi.txt -T vocab-vi.txt -x lexicon-vi.txt -S lexicon-vi.morf --traindata-list --atom-separator '_'

and I got this error, from code that apparently hasn't been ported to Python 3:

Traceback (most recent call last):
  File "/home/rspeer/.virtualenvs/lum/bin/morfessor", line 22, in <module>
    main(sys.argv[1:])
  File "/home/rspeer/.virtualenvs/lum/bin/morfessor", line 13, in main
    morfessor.main(args)
  File "/home/rspeer/.virtualenvs/lum/lib/python3.5/site-packages/morfessor/cmd.py", line 393, in main
    args.finish_threshold, args.maxepochs)
  File "/home/rspeer/.virtualenvs/lum/lib/python3.5/site-packages/morfessor/baseline.py", line 572, in train_batch
    (w, _constructions_to_str(segments)))
  File "/home/rspeer/.virtualenvs/lum/lib/python3.5/site-packages/morfessor/baseline.py", line 17, in _constructions_to_str
    isinstance(constructions[0], unicode)):
NameError: name 'unicode' is not defined
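
The `NameError` arises because Python 3 has no `unicode` built-in; `str` is the text type there. A minimal sketch of a string check that runs on both Python 2 and 3 might look like the following. The names `string_types` and `is_string_construction` are hypothetical and are not taken from Morfessor's source.

```python
import sys

# Python 3 removed the `unicode` built-in; `str` is the text type there.
if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    # This branch is only evaluated on Python 2, where `unicode` exists.
    string_types = (str, unicode)  # noqa: F821


def is_string_construction(constructions):
    """Hypothetical helper: True if the first construction is a plain string.

    Returns False for an empty sequence and for constructions that are
    tuples of atoms rather than strings.
    """
    return len(constructions) > 0 and isinstance(constructions[0], string_types)
```

With this check, a list of strings such as `["un", "do"]` counts as string constructions, while a list of atom tuples such as `[("un", "do")]` does not.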

Replacing that check with a plain check for str doesn't solve the problem either; it just uncovers another one:

Traceback (most recent call last):
  File "/home/rspeer/.virtualenvs/lum/bin/morfessor", line 22, in <module>
    main(sys.argv[1:])
  File "/home/rspeer/.virtualenvs/lum/bin/morfessor", line 13, in main
    morfessor.main(args)
  File "/home/rspeer/.virtualenvs/lum/lib/python3.5/site-packages/morfessor/cmd.py", line 466, in main
    analysis = csep.join(constructions)
TypeError: sequence item 0: expected str instance, tuple found
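
The `TypeError` suggests that, with an atom separator in use, each construction is a tuple of atoms rather than a string, so joining the constructions directly fails. A minimal sketch of one way to handle both cases is shown below. It assumes each construction is either a `str` or a tuple of `str` atoms; the function name, parameters, and default separators are hypothetical, and this is not the fix that shipped in Morfessor.

```python
def constructions_to_str(constructions, atom_sep="_", construction_sep=" + "):
    """Hypothetical helper: render constructions as one output string.

    Each construction is assumed to be either a plain string or a tuple
    of string atoms. Tuples are first joined with `atom_sep`, then all
    constructions are joined with `construction_sep`.
    """
    parts = []
    for construction in constructions:
        if isinstance(construction, str):
            parts.append(construction)
        else:
            # Rebuild the construction from its atoms before the outer join.
            parts.append(atom_sep.join(construction))
    return construction_sep.join(parts)


print(constructions_to_str([("viet", "nam"), ("ha",)]))  # viet_nam + ha
```

Joining the atoms first keeps the outer `join` operating on strings only, which is what `str.join` requires.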

svirpioj commented 7 years ago

I tried replicating the problem, and it seems that it exists in version 2.0.1 but is already fixed in 2.0.2alpha4. Could you confirm this?

If you are using pip, include --pre to install a pre-release version.

svirpioj commented 7 years ago

Closing the issue; this should work in release 2.0.3.