andreasvc / disco-dop

Discontinuous Data-Oriented Parsing
http://discodop.readthedocs.io
GNU General Public License v2.0

Update ftbtree in treebank #52

Closed: TaniaBladier closed this 6 years ago

TaniaBladier commented 6 years ago

Slightly updating the function ftbtree in treebank.py to handle a few cases in the French Treebank (FTB) where a compound consists of just one word, e.g. sentence number 1497 in file flmf3_01000_01499ep.aa.xml: "<w cat="CL" compound="yes" ee="CL-suj-3ms" ei="CL3ms" lemma="il" mph="3ms" subcat="suj">-t-il</w>". Otherwise we get an empty node like (MWCL ) instead of (MWCL -t-il).
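To illustrate the failure mode, here is a minimal sketch using the standard library's xml.etree.ElementTree. It is not the actual ftbtree code from disco-dop; the function compound_to_bracket is hypothetical, and it assumes that a regular compound wraps its component words in nested <w> elements carrying a cat attribute, while a single-word compound such as -t-il has no nested <w> and stores its word as the element's own text.

```python
import xml.etree.ElementTree as ET


def compound_to_bracket(w):
    """Render an FTB compound <w> element as a bracketed MW node.

    Hypothetical sketch, not disco-dop's ftbtree: nested <w> children
    become (CAT word) subtrees; a compound without nested <w> children
    is treated as a single-word compound whose word is the element's text.
    """
    label = 'MW' + w.get('cat', '')
    parts = w.findall('w')
    if parts:
        # Regular compound: one preterminal per component word.
        children = ' '.join(
            '(%s %s)' % (p.get('cat', ''), (p.text or '').strip())
            for p in parts)
    else:
        # Single-word compound: without this branch the node would be
        # empty, e.g. "(MWCL )" instead of "(MWCL -t-il)".
        children = (w.text or '').strip()
    return '(%s %s)' % (label, children)


xml = ('<w cat="CL" compound="yes" ee="CL-suj-3ms" ei="CL3ms" '
       'lemma="il" mph="3ms" subcat="suj">-t-il</w>')
print(compound_to_bracket(ET.fromstring(xml)))  # (MWCL -t-il)
```

The key design choice is the fallback branch: when findall('w') returns no components, the element's own text is used instead of producing a node with no children.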

TaniaBladier commented 6 years ago

I beg your pardon: now it is not clear at all what exactly I have changed. I only added 9 lines to the function ftbtree (4 of which are my comment). I'll do better next time!

andreasvc commented 6 years ago

Thanks, I will merge these changes in a new commit.

The issue is probably that the file was saved with Windows line endings; the Unix line endings of the original should be preserved. See https://www.jetbrains.com/help/pycharm/configuring-line-separators.html
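Besides configuring the editor, a file that was already saved with Windows line endings can be converted back before committing. The following is a minimal sketch, not part of disco-dop; the function names crlf_to_lf and to_unix_line_endings are hypothetical. It works on raw bytes so the file's encoding is left untouched, and it only converts CRLF pairs (a lone carriage return is left as-is).

```python
def crlf_to_lf(data):
    """Replace every Windows CRLF line ending in a bytes object with LF."""
    return data.replace(b'\r\n', b'\n')


def to_unix_line_endings(path):
    """Rewrite the file at path in place with Unix (LF) line endings.

    Returns the number of CRLF sequences that were converted; the file
    is only rewritten if at least one was found.
    """
    with open(path, 'rb') as f:
        data = f.read()
    count = data.count(b'\r\n')
    if count:
        with open(path, 'wb') as f:
            f.write(crlf_to_lf(data))
    return count
```

For example, calling to_unix_line_endings on the edited treebank.py before committing would make the diff show only the lines that were actually added to ftbtree.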