aymara / lima

The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit.
http://aymara.github.io/lima/

Problem with token number in the CoNLL output #41

Open osf9018 opened 8 years ago

osf9018 commented 8 years ago

In the fourth sentence of the test-conll.txt file, token 26 is referenced in the syntactic dependencies but does not exist in the list of tokens: there is a token 25 and a token 27, but no token 26. The test.txt file is the original input file.

Attachments: test.txt, test-conll.txt
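This kind of inconsistency can be spotted automatically by scanning each sentence block for two things: gaps in the token ID column, and HEAD values that point at no existing token. The Python sketch below does both. It assumes the CoNLL output is tab-separated, with the token ID in the first column and the HEAD in the seventh (as in CoNLL-X and CoNLL-U). That layout is an assumption and may not match LIMA's exact output. `check_sentence` is a hypothetical helper, not part of LIMA.

```python
from collections import defaultdict

# Assumption: HEAD is the 7th tab-separated column (index 6), as in CoNLL-X/CoNLL-U.
HEAD_COL = 6


def check_sentence(block: str, head_col: int = HEAD_COL) -> tuple[list[int], dict[int, list[int]]]:
    """Check one CoNLL sentence block for token-id gaps and dangling HEAD references.

    Returns (missing_ids, dangling): missing_ids lists the ids absent from the
    1..max(id) sequence; dangling maps each HEAD id that matches no token to the
    sorted ids of the tokens that point at it.
    """
    ids = set()
    heads = []  # (token_id, head_id) pairs
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("3-4") and empty nodes ("5.1")
        tok = int(cols[0])
        ids.add(tok)
        if len(cols) > head_col and cols[head_col].isdigit():
            heads.append((tok, int(cols[head_col])))

    missing_ids = sorted(set(range(1, max(ids, default=0) + 1)) - ids)

    dangling = defaultdict(list)
    for tok, head in heads:
        # HEAD 0 is the conventional root, not a real token
        if head != 0 and head not in ids:
            dangling[head].append(tok)
    return missing_ids, {h: sorted(t) for h, t in sorted(dangling.items())}


if __name__ == "__main__":
    # Made-up sentence in which token 3 is referenced as a head but never emitted.
    sample = "\n".join([
        "1\tw1\tw1\tPRON\t_\t_\t3\tnsubj\t_\t_",
        "2\tw2\tw2\tPRON\t_\t_\t3\texpl\t_\t_",
        "4\tw4\tw4\tADV\t_\t_\t3\tadvmod\t_\t_",
    ])
    missing, dangling = check_sentence(sample)
    print("missing ids:", missing)
    print("dangling heads:", dangling)
```

On the made-up sample this prints `missing ids: [3]` and `dangling heads: {3: [1, 2, 4]}`. Run on each sentence of test-conll.txt, a check like this would list every missing token at once rather than only the first one noticed.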

kleag commented 8 years ago

In fact, tokens 22, 24 and 26 are missing.

kleag commented 7 years ago

This seems to be caused by the idiomatic expressions module assigning the wrong part-of-speech tag to the reflexive verb.