TALP-UPC / FreeLing

FreeLing project source code

Tag not found for contraction component #67

Closed ya5an closed 6 years ago

ya5an commented 6 years ago

Hello, thank you for your work! I ran into a problem:

$ analyze -f en.cfg < test.en 
DICTIONARY: Tag not found for contraction component. Check dictionary entries for 'landain't' and 'land_ai'
$ cat test.en 
landain't everything to me

Latest version from master.

lluisp commented 6 years ago

Uhm, that is a good one.

First, "landain't" is not an actual word, and even FreeLing has limitations when dealing with weird stuff. FreeLing has several strategies to deal with words that are not in the dictionary: one of them is checking for contractions (ain't, won't, you'll, etc.). Another is checking for compound words (afterthought, sleepwalk, outlive, green-eyed, etc.).

"ain't" alone is properly handled by the first strategy. "landlord", "landslide", or even "landis" would be properly handled by the second (although for "landis" the answer would be "land_be, VBZ", which does not make much sense, since it is not an actual compound but just a typo creating a nonsense word).

But the combination of both phenomena creates some bad interactions between these strategies, which end up trying to solve "landain't" as the compound "land_ai" contracted with "not", when it should be parsed as a compound of "land" with the contraction "ain't".
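
To make the interaction concrete, here is a toy Python sketch of the two parse orders. It is not FreeLing's code or data: the lexicon, the contraction table, and the function names (`split_compound`, `suffix_first`, `compound_first`) are all hypothetical, and it assumes that "ai" is listed as a known form, which is what lets the bad split go through.

```python
# Toy lexicon and contraction table. Hypothetical data for illustration only,
# not FreeLing's dictionary. "ai" is assumed to be a known form.
LEXICON = {"land", "ai", "lord", "slide"}
CONTRACTIONS = {"ain't": ["be", "not"], "won't": ["will", "not"]}


def split_compound(word):
    """Return (left, right) if both halves are in LEXICON, else None."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in LEXICON and right in LEXICON:
            return left, right
    return None


def suffix_first(word):
    """Strip a trailing "n't" first, then split what is left as a compound."""
    if word.endswith("n't"):
        parts = split_compound(word[:-3])
        if parts:
            return ["_".join(parts), "not"]
    return None


def compound_first(word):
    """Look for a known prefix whose remainder is a whole contraction."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in LEXICON and right in CONTRACTIONS:
            return [left] + CONTRACTIONS[right]
    return None


print(suffix_first("landain't"))    # the problematic reading
print(compound_first("landain't"))  # the intended reading
```

In this sketch, stripping the suffix first yields the "land_ai" + "not" reading, while looking for a known prefix followed by a complete contraction yields "land" + "ain't". The actual modules may order and combine these checks quite differently.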

I'll try to revise the behaviour of these modules, so that at least they do not crash in cases like this.

Thanks!