TALP-UPC / FreeLing

FreeLing project source code

Inconsistency between sense dictionary and KB #109

Closed alexandretessarollo closed 3 years ago

alexandretessarollo commented 4 years ago

I've made some changes to senses30.src, locucions.dat and dicc.src in the ../usr/local/share/freeling/en/ folder, and to wn.dat in the ../usr/local/share/freeling/common/ folder. I also adjusted ukb.dat to point to wn.dat instead of xwn.dat. I'm now running:

analyze -f /usr/local/share/freeling/config/en.cfg --sense ukb --loc LocutionsFile=../usr/local/share/freeling/en/locucions.dat < myfile.txt

I'm getting the following error message (for several different synsets): Unknown synset 02749904-v ignored. Please check consistency between sense dictionary and KB

I've double-checked, and all the files I changed contain the 02749904-v synset references exactly as in the original versions:

senses30: 02749904-v be
dicc: be be VB

In WordNet, 02749904-v has just one word ("be") and no relations to other synsets. Accordingly, wn.dat has no mention of 02749904-v.

What could be happening? And how can I fix it?

arademaker commented 3 years ago

Hi @lluisp, the root of the error is a bug in the PWN 3.0 files. See https://github.com/globalwordnet/english-wordnet/issues/531. This synset ended up without any relations.

FreeLing writes a warning to STDERR saying that it found a sense for a lemma, but that this sense has no connections to the rest of the network. So far, nobody had seen it because the default relations file is xwn.dat, which contains the extra relations extracted from the tagged glosses. The gloss of this synset has some disambiguated words, so the synset does have relations in xwn.dat and the warning never shows up there.

I don't see anything that FL could do differently here.
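For anyone who wants to list the affected synsets up front, here is a minimal sketch of the consistency check the warning refers to. It makes two assumptions that are not confirmed in this thread: that each sense dictionary line starts with a synset code followed by its lemmas (as in the `02749904-v be` line quoted above), and that every synset known to the KB appears somewhere in the relations file as a token of the form 8 digits, a hyphen, and one of n/v/a/r. The function names are made up for illustration and are not part of FreeLing.

```python
import re
import sys
from pathlib import Path

# Assumed synset code shape: 8 digits, hyphen, POS letter (n, v, a or r).
SYNSET_RE = re.compile(r"\b\d{8}-[nvar]\b")


def synsets_in_text(text):
    """Return every synset code found anywhere in the given text."""
    return set(SYNSET_RE.findall(text))


def dictionary_synsets(text):
    """Return the synset codes that appear as the first field of a line."""
    found = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and SYNSET_RE.fullmatch(fields[0]):
            found.add(fields[0])
    return found


def missing_from_kb(dict_text, kb_text):
    """Synsets listed in the sense dictionary but never mentioned in the KB."""
    return sorted(dictionary_synsets(dict_text) - synsets_in_text(kb_text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_kb.py senses30.src wn.dat
    dict_text = Path(sys.argv[1]).read_text(encoding="utf-8")
    kb_text = Path(sys.argv[2]).read_text(encoding="utf-8")
    for synset in missing_from_kb(dict_text, kb_text):
        print(synset)
```

Each synset printed this way is one that has no edge in the relations file, so it is a likely candidate for the "Unknown synset" warning.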

lluisp commented 3 years ago

FreeLing performs minimal consistency checks, and if the data is not consistent, it will most likely crash. In this case, the error is just a warning, so you can ignore it. Just be aware that this sense will never be selected, since the disambiguation algorithm relies on the graph structure. Another WSD algorithm might not care about that and would not produce the warning. However, UKB is the only WSD algorithm implemented in FreeLing.

arademaker commented 3 years ago

Yep. For me, this issue can be closed.

arademaker commented 3 years ago

FL doesn't really ignore them, it just lists them with 0% probability... So, in the end, FL is doing the best it can.
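To illustrate why a sense with no edges ends up with zero weight in a graph-based ranker, here is a toy personalized PageRank in plain Python. This is only a sketch of the general technique, not UKB's or FreeLing's actual code, and the node labels are invented for the example. It assumes the teleport mass is placed only on context nodes, so a node that is neither a seed nor connected to anything can never receive any rank.

```python
def personalized_pagerank(nodes, edges, seeds, damping=0.85, iterations=50):
    """Toy personalized PageRank over an undirected graph (power iteration)."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Teleport mass is spread uniformly over the seed (context) nodes only.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Rank flows in only along edges; an isolated node has none.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[n])
            new_rank[n] = (1 - damping) * teleport[n] + damping * incoming
        rank = new_rank
    return rank


# Hypothetical labels: two senses of "be", one of them isolated in the graph.
nodes = ["be#connected", "be#isolated", "exist#1", "context#1"]
edges = [("be#connected", "exist#1"), ("exist#1", "context#1")]
ranks = personalized_pagerank(nodes, edges, seeds={"context#1"})

for name in nodes:
    print(name, ranks[name])
```

In this toy graph the isolated sense keeps a rank of exactly zero on every iteration, while the connected sense receives weight through its neighbours, which matches the behaviour described above: the sense is still listed, but it can never win.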