dkpro / dkpro-uby

Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format
https://dkpro.github.io/dkpro-uby
Other

import of sense alignments using temporary tables sometimes fails #129

Open judithek opened 9 years ago

judithek commented 9 years ago

This happens, for example, when using GermaNetWiktionaryDeAlignment and adapting it to query a database that contains OntoWiktionary instead of Wiktionary (different Uby sense IDs, but the same original sense IDs).

Problems:
1) all pairs in SenseAxis are wrong;
2) for some unknown reason, the import script also pairs senses that are both from OntoWiktionary.

Examples:

ad 1) <SenseAxis id="GN9_OntoWktDE_14" senseOne="GN_Sense_22345" senseTwo="OntoWktDE_sense_5671" senseAxisType="monolingualSenseAlignment"/> -> Misskredit ("discredit"), Blockflöte ("recorder")

<SenseAxis id="GN9_OntoWktDE_16" senseOne="GN_Sense_18545" senseTwo="OntoWktDE_sense_53381" senseAxisType="monolingualSenseAlignment"/> -> Therapieform ("form of therapy"), Blockhaus ("log cabin")

ad 2) <SenseAxis id="GN9_OntoWktDE_15" senseOne="OntoWktDE_sense_12115" senseTwo="OntoWktDE_sense_5671" senseAxisType="monolingualSenseAlignment"/>
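A quick way to spot pairs like the one in ad 2) might be to scan the exported SenseAxis elements and flag those whose two sense IDs share a lexicon prefix. The sketch below is hypothetical: the class and method names are invented for illustration, and it assumes that the lexicon can be read off the part of a UBY sense ID before the first underscore (GN, OntoWktDE). That holds for the IDs above but is not something UBY-LMF guarantees.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SameLexiconAxisCheck {

    // Extracts id, senseOne and senseTwo from a serialized SenseAxis element.
    private static final Pattern AXIS = Pattern.compile(
            "<SenseAxis id=\"([^\"]+)\"\\s+senseOne=\"([^\"]+)\"\\s+senseTwo=\"([^\"]+)\"");

    // Assumption: the lexicon prefix is everything before the first '_' of a sense ID.
    static String lexiconPrefix(String senseId) {
        int i = senseId.indexOf('_');
        return i < 0 ? senseId : senseId.substring(0, i);
    }

    // Returns the IDs of all SenseAxis elements whose two senses share a lexicon prefix.
    static List<String> sameLexiconAxes(List<String> axisLines) {
        return axisLines.stream()
                .map(AXIS::matcher)
                .filter(Matcher::find)
                .filter(m -> lexiconPrefix(m.group(2)).equals(lexiconPrefix(m.group(3))))
                .map(m -> m.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> axes = List.of(
                "<SenseAxis id=\"GN9_OntoWktDE_14\" senseOne=\"GN_Sense_22345\" senseTwo=\"OntoWktDE_sense_5671\" senseAxisType=\"monolingualSenseAlignment\"/>",
                "<SenseAxis id=\"GN9_OntoWktDE_15\" senseOne=\"OntoWktDE_sense_12115\" senseTwo=\"OntoWktDE_sense_5671\" senseAxisType=\"monolingualSenseAlignment\"/>");
        System.out.println(sameLexiconAxes(axes)); // prints [GN9_OntoWktDE_15]
    }
}
```

Such a check only catches problem 2; pairs like those in ad 1) are cross-lexicon and would still need to be verified by hand.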

ChM: One crucial problem is that OntoWiktionary != Wiktionary. So far, we have kept the old 2011 dump version of Wiktionary around, mainly because we have not yet replaced the original word sense alignment I created in 2011 with a newer one based on DWSA. OntoWiktionary, however, makes use of a 2013 dump and uses a different JWKTL version. Thus, the original sense IDs are NOT compatible. This should explain why all SenseAxis pairs are wrong. (Sorry, I could have raised this earlier, but I assumed that we had used the new alignment framework to create new, OntoWiktionary-specific alignments.)

This, of course, does not explain why in some cases two OntoWiktionary senses are aligned. I cannot say much about that, but perhaps a lexicon check is missing? An original sense ID from OntoWiktionary may happen to match an original ID from a different resource, so it is crucial to check the lexicon (that is, the external system identifier) as well. If that is not the issue, then there is of course a chance of a major bug in the software; I did not check any source code before filling in this text box...
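To illustrate why the lexicon check matters: if senses are looked up by original ID alone, an ID that exists in two resources silently resolves to whichever sense was indexed last. The hypothetical sketch below contrasts an index keyed by the original ID with one keyed by the pair (externalSystem, externalReference), mirroring the two fields of a MonolingualExternalRef. The class, the system names "GermaNet"/"OntoWiktionary" and the colliding ID "12345" are invented for illustration and do not reflect the actual values stored in UBY.

```java
import java.util.HashMap;
import java.util.Map;

public class ExternalRefLookup {

    // Composite key: an original ID is only assumed to be unique within its external system.
    record ExternalRef(String externalSystem, String externalReference) {}

    // Hypothetical rows {externalSystem, externalReference, UBY sense ID}:
    // the same original ID "12345" occurs in two different resources.
    static final String[][] ROWS = {
        {"GermaNet", "12345", "GN_Sense_1"},
        {"OntoWiktionary", "12345", "OntoWktDE_sense_1"},
    };

    // Index by original ID only: later rows silently overwrite earlier ones.
    static Map<String, String> indexByIdOnly(String[][] rows) {
        Map<String, String> index = new HashMap<>();
        for (String[] r : rows) {
            index.put(r[1], r[2]);
        }
        return index;
    }

    // Index by (externalSystem, externalReference): colliding IDs stay distinct.
    static Map<ExternalRef, String> indexBySystemAndId(String[][] rows) {
        Map<ExternalRef, String> index = new HashMap<>();
        for (String[] r : rows) {
            index.put(new ExternalRef(r[0], r[1]), r[2]);
        }
        return index;
    }

    public static void main(String[] args) {
        // Looking up the GermaNet sense by ID alone yields the OntoWiktionary sense.
        System.out.println(indexByIdOnly(ROWS).get("12345")); // prints OntoWktDE_sense_1
        // With the external system in the key, the GermaNet sense is found.
        System.out.println(indexBySystemAndId(ROWS).get(new ExternalRef("GermaNet", "12345"))); // prints GN_Sense_1
    }
}
```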

JEK:

Quoting ChM: "OntoWiktionary, however, makes use of a 2013 dump and uses a different JWKTL version. Thus, the original sense IDs are NOT compatible."

I am aware of that. Yet the alignment of the original sense IDs still appears to be (mostly, I guess) valid if and only if it is imported via the Uby API, which checks both the original sense ID AND the external system. I have not yet found a wrong alignment when hand-picking arbitrary pairs and looking them up via their MonolingualExternalRefs.

However, the import via temporary tables fails as described. This might indeed be caused by not checking the externalSystem, but the database that I used for looking up original sense IDs contained only GermaNet, WordNet and OntoWiktionary and no other lexicon.
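Even with only GermaNet, WordNet and OntoWiktionary in the database, a missing externalSystem condition could still produce OntoWiktionary-only pairs if an original ID occurs in more than one of these resources. The following is a hypothetical in-memory simulation of such a join, not the actual import code: the temporary table is modelled as a list of original-ID pairs, the reference table as a list of (externalSystem, externalReference, senseId) rows, and all IDs and system names are invented. It shows one possible cause, under that assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class TempTableJoinSketch {

    // Hypothetical row of an external-reference table: one entry per UBY sense.
    record Ref(String externalSystem, String externalReference, String senseId) {}

    // Hypothetical row of the temporary table: an alignment of two original IDs.
    record Alignment(String origIdOne, String origIdTwo) {}

    // A resolved pair of UBY sense IDs, as it would end up in a SenseAxis.
    record SensePair(String senseOne, String senseTwo) {}

    static final List<Ref> REFS = List.of(
            new Ref("GermaNet", "l100", "GN_Sense_1"),
            new Ref("OntoWiktionary", "l100", "OntoWktDE_sense_9"), // invented ID collision
            new Ref("OntoWiktionary", "w200", "OntoWktDE_sense_2"));

    static final List<Alignment> ALIGNMENTS = List.of(new Alignment("l100", "w200"));

    // A null system means the join has no externalSystem condition for that side.
    static boolean matches(Ref r, String origId, String system) {
        return r.externalReference().equals(origId)
                && (system == null || r.externalSystem().equals(system));
    }

    // Nested-loop equivalent of joining the temporary table with the reference table twice.
    static List<SensePair> join(List<Alignment> alignments, List<Ref> refs,
                                String systemOne, String systemTwo) {
        List<SensePair> pairs = new ArrayList<>();
        for (Alignment a : alignments) {
            for (Ref r1 : refs) {
                if (!matches(r1, a.origIdOne(), systemOne)) {
                    continue;
                }
                for (Ref r2 : refs) {
                    if (matches(r2, a.origIdTwo(), systemTwo)) {
                        pairs.add(new SensePair(r1.senseId(), r2.senseId()));
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Without the filter, the collision adds a second, OntoWiktionary-only pair.
        System.out.println(join(ALIGNMENTS, REFS, null, null));
        // With the filter on both sides, only the intended GermaNet/OntoWiktionary pair remains.
        System.out.println(join(ALIGNMENTS, REFS, "GermaNet", "OntoWiktionary"));
    }
}
```

In SQL terms, the equivalent fix would be to add an externalSystem condition to each of the two joins against the reference table, which is what the API-based import reportedly already does.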