dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark

GNU Affero General Public License v3.0

[QA] Debug why macro f-measure does not work with Hawk #155

Closed RicardoUsbeck closed 8 years ago

RicardoUsbeck commented 8 years ago

Macro F-measure seems to be broken; see these experiments:

gerbil-qa.aksw.org/gerbil/experiment?id=201610230000
gerbil-qa.aksw.org/gerbil/experiment?id=201610230001

Please investigate.

TortugaAttack commented 8 years ago

The results of Yoda (gerbil-qa.aksw.org/gerbil/experiment?id=201610230000) are actually "correct" :D The problem is that the questions marked OUT OF SCOPE (should they be tested?) have 0 expected results. As Yoda was down, it returned 0 results as well. That makes tp=0, fp=0, fn=0 for those questions, and therefore (by GERBIL's default) precision 1.0, recall 1.0 and F-measure 1.0.

Thus micro is always 0, but macro can be greater than 0, because every such question contributes an F-measure of 1.0 to the macro average.
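To make the effect concrete, here is a minimal sketch of micro versus macro F-measure under the convention described above, where all-zero counts are scored as 1.0. The class and method names (`FMeasureSketch`, `Counts`, `microF1`, `macroF1`) are made up for illustration and are not GERBIL's actual code.

```java
import java.util.Arrays;
import java.util.List;

public class FMeasureSketch {

    /** Per-question counts of true positives, false positives and false negatives. */
    public static final class Counts {
        final int tp, fp, fn;

        public Counts(int tp, int fp, int fn) {
            this.tp = tp;
            this.fp = fp;
            this.fn = fn;
        }
    }

    /** F1 for one set of counts; all-zero counts are scored as 1.0 (assumed convention). */
    public static double f1(Counts c) {
        if (c.tp == 0) {
            return (c.fp == 0 && c.fn == 0) ? 1.0 : 0.0;
        }
        double precision = (double) c.tp / (c.tp + c.fp);
        double recall = (double) c.tp / (c.tp + c.fn);
        return 2 * precision * recall / (precision + recall);
    }

    /** Micro F1: sum the counts over all questions, then score once. */
    public static double microF1(List<Counts> questions) {
        int tp = 0, fp = 0, fn = 0;
        for (Counts c : questions) {
            tp += c.tp;
            fp += c.fp;
            fn += c.fn;
        }
        return f1(new Counts(tp, fp, fn));
    }

    /** Macro F1: score every question on its own, then average the scores. */
    public static double macroF1(List<Counts> questions) {
        if (questions.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (Counts c : questions) {
            sum += f1(c);
        }
        return sum / questions.size();
    }

    public static void main(String[] args) {
        // A system that is down answers nothing: two normal questions with
        // missed answers (fn > 0) and one OUT OF SCOPE question (all zeros).
        List<Counts> questions = Arrays.asList(
                new Counts(0, 0, 2),
                new Counts(0, 0, 1),
                new Counts(0, 0, 0));
        System.out.println("micro F1 = " + microF1(questions));
        System.out.println("macro F1 = " + macroF1(questions));
    }
}
```

With these three questions the micro score is 0.0, while the macro score is one third, because the single all-zero question is counted as a perfect answer.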

RicardoUsbeck commented 8 years ago

Actually, we need to filter out the OUT OF SCOPE questions. Could you do that so they are not used?

TortugaAttack commented 8 years ago

Will fix it.

TortugaAttack commented 8 years ago

OOS will not be tested anymore.

RicardoUsbeck commented 8 years ago

Cool. So can this issue be closed?

TortugaAttack commented 8 years ago

Not yet. I tried it without the OOS questions (which worked), but the (C | P | RE)2KB experiments still have results higher than 0, because the queries of some questions could not be parsed correctly (e.g. a GROUP is missing). Those questions have no resource markings and therefore end up as an empty set in these experiments. As with the OOS questions, they get tp=0, fp=0, fn=0 and therefore a score of 1. (They have answers but no markings, so the QA test works; only the (C | P | RE)2KB experiments have the problem.)

Should they be taken out too? Furthermore, the ErrorCounter does not work with the NLIWOD-based systems. Will fix that today.

MichaelRoeder commented 8 years ago

Ok, please add a short summary of what the problem with the error count is, so we can check whether we have a similar problem in the NER/EL GERBIL.

TortugaAttack commented 8 years ago

The QASystem never throws an exception. Even if the annotator does, the ASystem of NLIWOD will not; it simply returns an IQuestion object with nulls and empty sets. So it should not be a problem with the NER/EL GERBIL ;)

So it's more an error in the QASystem than in the ErrorCounter itself.

The fix will be: if the answer type, SparqlQuery and PseudoSparqlQuery are all null and the answer is an empty set, the QASystem will throw a GerbilException.
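The check described above could look roughly like the following sketch. Everything here is a hypothetical stand-in: `QaResponse` only mimics the relevant fields of the object a QA system returns, and `SystemFailureException` takes the place of GerbilException. Treating a null answer set the same as an empty one is also an assumption.

```java
import java.util.Collections;
import java.util.Set;

public class EmptyResponseCheck {

    /** Hypothetical stand-in for the question object returned by a QA system. */
    public static final class QaResponse {
        final String answerType;
        final String sparqlQuery;
        final String pseudoSparqlQuery;
        final Set<String> answers;

        public QaResponse(String answerType, String sparqlQuery,
                          String pseudoSparqlQuery, Set<String> answers) {
            this.answerType = answerType;
            this.sparqlQuery = sparqlQuery;
            this.pseudoSparqlQuery = pseudoSparqlQuery;
            this.answers = answers;
        }
    }

    /** Hypothetical stand-in for GerbilException. */
    public static final class SystemFailureException extends Exception {
        public SystemFailureException(String message) {
            super(message);
        }
    }

    /** True if the response carries no information at all, i.e. the system most likely failed. */
    public static boolean looksLikeFailure(QaResponse r) {
        boolean noAnswers = r.answers == null || r.answers.isEmpty(); // null treated as empty (assumption)
        return r.answerType == null
                && r.sparqlQuery == null
                && r.pseudoSparqlQuery == null
                && noAnswers;
    }

    /** Returns the response unchanged, or throws so that an error counter can register the failure. */
    public static QaResponse requireUsable(QaResponse r) throws SystemFailureException {
        if (looksLikeFailure(r)) {
            throw new SystemFailureException("QA system returned an empty response");
        }
        return r;
    }

    public static void main(String[] args) {
        QaResponse empty = new QaResponse(null, null, null, Collections.<String>emptySet());
        try {
            requireUsable(empty);
        } catch (SystemFailureException e) {
            System.out.println("counted as error: " + e.getMessage());
        }
    }
}
```

Throwing instead of returning an all-empty object lets the caller distinguish "the system failed" from "the system answered with nothing", which is the distinction the error count depends on.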

RicardoUsbeck commented 8 years ago

Ok, so we need to change that in NLIWOD. If the SPARQL query cannot be parsed, the sub-experiments should not be executed, I think.

TortugaAttack commented 8 years ago

Ok, will do that. ;)

TortugaAttack commented 8 years ago

Everything should work now with ba28621 and c24736b in NLIWOD (dev). I will run a final test and, if everything works, close the issue.

RicardoUsbeck commented 8 years ago

Please deploy a new, higher snapshot version of NLIWOD to GERBIL straight from the dev branch.

TortugaAttack commented 8 years ago

Deploying it right now ;)

As NLIWOD is based on Jena 3.1.0 and gerbil-qa uses 2.13.0, I would carefully update the Jena dependencies to 3.1.0, so that NLIWOD and gerbil-qa work better together.

MichaelRoeder commented 8 years ago

I have updated GERBIL to Jena 3.1 in another branch. You might want to reuse it ;)

MichaelRoeder commented 8 years ago

The name of the other branch is jena3.1. https://github.com/AKSW/gerbil/tree/jena3.1