dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark
GNU Affero General Public License v3.0

[SWC] Endless hashing loop #279

Closed: MichaelRoeder closed this issue 6 years ago

MichaelRoeder commented 6 years ago

Description

A user uploaded a dataset twice, and the experiment got stuck in what looked like an endless hashing loop inside the Jena library. Two thread dumps of the stuck experiment task:

eTConfig("XXX","SWC 2018 - Task 1 Evaluation","SWC2018T1","WEAK_ANNOTATION_MATCH")
state=RUNNABLE
progress=100.0% of dataset
org.apache.jena.mem.NodeToTriplesMapMem.add(NodeToTriplesMapMem.java:52)
org.apache.jena.mem.GraphTripleStoreBase.add(GraphTripleStoreBase.java:63)
org.apache.jena.mem.GraphMem.performAdd(GraphMem.java:37)
org.apache.jena.graph.GraphUtil.add(GraphUtil.java:138)
org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:194)
org.aksw.gerbil.evaluate.impl.ModelComparator.reduceModel(ModelComparator.java:107)
org.aksw.gerbil.evaluate.impl.ModelComparator.reduceModel(ModelComparator.java:100)
org.aksw.gerbil.evaluate.impl.ModelComparator.compareModel(ModelComparator.java:64)
org.aksw.gerbil.evaluate.impl.ModelComparator.evaluate(ModelComparator.java:58)
org.aksw.gerbil.evaluate.impl.ConfidenceScoreEvaluatorDecorator.evaluate(ConfidenceScoreEvaluatorDecorator.java:135)
org.aksw.gerbil.evaluate.impl.ConfidenceScoreEvaluatorDecorator.evaluate(ConfidenceScoreEvaluatorDecorator.java:74)
org.aksw.gerbil.execute.ExperimentTask.evaluate(ExperimentTask.java:323)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:293)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:140)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","SWC 2018 - Task 1 Evaluation","SWC2018T1","WEAK_ANNOTATION_MATCH")
state=RUNNABLE
progress=100.0% of dataset
org.apache.jena.mem.HashCommon.findSlot(HashCommon.java:168)
org.apache.jena.mem.HashedTripleBunch.contains(HashedTripleBunch.java:40)
org.apache.jena.mem.NodeToTriplesMapMem.add(NodeToTriplesMapMem.java:52)
org.apache.jena.mem.GraphTripleStoreBase.add(GraphTripleStoreBase.java:63)
org.apache.jena.mem.GraphMem.performAdd(GraphMem.java:37)
org.apache.jena.graph.GraphUtil.add(GraphUtil.java:138)
org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:194)
org.aksw.gerbil.evaluate.impl.ModelComparator.reduceModel(ModelComparator.java:107)
org.aksw.gerbil.evaluate.impl.ModelComparator.reduceModel(ModelComparator.java:100)
org.aksw.gerbil.evaluate.impl.ModelComparator.compareModel(ModelComparator.java:63)
org.aksw.gerbil.evaluate.impl.ModelComparator.evaluate(ModelComparator.java:58)
org.aksw.gerbil.evaluate.impl.ConfidenceScoreEvaluatorDecorator.evaluate(ConfidenceScoreEvaluatorDecorator.java:135)
org.aksw.gerbil.evaluate.impl.ConfidenceScoreEvaluatorDecorator.evaluate(ConfidenceScoreEvaluatorDecorator.java:74)
org.aksw.gerbil.execute.ExperimentTask.evaluate(ExperimentTask.java:323)
org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:293)
org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:140)
org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

MichaelRoeder commented 6 years ago

The problem was not the hashing itself but the massive number of comparisons caused by our simple, repetition-based handling of confidence values.

Fixed by commit 83cac39227c784ac6d9a5fe19abea485756cedfc.
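To illustrate why a repetition-based treatment of confidence values can get so expensive, here is a minimal, hypothetical Java sketch. It is not GERBIL's ModelComparator or ConfidenceScoreEvaluatorDecorator code, and it does not describe what the fix commit changed; the class ConfidenceSweep, the Annotation record and the "best F1 over all confidence thresholds" setup are assumptions made for illustration. The naive variant re-filters every annotation once per distinct confidence value, so its work grows with (number of thresholds) × (number of annotations). The alternative sorts once by confidence and sweeps, which costs O(n log n).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only: not GERBIL's implementation.
public class ConfidenceSweep {

    // An annotation with a confidence score that is either correct or not.
    record Annotation(double confidence, boolean correct) {}

    // Repetition-based: for every distinct threshold, filter the whole list
    // again. work[0] counts inner-loop steps: #thresholds x #annotations.
    static double bestF1Naive(List<Annotation> anns, int goldSize, long[] work) {
        TreeSet<Double> thresholds = new TreeSet<>();
        for (Annotation a : anns) thresholds.add(a.confidence());
        double best = 0;
        for (double t : thresholds) {
            int tp = 0, returned = 0;
            for (Annotation a : anns) {
                work[0]++;
                if (a.confidence() >= t) {
                    returned++;
                    if (a.correct()) tp++;
                }
            }
            best = Math.max(best, f1(tp, returned, goldSize));
        }
        return best;
    }

    // Single sweep: sort by descending confidence once, then grow the
    // accepted set one annotation at a time.
    static double bestF1Sweep(List<Annotation> anns, int goldSize) {
        List<Annotation> sorted = new ArrayList<>(anns);
        sorted.sort((x, y) -> Double.compare(y.confidence(), x.confidence()));
        double best = 0;
        int tp = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).correct()) tp++;
            // Score only at the end of a run of equal confidences, so ties are
            // accepted together, exactly as with a ">= t" threshold.
            boolean lastOfRun = i + 1 == sorted.size()
                    || sorted.get(i + 1).confidence() != sorted.get(i).confidence();
            if (lastOfRun) best = Math.max(best, f1(tp, i + 1, goldSize));
        }
        return best;
    }

    // F1 from true positives, number of returned annotations and gold size.
    static double f1(int tp, int returned, int goldSize) {
        if (tp == 0) return 0;
        double p = (double) tp / returned;
        double r = (double) tp / goldSize;
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        List<Annotation> anns = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            // Every annotation gets its own confidence value.
            anns.add(new Annotation(i / 2000.0, i % 3 != 0));
        }
        long[] work = {0};
        double naive = bestF1Naive(anns, 1500, work);
        double sweep = bestF1Sweep(anns, 1500);
        System.out.println("naive=" + naive + " sweep=" + sweep);
        // inner-loop steps: 2000 x 2000 = 4,000,000
        System.out.println("naive inner-loop steps=" + work[0]);
    }
}
```

Both variants evaluate exactly the same (true positives, returned) pairs, so they return the same score. Only the naive one degrades quadratically when every annotation carries a distinct confidence value.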