Closed MichaelRoeder closed 6 years ago
A user uploaded a dataset twice, and the experiment task got stuck in what looked like an endless hashing loop inside the Jena library.
eTConfig("XXX","SWC 2018 - Task 1 Evaluation","SWC2018T1","WEAK_ANNOTATION_MATCH") state=RUNNABLE progress=100.0% of dataset
    org.apache.jena.mem.NodeToTriplesMapMem.add(NodeToTriplesMapMem.java:52)
    org.apache.jena.mem.GraphTripleStoreBase.add(GraphTripleStoreBase.java:63)
    org.apache.jena.mem.GraphMem.performAdd(GraphMem.java:37)
    org.apache.jena.graph.GraphUtil.add(GraphUtil.java:138)
    org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:194)
    org.aksw.gerbil.evaluate.impl.ModelComparator.reduceModel(ModelComparator.java:107)
    org.aksw.gerbil.evaluate.impl.ModelComparator.reduceModel(ModelComparator.java:100)
    org.aksw.gerbil.evaluate.impl.ModelComparator.compareModel(ModelComparator.java:64)
    org.aksw.gerbil.evaluate.impl.ModelComparator.evaluate(ModelComparator.java:58)
    org.aksw.gerbil.evaluate.impl.ConfidenceScoreEvaluatorDecorator.evaluate(ConfidenceScoreEvaluatorDecorator.java:135)
    org.aksw.gerbil.evaluate.impl.ConfidenceScoreEvaluatorDecorator.evaluate(ConfidenceScoreEvaluatorDecorator.java:74)
    org.aksw.gerbil.execute.ExperimentTask.evaluate(ExperimentTask.java:323)
    org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:293)
    org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:140)
    org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    java.lang.Thread.run(Thread.java:748)

eTConfig("XXX","SWC 2018 - Task 1 Evaluation","SWC2018T1","WEAK_ANNOTATION_MATCH") state=RUNNABLE progress=100.0% of dataset
    org.apache.jena.mem.HashCommon.findSlot(HashCommon.java:168)
    org.apache.jena.mem.HashedTripleBunch.contains(HashedTripleBunch.java:40)
    org.apache.jena.mem.NodeToTriplesMapMem.add(NodeToTriplesMapMem.java:52)
    org.apache.jena.mem.GraphTripleStoreBase.add(GraphTripleStoreBase.java:63)
    org.apache.jena.mem.GraphMem.performAdd(GraphMem.java:37)
    org.apache.jena.graph.GraphUtil.add(GraphUtil.java:138)
    org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:194)
    org.aksw.gerbil.evaluate.impl.ModelComparator.reduceModel(ModelComparator.java:107)
    org.aksw.gerbil.evaluate.impl.ModelComparator.reduceModel(ModelComparator.java:100)
    org.aksw.gerbil.evaluate.impl.ModelComparator.compareModel(ModelComparator.java:63)
    org.aksw.gerbil.evaluate.impl.ModelComparator.evaluate(ModelComparator.java:58)
    org.aksw.gerbil.evaluate.impl.ConfidenceScoreEvaluatorDecorator.evaluate(ConfidenceScoreEvaluatorDecorator.java:135)
    org.aksw.gerbil.evaluate.impl.ConfidenceScoreEvaluatorDecorator.evaluate(ConfidenceScoreEvaluatorDecorator.java:74)
    org.aksw.gerbil.execute.ExperimentTask.evaluate(ExperimentTask.java:323)
    org.aksw.gerbil.execute.ExperimentTask.runExperiment(ExperimentTask.java:293)
    org.aksw.gerbil.execute.ExperimentTask.run(ExperimentTask.java:140)
    org.aksw.simba.topicmodeling.concurrent.workers.WorkerImpl.run(WorkerImpl.java:44)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    java.lang.Thread.run(Thread.java:748)
The hashing itself was not the problem. The cause was the massive number of comparisons produced by our simple repetition-based handling of confidence values.
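To illustrate why repeating the comparison for every confidence value gets expensive, here is a minimal, generic sketch. It is not GERBIL's code and does not claim to show what the fixing commit does. `ThresholdSweepSketch`, `Scored`, and both methods are hypothetical names, and plain strings stand in for RDF statements. The first method re-scans the whole result list once per threshold; the second sorts once and carries a running count, assuming the thresholds are given in descending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in GERBIL or Jena.
final class ThresholdSweepSketch {

    /** Stand-in for a statement with an attached confidence value. */
    static final class Scored {
        final String statement;
        final double confidence;

        Scored(String statement, double confidence) {
            this.statement = statement;
            this.confidence = confidence;
        }
    }

    /**
     * Repetition-based variant: for every threshold, scan all results again.
     * Cost grows with (number of thresholds) x (number of results).
     */
    static int[] naiveTruePositives(List<Scored> results, Set<String> gold, double[] thresholds) {
        int[] truePositives = new int[thresholds.length];
        for (int i = 0; i < thresholds.length; i++) {
            for (Scored s : results) {
                if (s.confidence >= thresholds[i] && gold.contains(s.statement)) {
                    truePositives[i]++;
                }
            }
        }
        return truePositives;
    }

    /**
     * Single-sweep variant: sort once by descending confidence, then visit
     * each result exactly once while walking through the thresholds.
     * Assumption: thresholds are sorted in descending order.
     */
    static int[] sweepTruePositives(List<Scored> results, Set<String> gold, double[] thresholds) {
        List<Scored> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((Scored s) -> s.confidence).reversed());

        int[] truePositives = new int[thresholds.length];
        int next = 0;     // index of the next result not yet admitted
        int running = 0;  // true positives among the admitted results
        for (int i = 0; i < thresholds.length; i++) {
            while (next < sorted.size() && sorted.get(next).confidence >= thresholds[i]) {
                if (gold.contains(sorted.get(next).statement)) {
                    running++;
                }
                next++;
            }
            truePositives[i] = running;
        }
        return truePositives;
    }
}
```

Both methods return the same counts for the same input. The difference is only in how often each result is looked at, which is what matters when a dataset produces many distinct confidence values.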
Fixed by 83cac39227c784ac6d9a5fe19abea485756cedfc