Open FelixFrizzy opened 1 week ago
Hi,
thanks for raising this issue. Could you check whether it also happens with a single test case, using the code below:
ExecutionResultSet ers = Executor.run(TrackRepository.DigitalHumanities.All2024.getTestCase("..."),new ForwardAlwaysMatcher("./system.xml"));
EvaluatorCSV evaluatorCSV = new EvaluatorCSV(ers);
evaluatorCSV.writeToDirectory();
Then we have a small reproducible setup, and I can check where the error actually appears. Thanks
If I understand the source code correctly, the `system.xml` file should be the `reference.rdf` of the track?
Using
ExecutionResultSet ers = Executor.run(TrackRepository.DigitalHumanities.Dhcs2024.getTestCase(1),new ForwardAlwaysMatcher("./system.xml"));
I then get the following `trackPerformanceCube.csv`:
Track,Track Version,Test Case,Matcher,Type,Precision (P),Recall (R),Residual Recall (R+),F1,"# of TP","# of FP","# of FN","# of Correspondences",Time,Time (HH:MM:SS)
dh,2024dhcs,tadirah-unesco,ForwardAlwaysMatcher,ALL,1.0,1.0,1.0,1.0,15,0,0,15,98571192,00:00:00
dh,2024dhcs,tadirah-unesco,ForwardAlwaysMatcher,CLASSES,0.0,0.0,0.0,0.0,0,0,0,0,-,-
dh,2024dhcs,tadirah-unesco,ForwardAlwaysMatcher,PROPERTIES,0.0,0.0,0.0,0.0,0,0,0,0,-,-
dh,2024dhcs,tadirah-unesco,ForwardAlwaysMatcher,INSTANCES,1.0,1.0,1.0,1.0,15,0,0,15,-,-
Some more details:
I used the systemAlignment.rdf as system.xml (which makes more sense for testing everything).
When doing so, I get an empty `trackPerformanceCube.csv` with headers only.
I tracked the problem down; it seems to be a problem on logmap-bio's side.
The systemAlignment contains lines like this:
<map>
<Cell>
<entity1 rdf:resource="http://tadirah.dariah.eu/vocab/index.php?tema=24&/editing"/>
<entity2 rdf:resource="http://vocabularies.unesco.org/thesaurus/concept3810"/>
<measure rdf:datatype="xsd:float">1.0</measure>
<relation>=</relation>
</Cell>
</map>
The resource of entity1 should be `https://vocabs.dariah.eu/tadirah/editing`; logmap-bio somehow got this wrong and used the `skos:closeMatch` URI instead. Only some of the entities have the wrong URI, though; most of them are correct.
It would be nice if the correct mappings were also reflected in the output CSV, but I'm not sure if that is possible.
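One way to find the affected cells quickly is to list every entity1 URI that falls outside the expected namespace. The following is only a minimal sketch using plain Java 8 and no MELT classes. The prefix `https://vocabs.dariah.eu/tadirah/` is an assumption inferred from the one corrected URI above, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NamespaceCheck {

    /** Returns all URIs that do not start with the expected namespace prefix. */
    static List<String> findUnexpected(List<String> uris, String expectedPrefix) {
        List<String> unexpected = new ArrayList<>();
        for (String uri : uris) {
            if (!uri.startsWith(expectedPrefix)) {
                unexpected.add(uri);
            }
        }
        return unexpected;
    }

    public static void main(String[] args) {
        // Assumed namespace of the TaDiRAH concepts (inferred, not confirmed).
        String expectedPrefix = "https://vocabs.dariah.eu/tadirah/";

        // The two entity1 URIs mentioned in this thread.
        List<String> entity1Uris = Arrays.asList(
                "https://vocabs.dariah.eu/tadirah/editing",
                "http://tadirah.dariah.eu/vocab/index.php?tema=24&/editing");

        for (String uri : findUnexpected(entity1Uris, expectedPrefix)) {
            System.out.println("Unexpected entity1 URI: " + uri);
        }
    }
}
```

In practice the URI list would be read from the entity1 elements of the systemAlignment.rdf rather than hard-coded.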
Describe the bug
When using the evaluation-client to run logmap-bio with the DH tracks, I get a wrong performance.csv for the "tadirah-unesco" test case. It shows 0 TP, but the "true" number is higher when looking into the systemAlignment.rdf and comparing it to the reference.rdf of the test case. I think it would be important to find out whether this is an issue on the matcher's side (unproblematic) or an issue in MELT. The latter would be more problematic, since the evaluation relies heavily on the numbers in the performance.csv files.
To Reproduce
Steps to reproduce the behavior:
Version of MELT: evaluation client from documentation
Java version: openjdk version "1.8.0_422"
Python version: 3.8.18
Operating system: macOS 14.4
Run
java -jar matching-eval-client-latest.jar --systems ../Matcher/DockerMatcher/logmap-bio-melt-oaei-2021-web-latest.tar.gz --track http://oaei.webdatacommons.org/tdrs/ dh 2024all --results oaei2024_logmapbio_oaeidh_(date +"%Y-%m-%d_%H-%M-%S")
`performance.csv`:
systemAlignment.rdf (removed unneeded alignments for readability):
[reference.rdf](https://github.com/FelixFrizzy/DH-benchmark/blob/main/dhcs2_tadirah-unesco/reference.rdf) of the tadirah-unesco test case (removed unneeded alignments for readability):
The correctly identified alignment is not reflected in the performance.csv (along with all the other TPs).
Full log output
issue.log
Expected behavior
The `performance.csv` should list 10 TP and 5 FP instead of 0 TP and 15 FP.
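To make the expected counting concrete, here is a small sketch of how TP, FP and FN follow from a plain set comparison of correspondences. It assumes that source and target URIs are compared as exact strings, which is an assumption for illustration and not necessarily how MELT evaluates. The class, the helper names, and the `hypothetical-concept` / `concept0000` URIs are made up; only the `editing` and `concept3810` URIs come from this issue.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ExactMatchCounts {

    /** Encodes a correspondence as one string; URIs contain no spaces, so " | " is a safe separator. */
    static String pair(String source, String target) {
        return source + " | " + target;
    }

    /** Returns {TP, FP, FN} using exact string equality of the encoded pairs. */
    static int[] count(Set<String> system, Set<String> reference) {
        Set<String> truePositives = new HashSet<>(system);
        truePositives.retainAll(reference); // correspondences present in both sets
        int tp = truePositives.size();
        int fp = system.size() - tp;        // found by the matcher, not in the reference
        int fn = reference.size() - tp;     // in the reference, missed by the matcher
        return new int[] {tp, fp, fn};
    }

    public static void main(String[] args) {
        Set<String> reference = new HashSet<>(Arrays.asList(
                pair("https://vocabs.dariah.eu/tadirah/editing",
                     "http://vocabularies.unesco.org/thesaurus/concept3810"),
                // Hypothetical second correspondence, for illustration only.
                pair("https://vocabs.dariah.eu/tadirah/hypothetical-concept",
                     "http://vocabularies.unesco.org/thesaurus/concept0000")));

        Set<String> system = new HashSet<>(Arrays.asList(
                // Same target, but the source URI differs from the reference.
                pair("http://tadirah.dariah.eu/vocab/index.php?tema=24&/editing",
                     "http://vocabularies.unesco.org/thesaurus/concept3810"),
                pair("https://vocabs.dariah.eu/tadirah/hypothetical-concept",
                     "http://vocabularies.unesco.org/thesaurus/concept0000")));

        int[] c = count(system, reference);
        System.out.println("TP=" + c[0] + " FP=" + c[1] + " FN=" + c[2]); // prints TP=1 FP=1 FN=1
    }
}
```

Under this assumption, a correspondence with a mismatched source URI is counted as one FP and one FN, while correspondences with matching URIs should still be counted as TP.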