KRR-Oxford / DeepOnto

A package for ontology engineering with deep learning and language models.
https://krr-oxford.github.io/DeepOnto/
Apache License 2.0

Generating results for EditSim #2

Closed Murchana closed 2 years ago

Murchana commented 2 years ago

Hi, I am not able to reproduce the exact H@1 and MRR reported for EditSim on the FMA-SNOMED task in Table 4 of https://arxiv.org/pdf/2205.03447.pdf.

This is the command used:

python om_eval.py --saved_path './om_results' --pred_path './onto_match_experiment2/edit_sim/global_match/src2tgt' --ref_anchor_path 'data/equiv_match/refs/snomed2fma.body/unsupervised/src2tgt.rank/for_eval' --hits_at 1

These are the generated numbers: H@1: 0.841 and MRR: 0.89. Reported numbers in the paper: H@1: 0.869 and MRR: 0.895.

I am not sure why the numbers are not consistent. Is there anything that needs to be modified in the code to get the reported numbers?

Lawhy commented 2 years ago

Hey @Murchana, sorry for the late reply (I didn't receive any email notification). The code was under major improvement during the time you reported this issue; I will have a look and get back to you later.

Lawhy commented 2 years ago

Hey @Murchana, I have re-run the evaluation code, and the results are still the same as before.

python om_eval.py -p experiments/umls/snomed2fma.body.us/edit_sim/pair_score/src2tgt -a data/UMLS/equiv_match/refs/snomed2fma.body/unsupervised/test.cands.tsv --hits_at 1

which gave me:

######################################################################
###                     Eval using Hits@K, MRR                     ###
######################################################################

635619/635619 of scored mappings are filled to corresponding anchors.
{
    "MRR": 0.895,
    "Hits@1": 0.869
}

Could you please check the most recent usage of the evaluation script and download the most recent data resources where the anchored mappings are simplified to a .tsv format?
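For anyone parsing these files by hand, the scored mappings in the snapshots later in this thread use a simple three-column layout (source IRI, target IRI, score). A minimal sketch of a loader, where the function name and the grouping-by-source-anchor are illustrative assumptions rather than DeepOnto's actual API:

```python
# Minimal sketch (not DeepOnto's loader) for the three-column layout:
#   <source IRI>  <target IRI>  <score>
# split() handles either tab- or space-separated columns, since IRIs
# contain no whitespace.
from collections import defaultdict

def load_scored_mappings(path):
    """Group (target, score) pairs under their source-entity anchor."""
    anchors = defaultdict(list)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            src, tgt, score = line.split()
            anchors[src].append((tgt, float(score)))
    return anchors
```

Each anchor's candidate list can then be sorted by score for ranking-based evaluation.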

Lawhy commented 2 years ago

I have re-run it again and the MRR and Hits@K values have changed slightly. The reason is that the ordering of candidates with identical mapping scores is non-deterministic, for example:

# EditSim Output 1 Snapshot 
http://snomed.info/id/82095009  http://purl.org/sig/ont/fma/fma58709    1.0
http://snomed.info/id/82095009  http://purl.org/sig/ont/fma/fma58708    0.8055555555555556
http://snomed.info/id/82095009  http://purl.org/sig/ont/fma/fma58714    0.8055555555555556

# EditSim Output 2 Snapshot 
http://snomed.info/id/82095009  http://purl.org/sig/ont/fma/fma58709    1.0
http://snomed.info/id/82095009  http://purl.org/sig/ont/fma/fma58714    0.8055555555555556
http://snomed.info/id/82095009  http://purl.org/sig/ont/fma/fma58708    0.8055555555555556

The evaluation results for Output 2 are shown below:

######################################################################
###                     Eval using Hits@K, MRR                     ###
######################################################################

635619/635619 of *unique* scored mappings are valid and filled to corresponding anchors.
659530/659530 of anchored mappings are scored; for local ranking evaluation, all anchored mappings should be scored.
{
    "MRR": 0.892,
    "Hits@1": 0.865,
    "Hits@5": 0.926,
    "Hits@10": 0.946,
    "Hits@30": 0.977,
    "Hits@100": 1.0
}
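The effect can be reproduced in a few lines (a hypothetical demonstration, not the project's evaluation code). Python's sort is stable, so candidates with equal scores keep their input order, and the rank of a reference mapping that ties with others depends on that order:

```python
# Hypothetical demo of rank fluctuation under tied scores.
def rank_of(candidates, reference):
    """1-based rank of `reference` after a stable sort by score, descending."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [tgt for tgt, _ in ranked].index(reference) + 1

# Two input orders of the same tied candidates (scores as in the snapshots).
output1 = [("fma58709", 1.0), ("fma58708", 0.8056), ("fma58714", 0.8056)]
output2 = [("fma58709", 1.0), ("fma58714", 0.8056), ("fma58708", 0.8056)]

# If "fma58714" were the reference, its rank (and hence its Hits@K / MRR
# contribution) would be 3 under output1 but 2 under output2.
```

Breaking ties deterministically, e.g. with a secondary sort key on the entity IRI, would make the reported numbers reproducible across runs.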
Murchana commented 2 years ago

Thanks!

Another question, related to BertMap: how many candidates are used for calculating the MRR and Hits@1?

Lawhy commented 2 years ago

As stated in the resource paper, all systems are evaluated (for local ranking) against 100 negative candidates, so 101 candidates overall (including the reference mapping itself).
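For readers unfamiliar with local-ranking evaluation, a minimal sketch (illustrative only, not `om_eval.py` itself): each reference mapping is ranked within its list of 101 candidates, and MRR / Hits@K are averaged over all reference mappings.

```python
# Illustrative sketch of local-ranking metrics over reference ranks.
def local_ranking_metrics(reference_ranks, ks=(1, 5, 10, 30, 100)):
    """`reference_ranks`: 1-based rank of each reference mapping
    among its 101 candidates (1 reference + 100 negatives)."""
    n = len(reference_ranks)
    # MRR: mean of reciprocal ranks of the reference mappings.
    mrr = sum(1.0 / r for r in reference_ranks) / n
    metrics = {"MRR": mrr}
    # Hits@K: fraction of references ranked within the top K.
    for k in ks:
        metrics[f"Hits@{k}"] = sum(r <= k for r in reference_ranks) / n
    return metrics
```

Since every candidate list has exactly 101 entries, Hits@100 misses a reference only when it is ranked dead last, which is why it sits at or near 1.0 in the results above.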