TheScienceMuseum / heritage-connector

Heritage Connector: Transforming text into data to extract meaning and make connections
https://www.sciencemuseumgroup.org.uk/projects/heritage-connector
MIT License

tiny comments to your paper at doi.org/10.1002/ail2.23 #283

Closed by shigapov 3 years ago

shigapov commented 3 years ago

Thank you for your code and your paper! I have a few tiny comments on (not-the-main-part-of) your paper.

I see that the manuscript was received on 18 December 2020. That was after SemTab2020, where the target knowledge graph was Wikidata. But in your paper, references are given only to papers from SemTab2019, where the target knowledge graph was DBpedia, which is less relevant to your work. Of course, I would not have noticed this if I had not participated in SemTab2020 myself. ;-) Our open-source semantic annotator bbw won third place and was the only winning solution that did not use the Wikidata dump files. It is based on contextual matching and meta-lookup (via a local SearX instance). We also used Blazegraph's RDF GAS API to find distances between types. All papers from SemTab2020 can be found at https://www.cs.ox.ac.uk/isg/challenges/sem-tab/2020. Only a few solutions are also open source; please take a look at MantisTable 4 and JenTab. MTab provides an API.

Regarding NEL with Wikidata: it would be great to mention OpenTapioca, the GUI for OpenTapioca, and Antonin's paper. His paper discusses many issues relevant to your work.

Everything else in your paper is great! I am working on similar things and am very thankful that you open-sourced your work!

kdutia commented 3 years ago

Thanks for your feedback and for taking the time to read our work Renat!

The reason I referenced SemTab2019 is that SemTab2020 hadn't taken place yet when I was developing our method. Very useful to know that the 2020 edition targets Wikidata though, I'll check it out.

I saw the OpenTapioca research a while back. While it's interesting, we found that its results weren't strong enough for our use case, and that Facebook's BLINK performs much better.

Hope this clears those points up!

shigapov commented 3 years ago

Isn't Facebook's BLINK linking to Wikipedia? OpenTapioca links to Wikidata directly, right? But I don't know how many Wikipedia pages currently map to Wikidata entities. Depending on that, OpenTapioca might outperform BLINK on NEL for Wikidata. Maybe @wetneb could comment on that...

kdutia commented 3 years ago

We use the Wikipedia API to get the QID for each Wikipedia page, and in practice have found that this provides sufficient coverage for us with very good accuracy, even picking up edge cases that you wouldn't expect.

This Wikipedia page states that every Wikipedia page should have a Wikidata ID.
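For reference, a Wikipedia-title-to-QID lookup like the one described above can be done with the standard MediaWiki API's `pageprops` query, which exposes the linked Wikidata item as `wikibase_item`. A minimal sketch (assuming English Wikipedia; `qid_for_title` and `qid_from_response` are illustrative names, not functions from the heritage-connector codebase):

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed: English Wikipedia


def qid_from_response(data: dict) -> Optional[str]:
    """Extract the Wikidata QID from a pageprops API response, if present."""
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            return qid
    return None  # page missing, or no linked Wikidata item


def qid_for_title(title: str) -> Optional[str]:
    """Look up the Wikidata QID for a Wikipedia page title."""
    params = urlencode({
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": "1",  # resolve redirects to the canonical page first
        "titles": title,
        "format": "json",
    })
    with urlopen(f"{API_URL}?{params}") as resp:
        return qid_from_response(json.load(resp))
```

Passing `redirects=1` matters for the redirect question raised earlier: the API resolves the redirect chain before reporting page properties, so a redirect title still yields the QID of its target page.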

Thanks again for the feedback, but I'll close this issue for now as we've found a solution that works for our use case. Happy to continue the conversation over email or in a new issue if you have any more queries.