fberrizbeitia / wikidataEnricher

Enriches documents with wikidata entities
Creative Commons Zero v1.0 Universal

Missing "sameAs" dbpedia->wikidata links for some concepts #1

Open photomedia opened 3 years ago

photomedia commented 3 years ago

Dereferencing the dbpedia->wikidata link doesn't work in some cases, so this code fails to find a corresponding Wikidata link for some terms:

dbpediaPredicates = dbpediaObj.get(dbpediaURI).get('http://www.w3.org/2002/07/owl#sameAs')
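To make the failure mode concrete, here is a minimal sketch of a defensive version of that lookup. It assumes `dbpediaObj` is the parsed RDF/JSON form of the DBpedia record (subject -> predicate -> list of `{"type": ..., "value": ...}` entries); that shape is an assumption, not something confirmed in this thread, and the function name `wikidata_from_same_as` is hypothetical.

```python
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Matches Wikidata entity URIs in either http or https form.
WIKIDATA_URI = re.compile(r"^https?://www\.wikidata\.org/entity/Q\d+$")


def wikidata_from_same_as(dbpedia_obj, dbpedia_uri):
    """Return the Wikidata URIs listed under owl:sameAs, or [] if there are none.

    Assumes an RDF/JSON-style dict (an assumption about the input shape).
    """
    # Either lookup may come back as None, so fall back to empty containers
    # instead of chaining .get() calls on a possibly missing value.
    resource = dbpedia_obj.get(dbpedia_uri) or {}
    same_as = resource.get(OWL_SAME_AS) or []
    return [
        entry["value"]
        for entry in same_as
        if isinstance(entry, dict)
        and isinstance(entry.get("value"), str)
        and WIKIDATA_URI.match(entry["value"])
    ]
```

An empty list then signals "no Wikidata link in sameAs" explicitly, rather than surfacing as an `AttributeError` or a silent `None`.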

In some cases, the Wikidata link simply isn't among the "sameAs" predicates. For example: https://dbpedia.org/page/Information_visualization

Other examples:

Abstract texts you can test in DBPedia Spotlight to catch some of these:

photomedia commented 3 years ago

In the fork, I added a secondary lookup to address this issue. Many of the entities whose dbpedia.org/resource record is missing the "sameAs" link do have a "sameAs" link to Wikidata that can be retrieved here: https://global.dbpedia.org/same-thing/lookup/

For example: https://global.dbpedia.org/same-thing/lookup/?uri=http://dbpedia.org/resource/Doppler_effect

See: https://github.com/photomedia/citationDataEnrichTransform/blob/d83331d0fdbc11d2abd40411591764aae87a9502/get-concepts.py#L122
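As a rough illustration of such a secondary lookup (a sketch, not the code in the fork): the endpoint and the `uri` query parameter come from the example URL above, while the function names are hypothetical. The response schema isn't shown in this thread, so instead of assuming particular keys the sketch walks the whole JSON payload and collects every string that looks like a Wikidata entity URI.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

LOOKUP_ENDPOINT = "https://global.dbpedia.org/same-thing/lookup/"
WIKIDATA_URI = re.compile(r"^https?://www\.wikidata\.org/entity/Q\d+$")


def build_lookup_url(dbpedia_uri):
    """Build the same-thing lookup URL for a DBpedia resource URI."""
    return LOOKUP_ENDPOINT + "?" + urlencode({"uri": dbpedia_uri})


def collect_wikidata_uris(payload):
    """Recursively collect Wikidata entity URIs from parsed JSON.

    No response schema is assumed: every string value is inspected.
    """
    found = []
    if isinstance(payload, str):
        if WIKIDATA_URI.match(payload):
            found.append(payload)
    elif isinstance(payload, dict):
        for value in payload.values():
            found.extend(collect_wikidata_uris(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(collect_wikidata_uris(item))
    return found


def lookup_wikidata(dbpedia_uri, timeout=10):
    """Query the lookup service; return all Wikidata URIs in the response."""
    with urlopen(build_lookup_url(dbpedia_uri), timeout=timeout) as response:
        payload = json.load(response)
    return collect_wikidata_uris(payload)
```

This would only be called as a fallback when the primary "sameAs" lookup comes back empty, e.g. `lookup_wikidata("http://dbpedia.org/resource/Doppler_effect")`.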

Sometimes, these lookup results actually contain MULTIPLE Wikidata links, which is another issue.
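One possible way to cope with multiple candidates, sketched below, is to deduplicate them by Q-id and flag the result as ambiguous so it can be reviewed. Preferring the lowest Q-id is an arbitrary but deterministic tie-break chosen for this sketch; it is not taken from the fork, and the function names are hypothetical.

```python
import re

QID = re.compile(r"/entity/Q(\d+)$")


def rank_wikidata_candidates(uris):
    """Deduplicate Wikidata URIs by Q-id and sort them, lowest Q-id first."""
    unique = {}
    for uri in uris:
        match = QID.search(uri)
        if match:
            # Keep the first URI seen for each Q-id (http/https variants collapse).
            unique.setdefault(int(match.group(1)), uri)
    return [unique[qid] for qid in sorted(unique)]


def pick_wikidata(uris):
    """Return (chosen_uri, is_ambiguous); chosen_uri is None if nothing matched."""
    ranked = rank_wikidata_candidates(uris)
    if not ranked:
        return None, False
    return ranked[0], len(ranked) > 1
```

Returning the ambiguity flag alongside the choice lets the caller log or skip ambiguous concepts instead of silently trusting the tie-break.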