fechan / lestrade-tei-tagger

Mathematica-based TEI Tagger spun off from Audrey Holmes' Historical Markup Tool
GNU General Public License v3.0

Wolfram Entities sometimes lack Wikidata IDs #16

Open · fechan opened this issue 3 years ago

fechan commented 3 years ago

Some Wolfram Entities do not resolve to a Wikidata QID. For example, Van Gogh's The Starry Night has none. I currently have the tagger skip such entities when building TEI indexes, but it should do something more useful, such as alerting the user so they can add the ID manually.
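A rough sketch of the "alert the user" idea, written in Python for illustration only (the tagger itself is Mathematica-based). Instead of silently dropping entities without a QID, it collects them and builds a message listing what needs manual attention. The input shape (a map from entity name to QID or `None`) and all names here are hypothetical, not part of the existing code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndexBuildResult:
    """Hypothetical outcome of resolving entities before building a TEI index."""
    resolved: dict[str, str] = field(default_factory=dict)  # entity name -> QID
    unresolved: list[str] = field(default_factory=list)     # entities with no QID


def partition_entities(qid_lookup: dict[str, Optional[str]]) -> IndexBuildResult:
    """Split entities into those with a Wikidata QID and those without.

    qid_lookup maps each entity's name to its QID, or to None/"" when the
    Wolfram Entity has no Wikidata ID (assumed input shape).
    """
    result = IndexBuildResult()
    for name, qid in qid_lookup.items():
        if qid:
            result.resolved[name] = qid
        else:
            result.unresolved.append(name)
    return result


def format_report(unresolved: list[str]) -> str:
    """Build a message asking the user to add the missing IDs by hand."""
    if not unresolved:
        return "All entities resolved to Wikidata QIDs."
    lines = [f"Entities with no Wikidata QID ({len(unresolved)}); please add these manually:"]
    lines.extend(f"  - {name}" for name in unresolved)
    return "\n".join(lines)


if __name__ == "__main__":
    # "Q0000001" is a placeholder, not a real lookup result.
    result = partition_entities({"Some painter": "Q0000001", "The Starry Night": None})
    print(format_report(result.unresolved))
```

The resolved entities can still go into the index as before. The report is just a way to surface the skipped ones rather than lose them.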

fechan commented 3 years ago

Wolfram Entities sometimes lack an associated Wikibase entry, even though a matching Wikidata item might still exist. It might be good to search for that item via the Wikibase API (assuming it exists).
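A minimal Python sketch of such a search, assuming the MediaWiki action API's `wbsearchentities` module at `https://www.wikidata.org/w/api.php`. The function names and the User-Agent string are hypothetical. The search matches labels and aliases, so it returns candidates that still need filtering, not a definitive match.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(label: str, language: str = "en", limit: int = 10) -> str:
    """Build a wbsearchentities request URL that searches item labels and aliases."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def parse_search_results(payload: dict) -> list[tuple[str, str]]:
    """Extract (QID, label) pairs from a wbsearchentities JSON response."""
    return [(hit["id"], hit.get("label", "")) for hit in payload.get("search", [])]


def search_wikidata(label: str, language: str = "en") -> list[tuple[str, str]]:
    """Fetch candidate items over the network (hypothetical User-Agent)."""
    req = Request(
        build_search_url(label, language),
        headers={"User-Agent": "lestrade-tei-tagger-sketch/0.1"},
    )
    with urlopen(req) as resp:
        return parse_search_results(json.load(resp))
```

For example, `search_wikidata("The Starry Night")` would return a list of candidate `(QID, label)` pairs. Picking the right one is the same ranking problem discussed further down this thread.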

fechan commented 3 years ago

Bumping priority on this, since solving it will also help with some other issues.

fechan commented 3 years ago

I think we'll have to use a SPARQL query, since the wikidata library doesn't support this kind of search. I wrote one based on the Wikidata examples and some trial and error (I'm new to SPARQL), and I think it should work. However, the most "relevant" (popular?) entry doesn't necessarily show up first in the results, so they need to be sorted somehow.

You can test this query here: https://query.wikidata.org/

#Filter labels using EntitySearch from the mwapi service to provide full-text search.
#Combine the Wikidata Query Service and the MediaWiki API
#https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI
#(which is backed by Elasticsearch) to perform very fast searching of entities by their label.
#
#This query first contacts EntitySearch (an alias for wbsearchentities),
#which passes the items it finds matching "starry night" to the triple store,
#which in turn can query the graph in a timely manner and filter out entities that are not visual artworks.
#This solution only works if the number of items returned by wbsearchentities remains reasonable.

SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "starry night"; # Search for things named "starry night"
                    mwapi:language "en".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  ?item wdt:P31/wdt:P279* wd:Q4502142 . # only include items that are instances or sub-classes of "visual artwork"
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
LIMIT 100
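One common proxy for popularity on Wikidata is an item's sitelink count: the number of Wikipedia and other wiki pages linked to it. WDQS exposes this as `wikibase:sitelinks`. The sketch below reuses the query above, adds `?sitelinks`, and orders the results by it in descending order. It assumes the public endpoint `https://query.wikidata.org/sparql` with its `format=json` parameter. Whether sitelink count matches what we mean by "relevant" is an assumption, and the Python function names are hypothetical.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above, plus a sitelink count used as a popularity proxy.
QUERY_TEMPLATE = """SELECT ?item ?itemLabel ?sitelinks WHERE {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search {search};
                    mwapi:language "en".
    ?item wikibase:apiOutputItem mwapi:item.
  }}
  ?item wdt:P31/wdt:P279* wd:Q4502142 .   # visual artworks only
  ?item wikibase:sitelinks ?sitelinks .   # number of linked wiki pages
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?sitelinks)
LIMIT 100"""


def sparql_string(value: str) -> str:
    """Quote a Python string as a SPARQL string literal, escaping specials."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def build_query(search: str) -> str:
    """Fill the search term into the query template."""
    return QUERY_TEMPLATE.format(search=sparql_string(search))


def build_query_url(search: str) -> str:
    """GET URL for WDQS that asks for SPARQL JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': build_query(search), 'format': 'json'})}"


def parse_bindings(results: dict) -> list[tuple[str, str, int]]:
    """Turn SPARQL JSON results into (QID, label, sitelinks), keeping query order."""
    rows = []
    for b in results["results"]["bindings"]:
        qid = b["item"]["value"].rsplit("/", 1)[-1]  # .../entity/Q123 -> Q123
        label = b.get("itemLabel", {}).get("value", "")
        links = int(b.get("sitelinks", {}).get("value", 0))
        rows.append((qid, label, links))
    return rows
```

With this ordering, the first row would be the candidate to propose for an unresolved entity. Widely linked items should rise to the top, but obscure works with few sitelinks may still need a manual choice.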