SPARQL can only be used to search for entities that were created more than 10 min ago. For entities created less than 10 min ago, it is not guaranteed that the synchronization with Blazegraph has completed.
To check for duplicate entities (if the entity label is known), one can first perform an SQL query to obtain all the Wikibase IDs with that label (there may be more than one). Once the IDs are known, the Wikibase API (action=wbgetentities) can be used to check particular statements for each entity (e.g. is the entity an 'instance of':'Publication'?).
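A minimal sketch of this two-step check, not the docker-importer implementation. It assumes the normalized term store tables described in the Wikibase storage documentation (wbt_item_terms, wbt_term_in_lang, wbt_type, wbt_text_in_lang, wbt_text) and a MySQL driver that uses %s placeholders. The helper names and the IDs P31 and Q5 are hypothetical placeholders; a local Wikibase will use its own property and item IDs.

```python
# Step 1: SQL lookup of item IDs by label (assumes the wbt_* term store schema).
LABEL_LOOKUP_SQL = """
SELECT wbit_item_id
FROM wbt_item_terms
JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
JOIN wbt_type         ON wbtl_type_id = wby_id
JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
JOIN wbt_text         ON wbxl_text_id = wbx_id
WHERE wby_name = 'label' AND wbxl_language = %s AND wbx_text = %s
"""


def item_ids_from_rows(rows):
    """Turn numeric item IDs returned by the SQL query into 'Q...' IDs."""
    return [f"Q{row[0]}" for row in rows]


def wbgetentities_params(entity_ids):
    """Step 2: request parameters for action=wbgetentities, claims only."""
    return {
        "action": "wbgetentities",
        "ids": "|".join(entity_ids),
        "props": "claims",
        "format": "json",
    }


def has_item_statement(response, entity_id, property_id, target_id):
    """Return True if the entity has a statement property_id -> target_id.

    `response` is the decoded JSON returned by wbgetentities.
    """
    entity = response.get("entities", {}).get(entity_id, {})
    for claim in entity.get("claims", {}).get(property_id, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip 'novalue' / 'somevalue' snaks
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and value.get("id") == target_id:
            return True
    return False
```

In use, the rows returned by executing LABEL_LOOKUP_SQL with (language, label) would be passed to item_ids_from_rows, the resulting IDs sent to the API with wbgetentities_params, and each candidate checked with has_item_statement before a new entity is created.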
Wikibase creates several SQL tables, but mainly just to store the label information (see: https://doc.wikimedia.org/Wikibase/master/php/md_docs_storage_terms.html). Statements about the entities are not saved in the Wikibase SQL tables. Instead, they are saved as part of the entity's page using the MediaWiki schema. Thus, it is not possible to perform a simple SQL query to check whether an entity has a given statement.
It is not possible to use the Wikibase API with action=wbsearchentities directly to check for duplicate entities. wbsearchentities also has an internal delay and only shows results for entities that were created more than roughly 2 min ago.
Functions implementing this approach to check for duplicates during import (SQL query + Wikibase API) are already available in docker-importer.
Issue description:
TODOS:
Acceptance-Criteria
Checklist for this issue: