UB-Mannheim / RaiseWikibase

Knowledge graph construction: Fast inserts into a Wikibase instance
https://ub-mannheim.github.io/RaiseWikibase/
MIT License

fill the secondary storage for items and properties #7

Closed by shigapov 2 years ago

shigapov commented 3 years ago

Running the maintenance scripts to fill the secondary storage is just a workaround. We need to fill the secondary storage for items and properties directly during the insert. See https://doc.wikimedia.org/Wikibase/master/php/md_docs_storage_terms.html.
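For illustration, here is a minimal sketch of what filling the term store for one item label could involve. It uses an in-memory SQLite database as a stand-in for MariaDB. The table and column names follow my reading of the linked term-store documentation and should be checked against the actual schema. The helpers `get_or_create` and `add_item_term` are hypothetical and are not part of RaiseWikibase.

```python
import sqlite3

# Simplified stand-in for the normalized term store (assumed layout; the real
# tables live in MariaDB and have additional indexes).
SCHEMA = """
CREATE TABLE wbt_type (wby_id INTEGER PRIMARY KEY, wby_name TEXT UNIQUE);
CREATE TABLE wbt_text (wbx_id INTEGER PRIMARY KEY, wbx_text TEXT UNIQUE);
CREATE TABLE wbt_text_in_lang (
    wbxl_id INTEGER PRIMARY KEY, wbxl_language TEXT, wbxl_text_id INTEGER,
    UNIQUE (wbxl_language, wbxl_text_id));
CREATE TABLE wbt_term_in_lang (
    wbtl_id INTEGER PRIMARY KEY, wbtl_type_id INTEGER,
    wbtl_text_in_lang_id INTEGER,
    UNIQUE (wbtl_type_id, wbtl_text_in_lang_id));
CREATE TABLE wbt_item_terms (
    wbit_id INTEGER PRIMARY KEY, wbit_item_id INTEGER,
    wbit_term_in_lang_id INTEGER,
    UNIQUE (wbit_item_id, wbit_term_in_lang_id));
"""


def get_or_create(cur, table, id_col, values):
    """Return the id of the row matching `values`, inserting it if missing."""
    cols = list(values)
    params = tuple(values.values())
    where = " AND ".join(f"{c} = ?" for c in cols)
    row = cur.execute(
        f"SELECT {id_col} FROM {table} WHERE {where}", params
    ).fetchone()
    if row:
        return row[0]
    marks = ", ".join("?" for _ in cols)
    cur.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})", params
    )
    return cur.lastrowid


def add_item_term(cur, item_id, term_type, language, text):
    """Link one term (label, description or alias) to an item."""
    type_id = get_or_create(cur, "wbt_type", "wby_id", {"wby_name": term_type})
    text_id = get_or_create(cur, "wbt_text", "wbx_id", {"wbx_text": text})
    text_in_lang_id = get_or_create(
        cur, "wbt_text_in_lang", "wbxl_id",
        {"wbxl_language": language, "wbxl_text_id": text_id})
    term_in_lang_id = get_or_create(
        cur, "wbt_term_in_lang", "wbtl_id",
        {"wbtl_type_id": type_id, "wbtl_text_in_lang_id": text_in_lang_id})
    return get_or_create(
        cur, "wbt_item_terms", "wbit_id",
        {"wbit_item_id": item_id, "wbit_term_in_lang_id": term_in_lang_id})


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
cur = conn.cursor()
# Two items sharing the same English label reuse the same text rows.
add_item_term(cur, 1, "label", "en", "Mannheim")
add_item_term(cur, 2, "label", "en", "Mannheim")
conn.commit()
```

In this sketch every term touches up to five tables, and each step is a lookup followed by an optional insert. That is only an illustration of the normalized layout, not a description of how the maintenance scripts or RaiseWikibase actually write the data.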

shigapov commented 3 years ago

The commit https://github.com/UB-Mannheim/RaiseWikibase/commit/428c28b9334e9bdb613b3a49d42384a88c7398a2 makes it possible to insert the fingerprint data (labels, aliases and descriptions) into the secondary tables on the fly as well. The results of the first tests using https://github.com/UB-Mannheim/RaiseWikibase/blob/main/megaWikibase.py are:

  1. 8965 properties with monolingual labels, descriptions and aliases were uploaded in 99 seconds (previously 42 seconds), which is roughly 90 properties per second.
  2. 20000 items with one label and no aliases or descriptions (but with 2-3 claims, each with a qualifier and a reference) were uploaded in 112 seconds (previously 84 seconds), which is roughly 178 items per second.
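
To put the two runs side by side, the short script below recomputes the throughput before and after the change from the counts and timings reported above. The names `throughput`, `runs` and `results` are made up for this illustration.

```python
def throughput(count, seconds):
    """Entities uploaded per second."""
    return count / seconds


# (entity count, seconds before the commit, seconds after the commit),
# taken from the two test results reported in this comment.
runs = {
    "properties": (8965, 42, 99),
    "items": (20000, 84, 112),
}

results = {}
for name, (count, before_s, after_s) in runs.items():
    before = throughput(count, before_s)
    after = throughput(count, after_s)
    time_ratio = after_s / before_s  # how many times longer the upload takes
    results[name] = (before, after, time_ratio)
    print(f"{name}: {before:.1f}/s before, {after:.1f}/s after, "
          f"upload takes {time_ratio:.2f}x as long")
```

By this calculation the property upload takes more than twice as long as before, while the item upload takes about a third longer. One possible explanation is that the properties in this test carry labels, descriptions and aliases, whereas the items carry a single label, but the timings alone do not confirm it.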

Performance has dropped. Can it be improved?