dahlia / wikidata

Wikidata client library for Python
https://pypi.org/project/Wikidata/
GNU General Public License v3.0

Speed of looking up properties #7

Open az0 opened 6 years ago

az0 commented 6 years ago

I am looping through entities and looking up multiple properties for each (7 in my real project, 3 in the attached toy example). Each additional property lookup slows it down, so it will take hours to go through all the entities. Is there a way to speed this up, please?

import datetime as dt

from wikidata.client import Client

client = Client()

# Property entities, fetched once and reused as keys for every lookup.
p_givenname = client.get('P735')
p_surname = client.get('P734')
p_dob = client.get('P569')

def get_entity(wikidata_id):
    entity = client.get(wikidata_id, load=True)

    givenname = entity[p_givenname].label
    surname = entity[p_surname].label
    dob = entity[p_dob]
    print('%s %s %s' % (givenname, surname, dob))

w_ids = ['Q498805',
         'Q482745',
         'Q186',
         'Q1363428',
         'Q299700',
         'Q196223',
         'Q488828',
         'Q490120']

# Time the whole loop over the toy list of IDs.
n0 = dt.datetime.now()
for w_id in w_ids:
    get_entity(w_id)
n1 = dt.datetime.now()
print('elapsed time: ', n1 - n0)
print('record count: ', len(w_ids))

az0 commented 6 years ago

Am I incorrectly using the library, or is there an issue in the library? I left my program running for days, and it did not finish.

k----n commented 6 years ago

I would probably use a SPARQL query instead (https://query.wikidata.org/).
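
For anyone scripting this rather than using the web form, here is a minimal sketch of sending a query to the same service from Python with only the standard library. It assumes the public endpoint at https://query.wikidata.org/sparql and JSON output requested through the `format` parameter. The helper names `build_request` and `run_query` and the User-Agent string are placeholders for illustration, not part of the Wikidata library.

```python
import json
import urllib.parse
import urllib.request

# Assumed: the public Wikidata Query Service SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, user_agent="example-script/0.1 (placeholder contact)"):
    """Build a GET request for the query, asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"User-Agent": user_agent},
    )


def run_query(query):
    """Send the query in one HTTP round trip and return the decoded JSON."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return json.load(response)


# Example call (needs network access, so it is left commented out):
# data = run_query("SELECT ?dob WHERE { wd:Q498805 wdt:P569 ?dob . }")
# for row in data["results"]["bindings"]:
#     print(row["dob"]["value"])
```

The point of going this route is that one request can return data for many entities, instead of one or more requests per entity and property.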

az0 commented 6 years ago

I used SPARQL to get the IDs such as Q498805 to feed into the Wikidata library, but I could not figure out how to get all the metadata out of SPARQL.

k----n commented 6 years ago

Use the "wdt" prefix in the predicate.

SELECT * WHERE
{
     wd:Q498805 wdt:P569 ?o .
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

az0 commented 6 years ago

@k----n I expanded on that, and it helps. I will continue working with it to add the labels and filters I was trying. Thanks.
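
One way the single-entity query above might be expanded for the original use case is sketched here. It asks for the three properties from the first post (P735, P734, P569) for many entities at once through a `VALUES` clause, then flattens the standard SPARQL JSON results into plain dictionaries. The function names and the variable names (`givenname`, `surname`, `dob`) are illustrative assumptions, not the actual query az0 ended up with.

```python
# Properties from the original post, keyed by hypothetical variable names.
PROPERTIES = {"givenname": "P735", "surname": "P734", "dob": "P569"}


def build_query(entity_ids, properties=PROPERTIES, language="en"):
    """Return one SPARQL query covering every entity ID and property."""
    # Select both the raw value and its label; the label service fills in
    # variables named ?<name>Label when they are listed in SELECT.
    select_vars = " ".join("?%s ?%sLabel" % (name, name) for name in properties)
    values = " ".join("wd:" + qid for qid in entity_ids)
    # OPTIONAL keeps an entity in the results even if a property is missing.
    optionals = "\n".join(
        "  OPTIONAL { ?item wdt:%s ?%s . }" % (pid, name)
        for name, pid in properties.items()
    )
    lines = [
        "SELECT ?item %s WHERE {" % select_vars,
        "  VALUES ?item { %s }" % values,
        optionals,
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "%s" }'
        % language,
        "}",
    ]
    return "\n".join(lines)


def flatten_bindings(data):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]
```

Calling `build_query(w_ids)` with the ID list from the first post yields a single query for all eight entities. An entity with several values for one property comes back as several rows, so the output may need grouping by `item`.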