SuLab / WikidataIntegrator

A Wikidata Python module integrating the MediaWiki API and the Wikidata SPARQL endpoint
MIT License

Consider async requests #127

Closed physikerwelt closed 1 year ago

physikerwelt commented 4 years ago

I am not sure if this is the right tool for this job, but my use case is importing about 10 million statements into a private Wikibase instance. Here, async parallel requests would be desirable. One way to go would be switching from the requests library to https://github.com/encode/httpx and adding a parameter for async. However, next to the existing parameter that defines whether the result is a data frame or a regular dictionary, the additional switch would make type hinting even more difficult and might cause confusion. See also #113
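To illustrate what the async path could look like independent of WDI, here is a minimal sketch of parallel `wbeditentity` calls with httpx. The instance URL, payloads, CSRF token handling, and concurrency limit are all placeholders for discussion, not part of WDI's current API:

```python
import asyncio
import httpx

# Assumed private Wikibase instance; replace with the real API URL.
API_URL = "https://my-private-wikibase.example/w/api.php"

async def push_entity(client: httpx.AsyncClient, entity_json: str, token: str):
    """Submit one fully built entity JSON blob via action=wbeditentity."""
    resp = await client.post(API_URL, data={
        "action": "wbeditentity",
        "new": "item",
        "data": entity_json,
        "token": token,      # CSRF token obtained beforehand (not shown)
        "format": "json",
    })
    resp.raise_for_status()
    return resp.json()

async def push_all(entities: list[str], token: str, concurrency: int = 10):
    """Push many entities with a bounded number of concurrent requests."""
    semaphore = asyncio.Semaphore(concurrency)

    async with httpx.AsyncClient(timeout=60) as client:
        async def guarded(entity_json: str):
            async with semaphore:
                return await push_entity(client, entity_json, token)

        return await asyncio.gather(*(guarded(e) for e in entities))

# asyncio.run(push_all(entity_blobs, csrf_token))
```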

andrawaag commented 1 year ago

I am not sure if async requests will help here. Currently, the most efficient way to interact with the API is to build the WB item internally before submitting the fully built JSON blob to the API. This is currently the modus operandi of WikidataIntegrator. Submitting statement by statement to the API, synchronously or asynchronously, is supported by the Wikibase API and is possible within WDI, but it would create an indexing nightmare on the targeted Wikibase.
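For reference, the build-everything-locally-then-write-once pattern described above looks roughly like this in WDI. This is a sketch following the README usage; the property IDs, credentials, and URLs are placeholders, and exact parameter names may vary between WDI versions:

```python
from wikidataintegrator import wdi_core, wdi_login

# Placeholders: point these at the private Wikibase instance.
MEDIAWIKI_API_URL = "https://my-private-wikibase.example/w/api.php"
SPARQL_ENDPOINT_URL = "https://my-private-wikibase.example/query/sparql"

login = wdi_login.WDLogin(user="BotUser", pwd="bot_password",
                          mediawiki_api_url=MEDIAWIKI_API_URL)

# Build all statements locally first ...
statements = [
    wdi_core.WDItemID(value="Q5", prop_nr="P31"),            # e.g. instance of
    wdi_core.WDString(value="some identifier", prop_nr="P123"),
]

# ... then submit the fully built item as a single JSON blob in one write call.
item = wdi_core.WDItemEngine(data=statements,
                             mediawiki_api_url=MEDIAWIKI_API_URL,
                             sparql_endpoint_url=SPARQL_ENDPOINT_URL)
item.set_label("example item", lang="en")
item.write(login)
```

Because each item goes to the API as one consolidated edit, parallelizing individual statement submissions buys little; if anything, the parallelism would have to happen at the item level.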