ckan / datapusher

A standalone web service that pushes data files from a CKAN site's resources into its DataStore
GNU Affero General Public License v3.0

Datapusher gets intermittently stuck when processing a large number of resources #200

jqnatividad closed 4 years ago

jqnatividad commented 4 years ago

CKAN version: 2.9

Datapusher version: 0.17

I'm using CKAN to create a human-readable version of several databases' system catalogs.

This entailed creating a crawler script that uses ckanapi to populate CKAN with hundreds of datasets, with corresponding CSVs.
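For readers who want to picture the workload, here is a minimal sketch of such a crawler loop. It is not the actual script: the function name `push_csvs`, the one-dataset-per-CSV layout, and deriving dataset names from file names are all assumptions made for illustration. The `ckan` argument is expected to behave like a `ckanapi.RemoteCKAN` instance, whose `action.package_create` and `action.resource_create` calls are used here.

```python
from pathlib import Path


def push_csvs(ckan, csv_dir, owner_org):
    """Create one dataset per CSV in csv_dir and upload the CSV as its resource.

    `ckan` is any object exposing ckanapi-style `action.package_create` and
    `action.resource_create`, for example a ckanapi.RemoteCKAN instance.
    Returns the list of dataset names that were created.
    """
    created = []
    for path in sorted(Path(csv_dir).glob("*.csv")):
        # Assumption: the lowercased file stem is already a valid CKAN dataset name.
        name = path.stem.lower()
        ckan.action.package_create(name=name, owner_org=owner_org)
        with path.open("rb") as fh:
            # Each uploaded CSV is what ends up queued for Datapusher.
            ckan.action.resource_create(
                package_id=name, name=path.name, format="CSV", upload=fh
            )
        created.append(name)
    return created


# Hypothetical usage (URL, API key, and organization are placeholders):
#
#   from ckanapi import RemoteCKAN
#   ckan = RemoteCKAN("https://ckan.example.org", apikey="MY-API-KEY")
#   push_csvs(ckan, "catalog_csvs", owner_org="my-org")
```

With hundreds of CSVs, a loop like this submits resources far faster than a single worker can drain them, which is the situation described next.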

However, Datapusher quickly gets stuck when the script processes these CSVs in a large batch, even though it handles small batches of the very same files without problems.

At first, the problem was the use of SQLite for the job store. SQLite was never meant for concurrent access, and intermittent database lock operational errors showed up in the datapusher.ERR file as Datapusher updated the job store (#198).

This was fixed by #199.
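The class of error behind #198 can be reproduced in isolation with Python's standard `sqlite3` module. The sketch below is not Datapusher's job store code: the `jobs` table and the function name `demo_lock_error` are hypothetical. It only shows that a second writer fails with an `OperationalError` while another connection holds the write lock.

```python
import os
import sqlite3
import tempfile


def demo_lock_error():
    """Return the error message a second writer gets while the database is locked."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")

        # isolation_level=None gives autocommit mode, so transactions are
        # controlled explicitly with BEGIN / COMMIT.
        writer_a = sqlite3.connect(path, isolation_level=None)
        writer_a.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")

        # timeout=0 makes the second connection fail immediately instead of
        # waiting for the lock to be released.
        writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)

        # The first writer takes the write lock and keeps its transaction open.
        writer_a.execute("BEGIN IMMEDIATE")
        writer_a.execute("INSERT INTO jobs (status) VALUES ('pending')")

        message = None
        try:
            # The second writer cannot obtain the write lock.
            writer_b.execute("INSERT INTO jobs (status) VALUES ('pending')")
        except sqlite3.OperationalError as exc:
            message = str(exc)
        finally:
            writer_a.execute("COMMIT")
            writer_a.close()
            writer_b.close()
        return message


if __name__ == "__main__":
    print(demo_lock_error())
```

A longer `timeout` makes the second writer wait rather than fail, but under sustained concurrent writes the waits can still expire, which is why a job store backed by a client/server database is the more robust choice.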

However, uwsgi was still running as a single process. Even though the operational database lock errors were gone, Datapusher was quickly overrun after processing a handful of CSVs.