I'm using CKAN to create a human-readable version of several databases' system catalogs.
This entailed creating a crawler script that uses ckanapi to populate CKAN with hundreds of datasets, with corresponding CSVs.
However, Datapusher quickly gets stuck when the script processes these CSVs in a large batch, even though it handles the very same files without problems in small batches.
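Since small batches go through cleanly, one stopgap on the crawler side is to throttle the uploads. The following is only a minimal sketch of that idea, not the actual crawler: `upload` is a hypothetical callable (for example, a thin wrapper around a ckanapi resource upload), and the batch size and pause length are arbitrary assumptions that would need tuning.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def push_in_batches(
    csv_paths: Iterable[str],
    upload: Callable[[str], object],  # hypothetical: uploads one CSV to CKAN
    batch_size: int = 5,              # assumed value, tune for your setup
    pause_seconds: float = 30.0,      # assumed value, tune for your setup
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Upload CSVs in small batches, pausing between batches.

    Returns the number of files handed to `upload`.
    """
    uploaded = 0
    for index, batch in enumerate(batched(csv_paths, batch_size)):
        if index > 0:
            # Give Datapusher time to drain its queue before the next batch.
            sleep(pause_seconds)
        for path in batch:
            upload(path)
            uploaded += 1
    return uploaded
```

This only works around the symptom on the client side; it does not change how Datapusher itself copes with a burst of jobs.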
At first, the problem was the use of sqlite for the job store: sqlite was never meant for concurrent access, and intermittent database lock operational errors showed up in the datapusher.ERR file as datapusher updated the job store (#198).
This was fixed by #199.
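To illustrate the kind of contention involved, here is a small, self-contained sketch using Python's standard `sqlite3` module. It is not Datapusher's code: the `jobs` table and the temporary database file are made up for the example, and the zero timeout is chosen so the second writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile
from typing import Optional


def second_writer_error() -> Optional[str]:
    """Return the error a second writer gets while the first holds the write lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")  # hypothetical job store file
        # isolation_level=None gives explicit control over transactions;
        # timeout=0 makes a blocked connection fail at once instead of retrying.
        first = sqlite3.connect(path, timeout=0, isolation_level=None)
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            first.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
            # The first writer opens a write transaction and keeps it open.
            first.execute("BEGIN IMMEDIATE")
            first.execute("INSERT INTO jobs (status) VALUES ('pending')")
            try:
                # The second writer now asks for the same write lock.
                second.execute("BEGIN IMMEDIATE")
            except sqlite3.OperationalError as exc:
                return str(exc)
            finally:
                first.execute("COMMIT")
            second.execute("ROLLBACK")
            return None
        finally:
            first.close()
            second.close()


if __name__ == "__main__":
    print(second_writer_error())
```

SQLite allows only one writer at a time per database file, so any job store that is updated from several workers at once will see this class of error unless writers wait and retry or the store is moved to a server-based database.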
However, uwsgi was still running as a single process. Even though the database lock operational errors were gone, datapusher was quickly overrun after processing a handful of CSVs.
CKAN version: 2.9
Datapusher version: 0.17