ckan / datapusher

A standalone web service that pushes data files from a CKAN site's resources into its DataStore
GNU Affero General Public License v3.0

datapusher appending rows to existing datastore #98

Open Mbrownshoes opened 8 years ago

Mbrownshoes commented 8 years ago

I'm updating a few resources nightly using the Python ckanapi resource_update action. The resource is updating in the filestore, but we're seeing duplicate rows in the datastore: if the resource has 1000 rows, we often see 2000 in the data explorer and when using the datastore API. When we manually click the 'Upload to Datastore' button, we do not encounter this error.

In the datapusher jobs.py file there is a comment that refers to this behaviour: ''' Delete existing datstore resource before proceeding. Otherwise 'datastore_create' will append to the existing datastore. And if the fields have significantly changed, it may also fail. '''

What is the difference in terms of the datastore usage between manually and automatically updating a resource? Is there a workaround for automatically updating the datastore?
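
For reference, here is a minimal sketch of one possible workaround. It assumes the duplicates come from rows being appended to an existing DataStore table, and it has not been verified against this setup. The helper name `clear_and_resubmit` is hypothetical. The `ckan` argument is assumed to be a ckanapi client such as `RemoteCKAN`, and the code relies only on its `call_action` method and on the `datastore_delete` and `datapusher_submit` actions.

```python
def clear_and_resubmit(ckan, resource_id):
    """Drop a resource's DataStore table, then ask DataPusher to reload it.

    `ckan` is assumed to be a ckanapi client (anything exposing
    call_action(action, data_dict)); this helper is a hypothetical sketch.
    """
    # Calling datastore_delete without 'filters' removes the whole table,
    # so the next push starts from an empty DataStore instead of appending.
    # Note: this may raise if the resource has no DataStore table yet.
    ckan.call_action(
        "datastore_delete",
        {"resource_id": resource_id, "force": True},
    )
    # Queue a fresh DataPusher job for the resource.
    return ckan.call_action("datapusher_submit", {"resource_id": resource_id})


# Example usage (assumed site URL, key, and resource id; not executed here):
#   from ckanapi import RemoteCKAN
#   ckan = RemoteCKAN("https://ckan.example.org", apikey="MY-API-KEY")
#   clear_and_resubmit(ckan, "my-resource-id")
```

In a nightly job this would run after the resource_update call. If resource_update has already queued its own push, the two jobs could still overlap, so this may not be enough on its own.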