ckan / ckanext-datastorer

Get files from ckan into the webstore.

The same resource is ingested twice? #68

Open drmalex07 opened 9 years ago

drmalex07 commented 9 years ago

This behaviour is observed in an instance running

Consider the case where a new resource is uploaded. I think that when the archiver's download task updates the metadata for the given resource (https://github.com/ckan/ckanext-archiver/blob/master/ckanext/archiver/tasks.py#L451), it causes a new IDomainObjectModification event to be fired. As a result, datastorer is notified again (because of this else clause: https://github.com/ckan/ckanext-datastorer/blob/master/ckanext/datastorer/plugin.py#L34) and a second task is sent to the queue.
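To make the suspected sequence concrete, here is a toy model of the event chain. It is only a sketch of my reading of the flow: every name in it (`fire_modification_event`, `datastorer_notify`, `archiver_download`, the `cache_url` field and the task name) is hypothetical and stands in for the real CKAN, archiver and datastorer code.

```python
# Toy model of the suspected event chain. None of these names are real
# CKAN, ckanext-archiver or ckanext-datastorer APIs; they are stand-ins.

tasks = []  # stand-in for the task queue


def datastorer_notify(resource):
    # Assumption: every modification event leads to one ingestion task.
    tasks.append(("datastorer.ingest", resource["id"]))


def fire_modification_event(resource):
    # Stand-in for the IDomainObjectModification notification.
    datastorer_notify(resource)


def archiver_download(resource):
    # The archiver writes metadata back to the resource. That write is
    # itself a modification, so a second event is fired.
    resource["cache_url"] = "http://example.invalid/cache/" + resource["id"]
    fire_modification_event(resource)


resource = {"id": "res-1"}
fire_modification_event(resource)  # event 1: the resource is uploaded
archiver_download(resource)        # event 2: the archiver updates metadata

print(tasks)
```

In this model the same resource id ends up in the queue twice, which matches what I am observing.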

Since the arrival time of the second event is unpredictable (and the queue can of course run many workers in parallel), I suppose this can lead to undesirable races if two parallel tasks are sending groups of records to the same datastore table (?).
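One possible mitigation would be to skip enqueuing when an ingestion task for the same resource is already pending. The following is a minimal in-memory sketch of that idea, not a proposed patch: `DedupingTaskQueue`, `enqueue_once` and `mark_done` are hypothetical names, and a real deployment with several worker processes would need shared state (for example in the database) rather than a per-process set.

```python
import threading


class DedupingTaskQueue:
    """Hypothetical in-memory stand-in for the task queue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()   # resource ids with a task in flight
        self.enqueued = []      # tasks actually sent, for inspection

    def enqueue_once(self, resource_id):
        # Check-and-add under one lock so that two concurrent
        # notifications cannot both pass the membership test.
        with self._lock:
            if resource_id in self._pending:
                return False
            self._pending.add(resource_id)
            self.enqueued.append(resource_id)
            return True

    def mark_done(self, resource_id):
        # Called by the worker once ingestion finishes, so that later
        # genuine updates to the resource can be ingested again.
        with self._lock:
            self._pending.discard(resource_id)


queue = DedupingTaskQueue()
first = queue.enqueue_once("res-1")    # upload event
second = queue.enqueue_once("res-1")   # archiver-triggered event
print(first, second, queue.enqueued)
```

The trade-off is that a genuine change arriving while the first task is still running would be dropped as well, so the worker would have to re-read the resource when it starts or re-check it on completion.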