ckan / datapusher

A standalone web service that pushes data files from a CKAN site's resources into its DataStore
GNU Affero General Public License v3.0

DataPusher fails when getting private CKAN resources #17

Closed amercader closed 10 years ago

amercader commented 10 years ago

For uploaded files in CKAN that belong to private datasets, we must send the API key; otherwise we won't be able to get them.
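
A minimal sketch of the idea, not DataPusher's actual code: attach the API key as an `Authorization` header when requesting the file. The helper name `build_resource_request` and the use of Python 3's `urllib` are assumptions made for illustration only.

```python
import urllib.request


def build_resource_request(url, api_key=None):
    """Build a GET request for a resource file (hypothetical helper).

    If an API key is supplied, it is sent in the Authorization header so
    that files belonging to private datasets can be fetched.
    """
    headers = {}
    if api_key:
        headers["Authorization"] = api_key
    return urllib.request.Request(url, headers=headers)
```

The request can then be passed to `urllib.request.urlopen`; when no key is given, the request is sent without the header, as for public resources.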

Also, the code that handles messytables parsing of the file could be improved; otherwise you get a nasty exception:

Fetching from: http://localhost:5000/dataset/097d4f94-3642-4f7a-a7ac-992bec6d5627/resource/9d00a104-1773-4422-81d8-f67a74b70f74/download/districtcenterpoints.csv
Job "push_to_datastore (trigger: RunTriggerNow, run = True, next run at: None)" raised an exception
Traceback (most recent call last):
  File "/home/adria/dev/pyenvs/ckan_datastore/lib/python2.7/site-packages/APScheduler-2.1.1-py2.7.egg/apscheduler/scheduler.py", line 512, in _run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/home/adria/dev/pyenvs/ckan_datastore/src/datapusher/datapusher/jobs.py", line 278, in push_to_datastore
    row_set = table_set.tables.pop()
IndexError: pop from empty list
amercader commented 10 years ago

Fixed on 2394e81

baskinomics commented 10 years ago

I am still experiencing this bug in CKAN 2.2:

Error: [u' File "/usr/lib/ckan/datapusher/lib/python2.7/site-packages/apscheduler/scheduler.py", line 512, in _run_job\n retval = job.func(*job.args, **job.kwargs)\n', u' File "/usr/lib/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 278, in push_to_datastore\n row_set = table_set.tables.pop()\n', u"IndexError('pop from empty list',)"]
amercader commented 10 years ago

We need to have a look again

ntoll commented 10 years ago

Yes please do. ;-)

nigelbabu commented 10 years ago

Hey @ntoll, @baskinomics. I can't reproduce this error. Can either of you help out with some steps to reproduce?

teodorescuserban commented 10 years ago

Hi guys,

Any luck on this bug? I'm getting it as well on CKAN 2.2. Steps to reproduce: follow the DataPusher docs (http://docs.ckan.org/projects/datapusher/en/latest/). Thank you.

teodorescuserban commented 10 years ago

I hope this helps someone else as well. In my case, this error appeared only on resources uploaded to the local store, not on URL/API resources. The reason was that my web server incorrectly sent the MIME type of CSV files as text/html. After teaching it to send text/csv, DataPusher worked without error.
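
For anyone wanting to check for the same misconfiguration, here is a rough sketch that compares the `Content-Type` a server returns with what the file extension suggests. The function name and the small extension table are hypothetical, not part of DataPusher or CKAN.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical table of extensions and the MIME type expected for each.
EXPECTED_TYPES = {
    ".csv": "text/csv",
    ".tsv": "text/tab-separated-values",
}


def content_type_mismatch(url, content_type_header):
    """Return True if the served Content-Type disagrees with the extension."""
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    expected = EXPECTED_TYPES.get(extension)
    # Drop parameters such as "; charset=utf-8" before comparing.
    served = content_type_header.split(";")[0].strip().lower()
    return expected is not None and served != expected
```

Calling it with a `.csv` download URL and the header value `text/html` flags the mismatch described above; unknown extensions are never flagged.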

joetsoi commented 10 years ago

@teodorescuserban thanks for that. This occurs because the mimetype causes messytables to return an HTMLTableSet with an empty list as its rowset.

We should probably make a better error message whenever this occurs.
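
One possible shape for that, as a sketch only: check the list before popping and raise an error that mentions the content type. `NoTablesFoundError` and `first_row_set` are hypothetical names, and the `tables` argument stands in for `table_set.tables` from the traceback.

```python
class NoTablesFoundError(Exception):
    """Raised when parsing a resource yields no tables (hypothetical)."""


def first_row_set(tables, content_type=None):
    """Pop a row set from `tables`, failing with a readable message if empty."""
    if not tables:
        raise NoTablesFoundError(
            "No tables could be parsed from the resource "
            "(served content type: %s). Check that the web server sends "
            "the correct MIME type for the file." % (content_type or "unknown")
        )
    return tables.pop()
```

This keeps the original `pop()` behaviour for non-empty lists and replaces the bare `IndexError` with a message that points at the likely cause.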

I'm closing this issue as it's unrelated to whether the dataset is private or not, unless @ntoll or @baskinomics can provide more logs/details.