ckan / ckanapi

A command line interface and Python module for accessing the CKAN Action API

datastore_create can't upload a json records file #217

Open pauloneves opened 1 week ago

pauloneves commented 1 week ago

The datastore_create() method accepts a records parameter containing a list of dictionaries, which the API converts to JSON.

Every value in the dictionaries must be JSON-serializable. A datetime value in a dict raises a validation error.

To work around this, I have to convert my pandas DataFrame to a JSON string, load it back into Python dictionaries, and then pass those to the method, which converts them to JSON again.

This is very inefficient, especially for large datasets.
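A minimal sketch of the problem described above, using only the standard library. It shows that the stock JSON encoder rejects datetime values, and one possible way to avoid the serialize-then-parse round trip by converting date values to ISO 8601 strings in a single pass. The helper name `jsonable_records` and the sample column names are hypothetical and not part of ckanapi.

```python
import json
from datetime import date, datetime


def jsonable_records(records):
    """Return a copy of records with date/datetime values as ISO 8601 strings.

    Hypothetical helper for illustration; not part of ckanapi.
    """
    def convert(value):
        if isinstance(value, (datetime, date)):
            return value.isoformat()
        return value

    return [{key: convert(value) for key, value in row.items()} for row in records]


rows = [{"id": 1, "measured_at": datetime(2024, 5, 1, 12, 30)}]

# The standard json encoder raises TypeError for datetime values.
try:
    json.dumps(rows)
    raw_is_serializable = True
except TypeError:
    raw_is_serializable = False

# After conversion the records serialize cleanly and could be passed as `records`.
clean_rows = jsonable_records(rows)
payload = json.dumps(clean_rows)
```

This only handles top-level date and datetime values; nested structures or other non-serializable types would need their own handling.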

I'd like to be able to pass a JSON string directly to datastore_create() or datastore_upsert() and have it sent to CKAN as is.

wardi commented 1 week ago

In general, datastore_create and datastore_upsert are very slow ways of getting data into the datastore. Consider using a PostgreSQL COPY command, as xloader and datapusher+ do, to load large datasets efficiently.
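A rough sketch of what a COPY-based load could look like, not how xloader or datapusher+ actually implement it. It assumes direct access to the datastore database, that the target table already exists, and that the table is named after the resource id. The function name `build_copy_input` and the sample values are hypothetical; executing the statement (shown only in a comment) would need a driver such as psycopg2.

```python
import csv
import io


def build_copy_input(table, columns, rows):
    """Return (sql, csv_buffer) for a PostgreSQL COPY ... FROM STDIN load.

    Hypothetical helper for illustration only.
    """
    # Double any embedded quotes so identifiers are safely quoted.
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    quoted_cols = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
    sql = "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(quoted_table, quoted_cols)

    # Write the rows as CSV into an in-memory buffer.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[c] for c in columns])
    buffer.seek(0)
    return sql, buffer


sql, data = build_copy_input(
    "my-resource-id",  # assumed table name (the resource id)
    ["id", "name"],
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b, c"}],
)

# With psycopg2 and an open connection `conn` (both assumptions):
# with conn.cursor() as cur:
#     cur.copy_expert(sql, data)
```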

Or, if you're interested in making it easier to connect pandas with ckanapi and the datastore API for loading data efficiently, I would definitely entertain a pull request that adds a fast path for loading datastore records.

pauloneves commented 6 days ago

Is "datapusher+" different from "datapusher"?

I had a lot of problems with datapusher trying to guess my data types, so I'm doing the work myself: creating the datastore via the API and uploading data to it so I can control how it is stored.
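A small sketch of the approach described above: declaring column types explicitly through the `fields` parameter of datastore_create instead of letting a loader guess them. The helper name `build_datastore_payload`, the resource id, the site URL, and the column names are all hypothetical; the ckanapi call is left as a comment because it needs a live CKAN site and an API key.

```python
def build_datastore_payload(resource_id, fields, records=None):
    """Build keyword arguments for datastore_create with explicit column types.

    `fields` is a list of (column_name, datastore_type) pairs.
    Hypothetical helper for illustration only.
    """
    payload = {
        "resource_id": resource_id,
        "fields": [{"id": name, "type": type_} for name, type_ in fields],
    }
    if records:
        payload["records"] = records
    return payload


payload = build_datastore_payload(
    "hypothetical-resource-id",
    [("id", "int"), ("measured_at", "timestamp"), ("value", "numeric")],
)

# Sending it with ckanapi (URL and key are placeholders):
# from ckanapi import RemoteCKAN
# ckan = RemoteCKAN("https://ckan.example.org", apikey="...")
# ckan.action.datastore_create(**payload)
```

Creating the table with `fields` only and loading records in a separate step keeps type decisions in one place.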

wardi commented 6 days ago

Yes, https://github.com/dathere/datapusher-plus analyzes all the data before setting types so that there are no errors on import.