Open pauloneves opened 1 week ago
In general, datastore_create and datastore_upsert are very slow ways of getting data into the datastore. Consider using a postgres COPY command, like xloader and datapusher+ do, for efficiently loading large datasets.
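As a rough sketch of what that COPY path looks like: the records are streamed as CSV in a single round trip instead of row-by-row inserts. The copy_records function below assumes a psycopg2 connection, and the table/column names are placeholders:

```python
import csv
import io


def records_to_csv_buffer(records, columns):
    """Serialize dict records into an in-memory CSV suitable for COPY."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    for row in records:
        writer.writerow(row)
    buf.seek(0)
    return buf


def copy_records(conn, table, records, columns):
    # Hypothetical helper: streams the whole buffer to postgres in one
    # COPY statement rather than issuing an INSERT per record.
    buf = records_to_csv_buffer(records, columns)
    with conn.cursor() as cur:
        cur.copy_expert(
            'COPY "%s" (%s) FROM STDIN WITH (FORMAT csv)'
            % (table, ", ".join(columns)),
            buf,
        )
    conn.commit()
```

This is the same basic technique xloader uses; the datastore table must already exist with matching columns.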
Or, if you're interested in making it easier to connect pandas with ckanapi and the datastore API for loading data efficiently, I would definitely entertain a pull request adding a fast path for loading datastore records.
Is "datapusher+" different from "datapusher"?
I had a lot of problems with datapusher trying to guess my datatypes, so I'm trying to do the work myself: creating the datastore via the API and uploading the data to it so I can control how it is stored.
Yes, https://github.com/dathere/datapusher-plus analyzes all the data before setting types so that there are no errors on import.
The datastore_create() method accepts a records parameter with a list of dictionaries. It will be converted to JSON by the API, so all types in the dictionary must be JSON-serializable. A datetime value in the dict will cause a validation error.
To fix it, I must convert my pandas dataframe to a JSON string, load it back into a Python dictionary, and then pass it as a parameter to the method, which converts it to JSON again.
This is very inefficient, especially for large datasets.
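The round trip described above can be sketched with just the standard library (using a plain dict in place of a dataframe row):

```python
import json
from datetime import datetime

record = {"ts": datetime(2024, 1, 1, 12, 0), "value": 42}

# json.dumps rejects datetime objects outright, which is why a dict of
# raw dataframe values can't be handed to datastore_create() as-is.
try:
    json.dumps([record])
    serializable = True
except TypeError:
    serializable = False

# The workaround: serialize once (stringifying datetimes), parse back to
# Python objects, only for the client library to serialize them a second
# time before sending the request.
records = json.loads(json.dumps([record], default=lambda o: o.isoformat()))
```

With pandas the first serialization step would be df.to_json(orient="records", date_format="iso"), but the double encode/decode cost is the same.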
I'd like to be able to pass a JSON string directly to datastore_create() or datastore_upsert() and have it sent to CKAN as-is.