ckan / ideas

[DEPRECATED] Use the main CKAN repo Discussions instead:
https://github.com/ckan/ckan/discussions

extension: json datastore #143

Open joetsoi opened 9 years ago

joetsoi commented 9 years ago

https://github.com/joetsoi/ckanext-jsondatastore

This is an extension that I've wanted to work on for a long time. I haven't done more than test that it's possible. I'd like to replace the datastore, because I hate it (that's a lie, hate is a strong word), with a JSON-based datastore that uses a Postgres jsonb field for storage. This allows more flexibility, and we don't need to worry so much about data types when importing.
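
To make the proposal concrete, here is a minimal sketch of an import under one assumed layout: one Postgres table per resource, with each row stored as a document in a single `jsonb` column called `data`. It only builds SQL text and parameters (psycopg2-style `%s` placeholders) and does not talk to a database. The table layout, `_id`/`data` column names and `import_statements` helper are hypothetical, not taken from ckanext-jsondatastore.

```python
import csv
import io
import json
import re

# Hypothetical layout: one table per resource, each CSV row stored as one JSON document.
CREATE_TABLE = 'CREATE TABLE IF NOT EXISTS "{table}" (_id serial PRIMARY KEY, data jsonb NOT NULL)'
INSERT_ROW = 'INSERT INTO "{table}" (data) VALUES (%s::jsonb)'


def import_statements(table, csv_text):
    """Yield (sql, params) pairs that would load csv_text into table.

    No column types are declared: each row becomes a JSON object keyed by
    the CSV header, with values kept exactly as they appear in the file.
    """
    # Only the table name is interpolated, so restrict it to safe characters
    # (resource ids are UUIDs, i.e. hex digits and hyphens).
    if not re.fullmatch(r"[A-Za-z0-9_-]+", table):
        raise ValueError("invalid table name: %r" % table)
    yield CREATE_TABLE.format(table=table), ()
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield INSERT_ROW.format(table=table), (json.dumps(row),)


sample = "name,postcode,opened\nLibrary,01234,2015-03-01\n"
for sql, params in import_statements("my_resource", sample):
    print(sql, params)
```

Each pair is shaped for a DB-API driver such as psycopg2 (`cursor.execute(sql, params)`). Note that no schema step is needed beyond creating the single-column table.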

Type detection on CSV files is generally a bit rubbish for numeric and datetime fields, and it often gets them wrong. Basically, that's because CSV files are 'weakly typed', as it were, while a database is 'strongly typed' (by which I mean you have to specify exact data types). With a JSON datastore we don't need to care: our data types will be the more flexible JSON ones, and we can leave the question of data types to the user of the datastore.
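
The type-guessing problem is easy to reproduce. The sketch below uses a deliberately naive, hypothetical `guess_type` (not CKAN's actual import code) that picks a column type from sample values. A postcode column with a leading zero gets classed as an integer and loses the zero when coerced, while the JSON route simply keeps the original string.

```python
import json
from datetime import datetime


def guess_type(values):
    """Naively guess a column type from its sample values (illustrative only)."""
    def all_parse(parse):
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "numeric"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "timestamp"
    return "text"


postcodes = ["01234", "98765"]
column_type = guess_type(postcodes)
# A strongly typed integer column would coerce the values and drop the leading zero...
coerced = [int(v) for v in postcodes] if column_type == "integer" else postcodes
# ...whereas a JSON document keeps them verbatim and leaves interpretation to the user.
as_json = json.dumps({"postcode": postcodes[0]})

print(column_type, coerced, as_json)
```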

This would also fulfil the wishes of people who want a hipster NoSQL datastore, because jsonb is even more hipster.

(Bonus wish: not to use raw SQL, but to provide an equivalent API that allows the functionality of any SQL query people want. Mainly because raw SQL keeps me awake at night.)
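
One way to read the bonus wish is a structured query description that the extension translates into SQL itself, so callers never write SQL. The sketch below is hypothetical: `build_search` and its `filters`/`fields`/`sort`/`limit` arguments are invented for illustration (loosely modelled on the shape of `datastore_search`), and it assumes the one-`jsonb`-document-per-row layout. JSON keys and values are bound parameters to the `->>` operator, so nothing user-supplied is spliced into the SQL text except a validated table name.

```python
import re

_IDENT = re.compile(r"[A-Za-z0-9_-]+")


def build_search(table, filters=None, fields=None, sort=None, limit=100):
    """Translate a structured query into (sql, params) for a jsonb-backed table.

    Hypothetical API. Every JSON key and value is passed as a bound
    parameter; only the table name is interpolated, after validation.
    """
    if not _IDENT.fullmatch(table):
        raise ValueError("invalid table name: %r" % table)
    params = []
    if fields:
        select = ", ".join(["data ->> %s"] * len(fields))
        params.extend(fields)
    else:
        select = "data"
    sql = 'SELECT {} FROM "{}"'.format(select, table)
    if filters:
        clauses = []
        for key, value in filters.items():
            # ->> returns text, so the comparison value is passed as text too.
            clauses.append("data ->> %s = %s")
            params.extend([key, str(value)])
        sql += " WHERE " + " AND ".join(clauses)
    if sort:
        sql += " ORDER BY data ->> %s"
        params.append(sort)
    sql += " LIMIT %s"
    params.append(int(limit))
    return sql, params


sql, params = build_search("my_resource", filters={"country": "UK"},
                           fields=["name"], sort="name", limit=10)
print(sql)
print(params)
```

Because `->>` yields text, filtering and sorting here are purely textual. Numeric ranges or date comparisons would need explicit casts, which is exactly where the data type question comes back, just at query time rather than import time.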

jqnatividad commented 9 years ago

:+1:

luiscape commented 9 years ago

+1

Sounds like an interesting addition. Here are my two cents: http://luiscapelo.info/proposing-nosql-ckan-datastores/

wardi commented 9 years ago

I'd also like a pyspark datastore. :-)

edit: I guess I mean an HDFS store

maxious commented 9 years ago

+1

One of the (good-to-have) challenges we have now with the SQL datastore is nested data (e.g. an organization which has X addresses, Y websites and Z statistics, each with a date/type/value, etc.). One solution is a relational database with foreign keys etc. Perhaps a better solution is a NoSQL datastore.
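
To illustrate the nested-data case, here is a hypothetical organization record (all field names and values invented) stored as a single JSON document, next to the flat tables the same data would need in a relational store: one for the organization and one per repeated field, with `org_id` acting as a foreign key. The `normalize` helper is illustrative only.

```python
import json

# Hypothetical nested record: one document per organization.
org = {
    "id": "org-1",
    "name": "Example Org",
    "addresses": [{"type": "postal", "city": "London"}],
    "websites": ["https://example.org"],
    "statistics": [
        {"date": "2015-01-01", "type": "staff", "value": 12},
        {"date": "2015-06-01", "type": "staff", "value": 15},
    ],
}

# In a jsonb datastore the whole record is a single row.
document = json.dumps(org)


def normalize(record):
    """Split a nested record into flat rows for separate relational tables."""
    org_row = {"id": record["id"], "name": record["name"]}
    addresses = [dict(a, org_id=record["id"]) for a in record["addresses"]]
    websites = [{"org_id": record["id"], "url": u} for u in record["websites"]]
    statistics = [dict(s, org_id=record["id"]) for s in record["statistics"]]
    return {"organizations": [org_row], "addresses": addresses,
            "websites": websites, "statistics": statistics}


tables = normalize(org)
print({name: len(rows) for name, rows in tables.items()})
```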

rufuspollock commented 9 years ago

@maxious btw, are you aware you can do joins etc. between DataStore tables, so you can normalize (though you can't really enforce FKs)? If you want an example, see https://gist.github.com/rgrp/0b24589bd3293234917b
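
For reference, a join like this goes through the `datastore_search_sql` action, where each resource id is used as a double-quoted table name. The minimal sketch below only builds the GET request URL. The site URL, resource ids and the `name`/`city`/`org_id`/`id` columns are placeholders, and SQL search may be disabled on a given site.

```python
from urllib.parse import urlencode

# Placeholders: substitute a real CKAN site and real DataStore resource ids.
CKAN_URL = "https://ckan.example.org"
ORGS = "11111111-1111-1111-1111-111111111111"
ADDRESSES = "22222222-2222-2222-2222-222222222222"

# Resource ids contain hyphens, so they must be double-quoted as table names.
# The join condition is ordinary SQL; the DataStore does not enforce it as a foreign key.
sql = (
    'SELECT o.name, a.city '
    'FROM "{orgs}" o JOIN "{addresses}" a ON a.org_id = o.id'
).format(orgs=ORGS, addresses=ADDRESSES)

url = CKAN_URL + "/api/3/action/datastore_search_sql?" + urlencode({"sql": sql})
print(url)
```

Fetching `url` (for example with `urllib.request.urlopen`) returns the usual action API JSON envelope, with the matching rows under `result["records"]`.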

joetsoi commented 9 years ago

@luiscape, it might be worth changing the link in your blog post http://www.luiscapelo.info/proposing-nosql-ckan-datastores/ to point to this issue, as there's not much in the jsondatastore extension yet apart from an hour-long hack from when I thought it would be a nice idea a year or so ago. (Nice blog post btw!)

luiscape commented 9 years ago

@joetsoi +1

just updated the post: http://luiscapelo.info/proposing-nosql-ckan-datastores/

jqnatividad commented 9 years ago

@luiscape, you may also want to look at #117, which is about leveraging the JSONB datatype in Postgres 9.4+ to gain NoSQL-like capabilities.
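
Much of the NoSQL-like querying that JSONB enables rests on the containment operator `@>` (typically backed by a GIN index). The following is a simplified pure-Python approximation of its documented semantics, meant only to show what a filter like `data @> '{"statistics": [{"type": "staff"}]}'` would match. It is not Postgres code, and it ignores some edge cases, such as the special rule that lets an array contain a bare primitive value (`'["foo", "bar"]'::jsonb @> '"foo"'::jsonb` is true in Postgres).

```python
def contains(left, right):
    """Approximate Postgres jsonb `left @> right` on parsed JSON values."""
    if isinstance(right, dict):
        # Every key on the right must exist on the left with a contained value.
        return isinstance(left, dict) and all(
            key in left and contains(left[key], value) for key, value in right.items()
        )
    if isinstance(right, list):
        # Every element on the right must be contained in some element on the left;
        # order and duplicates do not matter.
        return isinstance(left, list) and all(
            any(contains(item, wanted) for item in left) for wanted in right
        )
    if isinstance(left, (dict, list)):
        # Simplification: a structure never contains a bare scalar here.
        return False
    # Scalars: JSON booleans are not numbers, even though Python treats True == 1.
    if isinstance(left, bool) or isinstance(right, bool):
        return isinstance(left, bool) and isinstance(right, bool) and left == right
    return left == right


rows = [
    {"name": "A", "statistics": [{"type": "staff", "value": 12}]},
    {"name": "B", "statistics": [{"type": "budget", "value": 3}]},
]
query = {"statistics": [{"type": "staff"}]}
print([r["name"] for r in rows if contains(r, query)])
```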

Perhaps major orgs like the UN, WB and several national governments, which use CKAN to underpin their open data initiatives, could pool resources and fund this kind of shared infrastructure work (#152).

luiscape commented 9 years ago

@jqnatividad will take a look at it. Thank you for sharing!