Open rossjones opened 8 years ago
At Open Knowledge, we are working on lots of tooling around Data Package and JSON Table Schema as part of the Frictionless Data project.
We have libraries in Python and JavaScript that do schema inference, along with a range of other schema-related tasks. A few apps that analyse data and guess types are already live, in both Python and JavaScript, including DataPackagist and the new OpenSpending Packager.
We've started integrating this ecosystem of tools into CKAN with https://github.com/ckan/ckanext-datapackager as a first step.
We've also written some nice packages to leverage JSON Table Schema and make import and export flows with data storage backends and tabular data formats (CSV, Excel, JSON) seamless.
Interfaces for SQL and BigQuery are done, and more are planned (Mongo, etc.).
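To make the storage idea concrete, here is a minimal sketch of how a JSON Table Schema descriptor could drive table creation in a SQL backend. It is not the API of the packages mentioned above: `TYPE_MAP` and `create_table_sql` are hypothetical names, SQLite is chosen only because it ships with Python, and the type mapping is an assumption that covers just four schema types.

```python
import sqlite3

# Assumed mapping from JSON Table Schema types to SQLite column types.
# Anything not listed falls back to TEXT.
TYPE_MAP = {
    "integer": "INTEGER",
    "number": "REAL",
    "boolean": "INTEGER",
    "string": "TEXT",
}


def quote_identifier(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"{}"'.format(name.replace('"', '""'))


def create_table_sql(table, schema):
    """Build a CREATE TABLE statement from a descriptor with a 'fields' list."""
    columns = ", ".join(
        "{} {}".format(
            quote_identifier(field["name"]),
            TYPE_MAP.get(field.get("type", "string"), "TEXT"),
        )
        for field in schema["fields"]
    )
    return "CREATE TABLE {} ({})".format(quote_identifier(table), columns)


schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "price", "type": "number"},
        {"name": "name", "type": "string"},
    ]
}

connection = sqlite3.connect(":memory:")
connection.execute(create_table_sql("prices", schema))
connection.execute("INSERT INTO prices VALUES (?, ?, ?)", (1, 2.5, "apple"))
rows = connection.execute("SELECT id, price, name FROM prices").fetchall()
```

The point of the sketch is only that the schema, rather than ad hoc guessing at load time, decides the column types.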
The next logical step in terms of CKAN integration, from my perspective, is to use all of the above to greatly improve both the import/validation pipelines of data into CKAN, and, crucially, to radically improve the datastore.
CKAN integration is definitely part of the Frictionless Data roadmap, and it would be great to work with the wider community on CKAN/Frictionless Data integration, to solve issues like this in datapusher/datastore in a robust way.
+1 to @pwalsh. I'd also flag my longish comments in this earlier issue about datapusher, where I suggested "Connect / Reuse Frictionless Data and Data Package": https://github.com/ckan/ideas-and-roadmap/issues/150#issuecomment-107009977
Using something like https://github.com/timwis/csv-schema, an in-browser, JS-only tool for analysing and guessing the types of CSV columns, might help datapusher import data into datastore.
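For illustration, the kind of column type guessing discussed here can be sketched in a few lines of Python. This is a naive example, not how csv-schema or the Frictionless Data libraries actually work: `guess_type` and `infer_schema` are hypothetical names, only four types are considered, and every row is scanned rather than a sample. The output shape follows the JSON Table Schema convention of a `fields` list with `name` and `type`.

```python
import csv
import io


def guess_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "string"  # nothing to go on, so fall back to the widest type

    def all_parse(cast):
        # True when every value can be converted by `cast` without error.
        for value in non_empty:
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    return "string"


def infer_schema(csv_text):
    """Guess a type per column; the first row is assumed to be the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = list(zip(*body)) if body else [() for _ in header]
    return {
        "fields": [
            {"name": name, "type": guess_type(column)}
            for name, column in zip(header, columns)
        ]
    }


sample = "id,price,name,active\n1,2.5,apple,true\n2,3,pear,false\n"
schema = infer_schema(sample)
```

A real implementation would also need to handle dates, missing-value markers, ragged rows, and locale-specific number formats, which is where reusing an existing inference library pays off.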