ckan / ideas

[DEPRECATED] Use the main CKAN repo Discussions instead:
https://github.com/ckan/ckan/discussions
Geo-enable datastore/datapusher #156

Open rossjones opened 9 years ago

rossjones commented 9 years ago

Suggested via https://twitter.com/timwis/status/631068707955961861

"Do you have any plans to add geospatial queries to DataStore API? I don't see it on the roadmap and surprised no one's asked"

Datastore

A possible starting point (and an assumption) is that if the datastore database has PostGIS enabled, datastore_search_sql will 'just work'. The other action methods would probably need further work (to ensure that insertions etc. work).

DataPusher

The real problem is likely the type-guessing in datapusher: it needs to be able to determine that the float it has just found is a latitude, or that the string it has is GeoJSON (for example). Perhaps as a first pass this could be done based on a convention for column headers? It's a bit fragile but would simplify a first version.
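A minimal sketch of what header-based guessing could look like. The header names, role labels and function names below are all hypothetical conventions chosen for illustration; none of them come from datapusher.

```python
import re

# Hypothetical header conventions, chosen only for illustration.
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "long", "longitude"}
GEOJSON_NAMES = {"geojson", "geom", "geometry", "the_geom"}


def normalise(header):
    """Lower-case a header and drop anything except letters, digits and underscores."""
    return re.sub(r"[^a-z0-9_]", "", header.strip().lower())


def guess_geo_columns(headers):
    """Return a dict mapping each geo-looking header to a guessed role."""
    guesses = {}
    for header in headers:
        name = normalise(header)
        if name in LAT_NAMES:
            guesses[header] = "latitude"
        elif name in LON_NAMES:
            guesses[header] = "longitude"
        elif name in GEOJSON_NAMES:
            guesses[header] = "geojson"
    return guesses


print(guess_geo_columns(["Name", "Latitude", "LNG", "the_geom"]))
```

Headers that match none of the conventions are simply left out, so the normal type-guessing would still apply to them. The fragility is visible here too: a column called "long" could just as easily hold a long description.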

Possibly related to #151

rufuspollock commented 9 years ago

I also note that I'm pretty sure the Natural History Museum already did most of this work for their deployment.

timwis commented 9 years ago

Guessing based on column headers is how Esri does it in koop. On it "just working," keep in mind that PostGIS can output geometry in a variety of ways, so just including a geom field in your SELECT query will give you binary geometry output. You have to wrap it in ST_AsGeoJSON(geom)::json to get it to output GeoJSON, which would probably make the most sense. If it helps, here's how I did it in my SODA API implementation - it checks if any of the fields in the SELECT statement are of type geometry and then wraps them in the above format.
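A rough sketch of the wrapping step described above, assuming the caller already knows each field's PostgreSQL type. The function name and the (name, type) input shape are hypothetical; this is not the code from the SODA API implementation mentioned, only an illustration of the same idea.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table, fields):
    """Build a SELECT that returns geometry columns as GeoJSON.

    fields is a list of (column_name, pg_type) pairs (a hypothetical shape).
    """
    parts = []
    for name, pg_type in fields:
        ident = quote_ident(name)
        if pg_type == "geometry":
            # Without this wrapper PostGIS returns binary geometry output.
            parts.append(f"ST_AsGeoJSON({ident})::json AS {ident}")
        else:
            parts.append(ident)
    return f"SELECT {', '.join(parts)} FROM {quote_ident(table)}"


print(build_select("sites", [("id", "int4"), ("geom", "geometry")]))
```

The alias keeps the output column name the same as the original field, so clients would not need to know the column was wrapped.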

jqnatividad commented 7 years ago

Pinging this issue again, especially in light of the recent major datastore developments (semi-auto data dictionary; pgloader; download in alternate formats; performance improvements; triggers, etc.) and the CDO/CIO's Open Letter to the Open Data Community where they requested that geospatial data be treated as a first class data type.

We did a prototype implementation of async geocoding using a background queue, using an extras metadata field to capture the state of the async geocoding job (request geocoding, being geocoded, geocoded) and then creating the geocoded file as a separate resource.
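To make the state tracking concrete, here is a small sketch of how the three states could move through an extras-style dict. The key name geocoding_status and the advance function are hypothetical and are not taken from the prototype.

```python
# The three states named above, in the order a job would pass through them.
STATES = ("request geocoding", "being geocoded", "geocoded")

# Hypothetical name for the extras field that holds the job state.
STATUS_KEY = "geocoding_status"


def advance(extras):
    """Return a copy of extras with the geocoding state moved one step on."""
    current = extras.get(STATUS_KEY)
    if current is None:
        next_state = STATES[0]
    else:
        index = STATES.index(current)  # raises ValueError for unknown states
        if index == len(STATES) - 1:
            raise ValueError("geocoding job is already finished")
        next_state = STATES[index + 1]
    updated = dict(extras)
    updated[STATUS_KEY] = next_state
    return updated


extras = advance({})          # a geocoding request is recorded
extras = advance(extras)      # a background worker picks the job up
extras = advance(extras)      # the worker finishes
print(extras[STATUS_KEY])
```

In a setup like the one described, the background worker would presumably make the last transition after it has created the geocoded file as a separate resource.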

Perhaps the community can collaborate on a geo-enabled datastore after the latest batch of datastore enhancements. This would be another big step in moving CKAN beyond "data catalog" type workloads to "data-as-infrastructure" workloads. cc @amercader @wardi @davidread

mattfullerton commented 6 years ago

https://github.com/NaturalHistoryMuseum/ckanext-dataspatial (geospatial searches; I think it assumes you have your geodata in the tables)

https://github.com/derilinx/ckanext-vectorstorer / https://github.com/PublicaMundi/ckanext-vectorstorer (get GeoJSON and Shape files in via GeoServer)

The big question is how the workflow looks for getting the data in. CKAN's support for importing XLS/CSV etc. into a database table is good but not perfect. Really slick importing of many different Geo formats is a big task. The Vectorstorer extension does a good job but it could use more work and I'm not sure GeoServer is the best solution just for imports (we can call GDAL directly instead) unless you want the other benefits of GeoServer (tile previews, dynamic conversions to other formats).

timwis commented 6 years ago

What about using the-el? It's a command-line tool to extract and load SQL tables using a JSON Table Schema. It wraps a fork of a Frictionless Data tool to add geospatial, CARTO, and Oracle support. So alternatively it could be used as a Python library (or at least the underlying thing it forks could). /cc @awm33