Closed arky closed 4 years ago
Ahoy! This is a great suggestion.
Let's tackle use cases piecemeal, in separate bugs.
[technical details follow....]
This is going to get very complicated. I'll explain why:
When it comes to putting GeoPandas on a web server, lots of simple tasks become complex. For instance:
Geocoding: judging from the geocode() docs, geocoding uses an external service. On Workbench, all Workbench users would share the same quota. (Individual modules can prompt for API tokens. But that greatly detracts from the "let's tinker" mentality in Workbench: the Python module is one place where we recommend users never place tokens -- for the same reasons GitHub tells users not to store API tokens in their git repositories.)
I think Workbench deserves a geocoding service (and a reverse geocoding service). I don't think GeoPandas will work for this. Sad trombone.
GeoDataFrame: The GeoPandas GeoDataFrame will never be an input of a Workbench module, because Workbench rejects Pandas metadata between steps. (Not all steps are implemented in Pandas.) It might be a mistake to allow a GeoDataFrame as output from a step, too, because that would imply to users that the input to the next step would be the same GeoDataFrame they output in the previous step -- though it never will be.
Omitting GeoDataFrame shouldn't be a big deal: as I understand it, GeoDataFrame is syntactic sugar. Its sole purpose is to reduce the number of bytes of code in online tutorials. But in a sense, this is a big deal: even though GeoDataFrame does nothing, online tutorials use GeoDataFrame! If Workbench doesn't coexist with GeoDataFrame, all those online tutorials won't work. Another sad trombone.
Plotting: In general, Pandas charting libraries don't work in Workbench. We'd like to build mapping solutions. (Indeed, we have prototypes.) GeoPandas won't help us achieve that. Sad trombone.
Data types: Workbench currently has three data types: "Text", "Number" and "Date & Time". Every value of every type is specified down to the byte level in three formats: inter-process (Arrow); on-disk (Parquet, CSV, JSON); and Pandas.
We'll want at least one "geo data" type. What should its byte-level representations be? Today, from what I can tell, there's no industry standard. The closest thing I can find to a spec is a big question mark by the geopandas folks: https://github.com/geopandas/geo-arrow-spec. Yet another sad trombone.
(We could specify it as WKB ("well-known binary" notation) ... but is that an efficient means of passing spatial data over the wire? I'd hate for Workbench to choose the wrong technique, only to see a standard arise next year that fits our needs better but breaks our contract with module authors.)
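To make the byte-level question concrete, here is a minimal sketch of how a single 2D point looks in WKB, using only Python's standard `struct` module. The helper names `point_to_wkb` and `wkb_to_point` are made up for illustration; they are not part of Workbench or GeoPandas, and the sketch handles plain points only.

```python
import struct

def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1-byte byte-order flag (1 = little-endian),
    # 4-byte unsigned geometry type (1 = Point), then x and y as 8-byte doubles.
    return struct.pack("<BIdd", 1, 1, x, y)

def wkb_to_point(data: bytes) -> tuple:
    """Decode a WKB-encoded 2D point back into (x, y) (hypothetical helper)."""
    # The first byte says how the remaining fields are ordered.
    prefix = "<" if data[0] == 1 else ">"
    geometry_type, x, y = struct.unpack(prefix + "Idd", data[1:])
    if geometry_type != 1:
        raise ValueError("this sketch only handles WKB Point geometries")
    return (x, y)

encoded = point_to_wkb(104.9, 11.6)
decoded = wkb_to_point(encoded)
size_in_bytes = len(encoded)  # header (5 bytes) + two doubles (16 bytes)
```

Each point costs 21 bytes here versus 16 for two bare doubles, because every value repeats the 5-byte header. That per-value overhead is one reason to hesitate before committing to WKB as the column format.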
All that to say: Yes, it's possible to install the geopandas Python module and tell our users to:
Handle errors themselves
... but I think that will cause more problems than it solves.
[okay, done with the detail]
@arky Could you please provide user stories -- tasks you think Workbench should be able to do? (Please avoid the Python Code module when writing up these tasks.)
@adamhooper Thank you for taking the time to explain the internals of Workbench. I understand the challenges involved in relation to both geospatial data and charting. Feel free to close this issue if needed.
I have introduced Workbench as the de facto online tool for some data processing in my work in SE Asia, so I'll be filing use cases as separate bugs that avoid the Python Code module.
I look forward to the future issues! I'll close this one because it's not clear what we can do to declare it "complete".
Wondering if it is possible to make the 'GeoPandas' package available in the Python script workflow module.
I believe this might attract Geo-spatial folks to use CJ Workbench.
My simple use case for GeoPandas is a simple action such as validating whether a column of coordinates falls within a given country. At the moment, you have to make multiple calls to external reverse geocoding systems to do such verification. With the GeoPandas module it could be done in a couple of lines.
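For illustration, here is a rough sketch of what that check amounts to, written in plain Python with a hand-rolled ray-casting test instead of GeoPandas. The function name `point_in_polygon` and the rectangle `ROUGH_OUTLINE` are hypothetical; the rectangle is a placeholder, not real boundary data. GeoPandas offers a `within` predicate on its `GeoSeries` that would take the place of the hand-written loop.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges lie to the right of the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical placeholder outline (a plain rectangle), NOT real boundary data.
ROUGH_OUTLINE = [(102.0, 10.0), (108.0, 10.0), (108.0, 15.0), (102.0, 15.0)]

# A "column of coordinates" as (longitude, latitude) pairs.
coordinates = [(104.9, 11.6), (100.5, 13.7)]
flags = [point_in_polygon(p, ROUGH_OUTLINE) for p in coordinates]
```

The first pair falls inside the placeholder rectangle and the second falls outside it, so `flags` marks which rows would pass validation. No external service is called.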
An added bonus is that GeoPandas could generate its own maps, which could perhaps be rendered in the reports section.