Open robfitzgerald opened 1 month ago
here's some thoughts on how we could do this.
I'd propose the signature for run
gets expanded to support a few different combinations.
class CompassApp:
def run(self, queries, gdf: bool = false, index_col: Optional[str]=None)
In my mind a route is mapped to a single row of a dataframe. I think for a tree, a link is mapped to a row. Because their schemas are different, I think we want the output of a gpd=true
run to return:
GeoDataFrame
when either routes or trees are found (but not both)Tuple[GeoDataFrame, GeoDataFrame]
when both are presentThis is similar to osmnx.to_gdfs()
.
Any row in a route GeoDataFrame should have all traversal and cost columns that may be relevant to post-processing:
column | mapping |
---|---|
_index | (user-defined index_col path, or, by default, the top-level index value in the Compass output) |
distance | traversal_summary.distance |
time | traversal_summary.time |
... | |
distance_cost | cost.distance |
time_cost | cost.time |
total_cost | cost.total_cost |
All of these should be set only if they exist (optional semantics).
The route geometries are parsed as geometry objects. A _path
column is added to the dataframe in the case that there is more than one path. The value is the route.path
index.
run() result | description | route column? |
---|---|---|
one route {} |
route.path has one entry |
no |
n routes [] |
route.path has i entries |
append _path column with value i |
As a result, a row is uniquely indexed by the _index column when there is one route per row, and a combination of _index and _path when route.path has more than one entry. Put another way: we end up with k * q rows for q queries and k paths.
We need to cover the possible geometry output types. We can know in advance what the geometry type is by inspecting the configuration of the CompassApp (via #201). We can then switch our geometry parsing method accordingly:
Compass traversal.route config value |
parse method for string s |
---|---|
json | (no geometry) |
wkt | shapely.wkt.loads(s) |
geojson | shapely.ops.shape(s) |
Trees are similar but different. Each Compass output row may have an entire tree, but, we do not want a GDF of trees (plural), we want instead to plot a single tree. Following the logic above, we can add one more index column so that each row is a link in a tree in a result:
for each Compass run result row _i_:
for each tree _j_:
for each tree branch _k_:
create row for `_index` _i_, `_tree` _k_, `_edge_id` edgeid(_k_)
btw, if we do the proposed above, then i feel like all of our plotting functions can disappear and get replaced by GeoDataFrame plotting methods, which for example supports the feature described in #199, but also can swap-in replace the existing to_plot_folium
methods in compass-app-py.
Just thinking about it. Maybe instead of gdf: bool we instead provide output_format
: str where it can be "json" or "geopandas"?
The result of a
CompassApp::run
is a Python Object which has some standard keys and structures that are known/specified internally in the rust codebase. Handling this monolithic object in Python is counter-intuitive since we are not leveraging Pandas/GeoPandas, which are the standard format for analysis of columnar (geo) data.The
CompassApp::run
method could include ageopandas: bool = false
(orgdf
if that's more Pythonic) argument which will assume the user wants GeoDataFrame outputs for route and tree data when it istrue
, but output the monolithic Object whenfalse
(by default).