NREL / routee-compass

The RouteE-Compass energy-aware routing engine
https://nrel.github.io/routee-compass/
BSD 3-Clause "New" or "Revised" License

Normalize GeoPandas support #200

Open robfitzgerald opened 1 month ago

robfitzgerald commented 1 month ago

The result of CompassApp::run is a Python object with some standard keys and structures that are specified internally in the Rust codebase. Handling this monolithic object in Python is counterintuitive because it doesn't leverage Pandas/GeoPandas, the standard tools for analyzing columnar (geo) data.

The CompassApp::run method could include a geopandas: bool = False argument (or gdf if that's more Pythonic). When it is True, the user gets GeoDataFrame outputs for route and tree data; when it is False (the default), the monolithic object is returned.

robfitzgerald commented 1 month ago

here's some thoughts on how we could do this.

I'd propose the signature for run gets expanded to support a few different combinations.

from typing import Optional

class CompassApp:

  def run(self, queries, gdf: bool = False, index_col: Optional[str] = None):
    ...

Top-level structure

In my mind a route is mapped to a single row of a dataframe. For a tree, I think a link is mapped to a row. Because their schemas are different, I think we want a gdf=True run to return separate GeoDataFrames for routes and for trees.

This is similar to osmnx.graph_to_gdfs(), which returns separate node and edge GeoDataFrames.

Returning Routes

Summary row data

Any row in a route GeoDataFrame should have all traversal and cost columns that may be relevant to post-processing:

| column | mapping |
| --- | --- |
| _index | (user-defined index_col path, or, by default, the top-level index value in the Compass output) |
| distance | traversal_summary.distance |
| time | traversal_summary.time |
| ... | |
| distance_cost | cost.distance |
| time_cost | cost.time |
| total_cost | cost.total_cost |

All of these should be set only if they exist (optional semantics).
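A rough sketch of the optional semantics, using plain dicts so it runs without GeoPandas. The helper names (`get_path`, `summary_row`), the `COLUMN_PATHS` dict, and the shape of the `example` result are all made up for illustration; only the column names and dotted paths come from the table above.

```python
from typing import Any, Optional

# Hypothetical lookup from output column to a dotted path inside one result.
# Only the entries named in the table are listed here.
COLUMN_PATHS = {
    "distance": "traversal_summary.distance",
    "time": "traversal_summary.time",
    "distance_cost": "cost.distance",
    "time_cost": "cost.time",
    "total_cost": "cost.total_cost",
}


def get_path(obj: Any, dotted: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if a key is missing."""
    current = obj
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def summary_row(result: dict, index: Any, index_col: Optional[str] = None) -> dict:
    """Flatten one result into a row, setting a column only when its value exists."""
    row = {"_index": get_path(result, index_col) if index_col else index}
    for column, dotted in COLUMN_PATHS.items():
        value = get_path(result, dotted)
        if value is not None:
            row[column] = value
    return row


# Invented example result: only distance and total_cost are present.
example = {
    "request": {"trip_id": "a1"},
    "traversal_summary": {"distance": 3.2},
    "cost": {"total_cost": 1.7},
}
row = summary_row(example, index=0, index_col="request.trip_id")
```

Columns with no value are simply left out of the row, so building a dataframe from many such rows would fill the gaps with NaN.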

Geo row identifiers

The route geometries are parsed as geometry objects. A _path column is added to the dataframe in the case that there is more than one path. The value is the route.path index.

| run() result | description | route column? |
| --- | --- | --- |
| one route {} | route.path has one entry | no |
| n routes [] | route.path has i entries | append _path column with value i |

As a result, a row is uniquely indexed by the _index column when there is one route per result, and by the combination of _index and _path when route.path has more than one entry. Put another way: we end up with k * q rows for q queries and k paths per query.
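To make the row count concrete, here is a small sketch of the expansion. It assumes (this is a guess at the output shape, not something confirmed here) that `result["route"]` is either a single dict or a list of dicts, each carrying a `path` entry; `route_rows` is a hypothetical helper name.

```python
from typing import Iterator, List


def route_rows(results: List[dict]) -> Iterator[dict]:
    """Yield one row per route, keyed by _index and, when needed, _path."""
    for i, result in enumerate(results):
        route = result["route"]
        if isinstance(route, list):
            # n routes: one row each, disambiguated by position in the list
            for p, entry in enumerate(route):
                yield {"_index": i, "_path": p, "geometry": entry["path"]}
        else:
            # one route: _index alone identifies the row, no _path column
            yield {"_index": i, "geometry": route["path"]}


# q = 2 queries with k = 3 routes each (placeholder strings stand in for geometries)
results = [
    {"route": [{"path": f"geom-{q}-{k}"} for k in range(3)]}
    for q in range(2)
]
rows = list(route_rows(results))

# A single-route result produces a row without a _path column.
single = list(route_rows([{"route": {"path": "geom"}}]))
```

With these inputs `rows` holds k * q = 6 entries, and each (`_index`, `_path`) pair appears exactly once.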

Geometry data

We need to cover the possible geometry output types. We can know in advance what the geometry type is by inspecting the configuration of the CompassApp (via #201). We can then switch our geometry parsing method accordingly:

| Compass traversal.route config value | parse method for string s |
| --- | --- |
| json | (no geometry) |
| wkt | shapely.wkt.loads(s) |
| geojson | shapely.geometry.shape(json.loads(s)) |
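A possible shape for that switch, as a sketch only. `parse_geometry` is a hypothetical helper, and it assumes the geojson string holds a bare geometry object (not a Feature), since `shapely.geometry.shape` expects a geometry mapping rather than a string. Shapely is imported lazily inside each branch so the json case works without it installed.

```python
import json
from typing import Any, Optional


def parse_geometry(route_format: str, s: str) -> Optional[Any]:
    """Parse one route geometry string according to the configured format."""
    if route_format == "json":
        # no geometry is produced for this output format
        return None
    if route_format == "wkt":
        from shapely import wkt  # lazy import: only needed for geometry output
        return wkt.loads(s)
    if route_format == "geojson":
        from shapely.geometry import shape
        # shape() takes a GeoJSON-like mapping, so decode the string first
        return shape(json.loads(s))
    raise ValueError(f"unsupported route format: {route_format!r}")


no_geometry = parse_geometry("json", "{}")
```

With Shapely installed, `parse_geometry("wkt", "LINESTRING (0 0, 1 1)")` would give back a LineString, and the geojson branch would do the same for the equivalent geometry object.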

Returning trees

Trees are similar, with one difference. Each Compass output row may have an entire tree, but we do not want a GDF of trees (plural); we want to be able to plot a single tree. Following the logic above, we can add one more index column so that each row is a link in a tree in a result:


for each Compass run result row i:
  for each tree j:
    for each tree branch k:
      create row for _index i, _tree j, _edge_id edgeid(k)
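
The same loop as a Python sketch, again with plain dicts. It assumes a hypothetical layout where `result["tree"]` is a list of trees and each tree is a list of branch dicts with an `edge_id` key, and it takes `_tree` to be the position j of the tree within its result; the real Compass output may be structured differently.

```python
from typing import List


def tree_rows(results: List[dict]) -> List[dict]:
    """Flatten results into one row per tree branch (link)."""
    rows = []
    for i, result in enumerate(results):
        for j, tree in enumerate(result.get("tree", [])):
            for branch in tree:
                rows.append(
                    {"_index": i, "_tree": j, "_edge_id": branch["edge_id"]}
                )
    return rows


# Invented input: the first result has two trees, the second has none.
results = [
    {"tree": [[{"edge_id": 10}, {"edge_id": 11}], [{"edge_id": 20}]]},
    {},
]
rows = tree_rows(results)
```

Filtering the resulting rows on `_index` and `_tree` then selects the links of a single tree for plotting.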

robfitzgerald commented 1 month ago

btw, if we do what's proposed above, then i feel like all of our plotting functions can disappear and get replaced by GeoDataFrame plotting methods, which for example support the feature described in #199, and can also serve as a drop-in replacement for the existing to_plot_folium methods in compass-app-py.

robfitzgerald commented 1 month ago

Just thinking about it. Maybe instead of gdf: bool we provide output_format: str, where it can be "json" or "geopandas"?
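
One way that could look, as a sketch: a `Literal` type documents the allowed values for type checkers, and a small runtime check rejects anything else. `OutputFormat` and `check_output_format` are hypothetical names, not part of the existing API.

```python
from typing import Literal, get_args

# Allowed values, visible to type checkers and reusable at runtime.
OutputFormat = Literal["json", "geopandas"]


def check_output_format(output_format: str) -> str:
    """Return output_format unchanged, or raise if it is not a supported value."""
    allowed = get_args(OutputFormat)
    if output_format not in allowed:
        raise ValueError(
            f"output_format must be one of {allowed}, got {output_format!r}"
        )
    return output_format


chosen = check_output_format("geopandas")
```

A string option leaves room for more formats later without adding another boolean flag to run.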