ai4er-cdt / geograph

GeoGraph provides a tool for analysing habitat fragmentation and related problems in landscape ecology. GeoGraph builds a geospatially referenced graph from land cover or field survey data and enables graph-based landscape ecology analysis as well as interactive visualizations.
https://geograph.readthedocs.io
MIT License

GeoGraph problem with identical attributes #29

Closed rdnfn closed 3 years ago

rdnfn commented 3 years ago

Description: When loading a GeoGraph from a dataframe that already contains a column with the same name as one of the attributes automatically added to each node (e.g. area), the graph is not created: add_node() raises a TypeError because the same keyword argument is passed twice.
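
The collision can be illustrated without GeoGraph at all. The following is a minimal sketch, assuming a stand-in `add_node` function that only mimics the `add_node(node, **attr)` calling convention and a hypothetical `row_attributes` dict; it is not the project's actual code.

```python
def add_node(index, **attr):
    """Stand-in for a graph's add_node(node, **attr) method (assumption)."""
    return index, attr


# Hypothetical row taken from a dataframe that already has an "area" column.
row_attributes = {"class_label": 0, "area": 1.0e10}


def try_add():
    """Return the error message if the call fails, otherwise None."""
    try:
        # "area" is passed explicitly and again through **row_attributes,
        # so Python rejects the call before add_node even runs.
        add_node(0, area=1.0e10, **row_attributes)
    except TypeError as err:
        return str(err)
    return None


error_message = try_add()
print(error_message)
```

The exact wording of the message depends on the Python version, but it names the duplicated keyword, as in the traceback in the test case.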

Possible solution: Add all dataframe attributes as a single dict node attribute (e.g. df_attributes), which would prevent the dataframe columns from colliding with GeoGraph's internal keys.
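
A rough sketch of this idea, again using a stand-in `add_node` rather than the real graph class. The helper name `build_node_attributes` and the sample row are assumptions made for illustration only.

```python
def add_node(index, **attr):
    """Stand-in for a graph's add_node(node, **attr); returns what would be stored."""
    return dict(attr)


def build_node_attributes(row, area):
    """Keep internal keys at the top level and nest dataframe columns under one key."""
    return {
        "area": area,                # internally computed attribute
        "df_attributes": dict(row),  # user columns, including any "area" column
    }


# Hypothetical dataframe row that reuses the internal key "area".
row = {"class_label": 0, "area": 1.0e10}

node = add_node(0, **build_node_attributes(row, area=1.0e10))
print(node["df_attributes"]["class_label"])
```

Because the dataframe columns live under a single key, a column named area can no longer clash with the internal area keyword.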

Reproducible test case:

from src.models import geograph
from src.data_loading import test_data

test_gdf = test_data.get_polygon_gdf("chernobyl_squares_touching")
test_gdf["class_label"] = 0
test_gdf
   id                                           geometry          area  class_label
0   0  POLYGON ((715639.122 5697662.734, 815639.122 5...  1.000000e+10            0
1   1  POLYGON ((815639.122 5697662.734, 915639.122 5...  1.000000e+10            0
graph = geograph.GeoGraph(test_gdf)
Step 1 of 2: Creating nodes and finding neighbours:   0%|          | 0/2 [00:00<?, ?it/s]

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-10-5c05de071a97> in <module>
----> 1 graph = geograph.GeoGraph(test_gdf)

~/repos/gtc-biodiversity/src/models/geograph.py in __init__(self, data, attributes, graph_save_path, raster_save_path, tolerance, **kwargs)
    136         # Load from dataframe
    137         elif isinstance(data, gpd.GeoDataFrame):
--> 138             self._rtree = self._load_from_dataframe(
    139                 data, attributes, tolerance=self.tolerance
    140             )

~/repos/gtc-biodiversity/src/models/geograph.py in _load_from_dataframe(self, df, attributes, tolerance)
    399             row_attributes = dict(zip(attributes, [row[attr] for attr in attributes]))
    400             # add each polygon as a node to the graph with all attributes
--> 401             self.graph.add_node(
    402                 index,
    403                 rep_point=polygon.representative_point(),

TypeError: add_node() got multiple values for keyword argument 'area'

herbiebradley commented 3 years ago

I think this is actually solved by the change I made today to store all the polygons in the dataframe, because I also removed the **attributes line when adding nodes (there's not much point in adding them to the graph if we already have them in the dataframe). See https://github.com/ai4er-cdt/gtc-biodiversity/blob/feature/graph-analysis-habitat/src/models/geograph.py#L428. Hopefully I will submit a PR with this tomorrow.
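
For illustration only, the general shape of that approach might look like the sketch below. It uses plain dicts as stand-ins for the dataframe and the graph, and the names `table`, `nodes`, and `get_attribute` are hypothetical, not taken from the linked code.

```python
# Stand-in for the dataframe: node index -> row of user-supplied columns.
table = {
    0: {"class_label": 0, "area": 1.0e10},
    1: {"class_label": 0, "area": 1.0e10},
}

# Stand-in for the graph's node storage.
nodes = {}


def add_node(index, **attr):
    """Store only the attributes passed explicitly (no **row_attributes)."""
    nodes[index] = dict(attr)


for index in table:
    # Only an internal attribute is attached to the node; the value here is
    # a placeholder for whatever the real code computes from the polygon.
    add_node(index, area=1.0e10)


def get_attribute(index, name):
    """Look up a user column in the table instead of on the graph node."""
    return table[index][name]


print(get_attribute(1, "class_label"))
```

Since the user columns are never unpacked into the add_node call, a column that shares a name with an internal key cannot trigger the TypeError.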

rdnfn commented 3 years ago

Nice, never mind then!