UDST / pandana

Pandas Network Analysis by UrbanSim: fast accessibility metrics and shortest paths, using contraction hierarchies :world_map:
http://udst.github.io/pandana
GNU Affero General Public License v3.0

Documenting behavior for aggregations that can't be calculated #96

Open smmaurer opened 6 years ago

smmaurer commented 6 years ago

It looks like pandana.Network.aggregate() returns values of -1 for source nodes where an aggregation can't be calculated, for example if there aren't any other nodes within the distance radius. I can't find a reference to this in the documentation, though. We should confirm what the behavior is and make it more explicit.

Docstrings for pandana.Network.aggregate(): https://github.com/UDST/pandana/blob/master/pandana/network.py#L274-L320

Sphinx documentation: http://udst.github.io/pandana/network.html#pandana.network.Network.aggregate

There are several code conditions in the C++ that produce values of -1, but I haven't traced out the details: https://github.com/UDST/pandana/blob/master/src/accessibility.cpp
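
Until the behavior is confirmed and documented, one way to work with the results is to treat -1 as a sentinel and convert it to NaN before doing any further math. This is only a sketch: it uses a hand-made pandas Series as a stand-in for the output of pandana.Network.aggregate(), the values and node IDs are invented, and it assumes the aggregated variable is non-negative so that -1 can never be a legitimate result.

```python
import numpy as np
import pandas as pd

# Stand-in for the Series returned by an aggregation, indexed by node ID.
# These values and node IDs are made up for illustration.
result = pd.Series(
    [120.0, -1.0, 0.0, 45.5, -1.0],
    index=[101, 102, 103, 104, 105],
    name="pop_500_walk",
)

# Assumption: -1 marks "could not be calculated". This only makes sense
# when the aggregated variable cannot legitimately produce -1.
SENTINEL = -1


def mask_sentinel(values, sentinel=SENTINEL):
    """Return a copy of `values` with sentinel entries replaced by NaN."""
    return values.where(values != sentinel, np.nan)


cleaned = mask_sentinel(result)
n_missing = int(cleaned.isna().sum())
print(f"{n_missing} of {len(cleaned)} nodes have no computed value")
```

Keeping true zeros separate from "not calculated" matters here: node 103 stays at 0.0 while the sentinel nodes become NaN, so means and sums over the cleaned Series are not skewed by the -1 entries.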

smmaurer commented 6 years ago

Related to this are the messages about dropped rows that sometimes show up when you run an aggregation calculation:

Computing pop_500_walk
Removed 189769 rows because they contain missing values

These messages are generated by the pandana.Network.set() call that links the values being aggregated to the network.

https://github.com/UDST/pandana/blob/master/pandana/network.py#L235

Here's what happens, for the example of aggregating a variable from the households table:

Often, the rows are dropped because they can't be matched to nodes (for example households that are not assigned to buildings and thus don't have a spatial location), not because of missing values in the data column.

Rows that are explicitly filtered out aren't counted in the message, so the number of rows reported as dropped can differ between aggregations on the same table.
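
To see which of these cases is behind a "Removed N rows" message, it can help to count the missing node IDs and missing values separately before linking the data to the network. The sketch below uses pandas only. The households table, its column names (node_id, persons), and the helper explain_dropped_rows are hypothetical, and it assumes a row is dropped whenever either its node ID or its value is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical households table: node_id is NaN for households that are
# not assigned to a building, persons is NaN where the data is missing.
households = pd.DataFrame(
    {
        "node_id": [1, 2, np.nan, 4, np.nan, 6],
        "persons": [3, np.nan, 2, 1, np.nan, 4],
    },
    index=pd.Index([10, 11, 12, 13, 14, 15], name="household_id"),
)


def explain_dropped_rows(df, node_col, value_col):
    """Count rows that would be dropped, split by the reason."""
    no_node = df[node_col].isna()
    no_value = df[value_col].isna()
    return {
        "total_dropped": int((no_node | no_value).sum()),
        "missing_node_only": int((no_node & ~no_value).sum()),
        "missing_value_only": int((~no_node & no_value).sum()),
        "missing_both": int((no_node & no_value).sum()),
    }


summary = explain_dropped_rows(households, "node_id", "persons")
print(summary)
```

Running the same helper on a filtered subset of the table gives a different total, which is consistent with the varying row counts seen across aggregations from one table.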

Here is a notebook where we dug into this: More-aggregation-troubleshooting.ipynb