UDST / pandana

Pandas Network Analysis by UrbanSim: fast accessibility metrics and shortest paths, using contraction hierarchies :world_map:
http://udst.github.io/pandana
GNU Affero General Public License v3.0
386 stars 84 forks source link

Automatically cast node values to float #108

Closed cvanoli closed 5 years ago

cvanoli commented 5 years ago

Related to #105 the df[name] column passed to initialize_access_var is forced to be float type because currently, Pandana can only perform an accessibility calculation on a network using a table where the variable being aggregated is type Float.

smmaurer commented 5 years ago

Great! Looks to me like this should work. I'm tagging @federicofernandez and @mxndrwgrdnr for more feedback.

Here are some things we should take care of before merging this:

  1. To get the current tests passing, we'll have to merge the change from PR #102 into this branch. This should do it:
git checkout master
git pull
git checkout data_type_conversion_for_net
git merge master
git push
  1. We should add a unit test confirming it's possible to calculate an aggregation from integer data now. It could go right after this one: test_pandana.py#L84.

  2. Update docstrings to explain that both ints and floats are allowed, but data will be cast to float before the aggregations are calculated: network.py#L198.

federicofernandez commented 5 years ago

Very good suggestions.

On Mon, Apr 15, 2019 at 9:36 PM Sam Maurer notifications@github.com wrote:

Great! Looks to me like this should work. I'm tagging @federicofernandez https://github.com/federicofernandez and @mxndrwgrdnr https://github.com/mxndrwgrdnr for more feedback.

Here are some things we should take care of before merging this:

  1. To get the current tests passing, we'll have to merge the change from PR #102 https://github.com/UDST/pandana/pull/102 into this branch. This should do it:

git checkout master git pull git checkout data_type_conversion_for_net git merge master git push

1.

We should add a unit test confirming it's possible to calculate an aggregation from integer data now. It could go right after this one: test_pandana.py#L84 https://github.com/UDST/pandana/blob/master/pandana/tests/test_pandana.py#L84 . 2.

Update docstrings to explain that both ints and floats are allowed, but data will be cast to float before the aggregations are calculated: network.py#L198 https://github.com/UDST/pandana/blob/master/pandana/network.py#L198.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/UDST/pandana/pull/108#issuecomment-483467667, or mute the thread https://github.com/notifications/unsubscribe-auth/ABx17fQjmAIHhELazJaq5EMRIT_x7co6ks5vhRr6gaJpZM4cwtws .

smmaurer commented 5 years ago

@cvanoli Now that the fix from the master branch is merged in, the tests are getting further -- but it turns out that 'float32' doesn't work as the dtype. It looks like the C++ code wants doubles.

Earlier in the same file when the network is initialized, the edge weights are converted using .astype('double'), so that will probably work for the node values too. network.py#L81-L87

Not sure why the values didn't need to be cast previously -- probably the Pandas default dtype for floats is something that maps automatically to doubles, but 'float32' doesn't.

https://www.numpy.org/devdocs/user/basics.types.html

Also, here's another place in the docstrings where we should specify that ints are now acceptable: network.py#L38

smmaurer commented 5 years ago

See https://github.com/UDST/pandana/issues/105#issuecomment-484227344 for more perspective on this.

coveralls commented 5 years ago

Coverage Status

Coverage increased (+0.2%) to 91.089% when pulling 02d4513651273ba05649917f599d993147599e08 on data_type_conversion_for_net into bf92beef8ed6dc19810630bbe2be279db614b314 on master.

smmaurer commented 5 years ago

Ok, final summary of what's included here:

This PR automatically casts node values to floats (technically, float64/double), in order to support aggregations of integer values. Edge weights are already cast this way.

This restores behavior from Pandana v0.3. I've adjusted the docstrings to officially support node values and edge weights with any numeric type.

Existing tests are passing now. I'm going to merge this after adding a test where node values are non-float.