UDST / pandana

Pandas Network Analysis by UrbanSim: fast accessibility metrics and shortest paths, using contraction hierarchies

http://udst.github.io/pandana

GNU Affero General Public License v3.0


Extracting node IDs in distances dataframe output by network.nearest_pois() #46

Closed nuripurswani closed 7 years ago

nuripurswani commented 9 years ago

I have applied the nearest_pois function to measure how well connected the nodes in a network are to their nearest hospitals. The results look good so far, and I have stored them in a data frame and a .csv file. However, for every poi (1-10) with an output distance, I would also like to retrieve the ID of the hospital it corresponds to, in the cases where a hospital was found within my threshold distance of 2000 m (see the relevant bits of my code below). That is, if this is my current output (data frame):

node id   1       2       3
[dist]    600     800     2000

I'd like a corresponding output (data frame) with:

node id   1       2       3
          hosp1   hosp2   N/A   ..

Would you be able to help? I looked at the function definitions in the network.py file to identify if there is another function or local variable within an existing function that returns such output.

Relevant bits of the code:

Initialise the network with the point of interest queries:

network.init_pois(num_categories=1,max_dist=2000,max_pois=10)

network.set_pois('hospitals',nodes['lon'],nodes['lat'])

Perform point of interest queries - I would like to obtain the node_ids

df = network.nearest_pois(2000,"hospitals",num_pois=10)

nuripurswani commented 9 years ago

To simplify the wording of my question using the tutorial as an example: I am able to obtain the distances of every node in the network to its 10 nearest restaurants given a cut-off radius. However, I'd like to find out the names/ids of those 10 nearest restaurants and store the result next to every node in the network along with the distances. Ideally, I'd like to store in a data frame.
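To make the requested output concrete, here is a minimal pure-Python sketch that returns both the distance and the id of each nearby POI for one node. It is a plain Dijkstra search on a toy graph, not pandana's contraction-hierarchy query, and the function name, graph layout, and hospital ids are assumptions made for illustration. Padding missing results with the cut-off distance and no id mirrors the 600 / 800 / 2000 example above.

```python
import heapq


def nearest_pois_with_ids(graph, poi_nodes, source, max_dist, num_pois):
    """Return num_pois (distance, poi_id) pairs for source, nearest first.

    graph:     {node: [(neighbor, edge_length), ...]}
    poi_nodes: {node: [poi_id, ...]}  POIs snapped to network nodes
    Slots with no POI inside max_dist are filled with (max_dist, None).
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    visited = set()
    found = []
    while heap and len(found) < num_pois:
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        # Nodes are settled in order of increasing distance, so POIs
        # are collected nearest first.
        for poi_id in poi_nodes.get(node, []):
            found.append((dist, poi_id))
        for neighbor, length in graph.get(node, []):
            new_dist = dist + length
            if new_dist <= max_dist and new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    found = found[:num_pois]
    return found + [(max_dist, None)] * (num_pois - len(found))


# Toy network (hypothetical): edge lengths in metres.
graph = {
    "a": [("b", 600), ("c", 800)],
    "b": [("a", 600), ("d", 1500)],
    "c": [("a", 800)],
    "d": [("b", 1500)],
}
poi_nodes = {"b": ["hosp1"], "c": ["hosp2"], "d": ["hosp3"]}

result = nearest_pois_with_ids(graph, poi_nodes, "a", max_dist=2000, num_pois=3)
```

Here hosp3 lies 2100 m from node "a", beyond the 2000 m threshold, so the third slot carries the cut-off distance and no id.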

jiffyclub commented 9 years ago

I don't know of a way to do this. @fscottfoti?

fscottfoti commented 9 years ago

Had to read some C++ code to answer this one... In fact, it is possible in the underlying C++ library, just not in Python at this time. I will see if I can carve out some time to add this feature.

nuripurswani commented 9 years ago

Thanks for your swift responses.

The set_pois function takes the 'lon' and 'lat' of the (hospital/restaurant) nodes as inputs, so perhaps there's a way of attaching the ids of these nodes as an output of the nearest_pois query? I presume that the points used to calculate the distances (shortest to 10th shortest) for each node in the network get stored somewhere in a local variable, or is this underlying code written in C++?

I have also been looking at other methods to find out which points fall within a buffer radius (QGIS, other Python libraries, etc.), although these would be based on Euclidean distance queries rather than on the network analysis functionality in pandana.

fscottfoti commented 9 years ago

@nuripurswani are you comfortable with git and branches? I mean, if I check in a quick fix for this and leave it on a branch for a while, would you be comfortable switching to the branch and running "python setup.py install"?

fscottfoti commented 9 years ago

@nuripurswani I think this branch will work for you https://github.com/UDST/pandana/pull/47

The pull request describes how to use it. To get the code:

git clone https://github.com/UDST/pandana.git
cd pandana
git checkout which-poi
python setup.py develop

nuripurswani commented 8 years ago

Hi, I tested the new branch today. It works!!! Thank you so much for all your help - it's exactly what I needed.

I have another quick question to ask on the nearest_pois query: When it calculates the 10 shortest paths, I assume that the distances are computed by developing node lists and summing the distances from node to node stored in the network structure until we get to the point of interest or hospital.

Are the node ids, or more importantly, the way ids utilised for calculating distances/impedances also stored locally inside the function?

Many thanks, Nuri

fscottfoti commented 8 years ago

Right so you just want the shortest path from point to point? Yes, that's a method that's available but hasn't been exposed in Python yet either. I might be able to add that as well at some point soon. Is it critical for your application?

fscottfoti commented 8 years ago

I will also merge that branch so you can now get the code in master.

nuripurswani commented 8 years ago

@fscottfoti - thanks so much for your help. Yes- this is correct. The idea would be to have an output that's quite similar to what you've already created for the poi ids as follows (taking the hospital example):

     1                      2
id   dist1                  dist2

     1                      2
id   hosp1                  hosp2

     1                      2
id   [osm_node_list1]       [nodelist_2]

     1                      2
id   [osm_way_id_list 1]    [osm_way_id_list2]

It's fairly important for the application in order to match the node id/way ids to other parameters such as the road surface or average speed of travel. However, knowing the IDs of the pois is a great step forward :) Many thanks!!

nuripurswani commented 8 years ago

So, to clarify again: since we already have the distances, it would be helpful to have the node lists used to compute each distance, for the shortest through nth shortest paths. The way ids are less essential, since many nodes can be part of the same way, and the node lists are fine by themselves.
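As a rough illustration of why node lists alone can be enough: given a path as a list of node ids, consecutive pairs identify the edges, and an edge table can then supply the way id or road surface. The edge table, its keys, and the attribute values below are hypothetical and are not something pandana provides.

```python
def edges_along_path(path, edge_attrs):
    """Pair up consecutive nodes of a path and look up each edge's attributes."""
    rows = []
    for u, v in zip(path, path[1:]):
        # Each undirected edge is stored once, so try both orientations.
        attrs = edge_attrs.get((u, v)) or edge_attrs.get((v, u))
        rows.append((u, v, attrs))
    return rows


# Hypothetical edge table keyed by node pair; way ids and surfaces are made up.
edge_attrs = {
    (1, 2): {"way_id": 101, "surface": "paved"},
    (2, 3): {"way_id": 101, "surface": "paved"},
    (3, 4): {"way_id": 205, "surface": "gravel"},
}

rows = edges_along_path([1, 2, 3, 4], edge_attrs)

# Collapse repeated way ids: several nodes usually belong to the same way.
way_ids = []
for _, _, attrs in rows:
    if attrs and (not way_ids or way_ids[-1] != attrs["way_id"]):
        way_ids.append(attrs["way_id"])
```

Three edges collapse to two way ids here, which is the sense in which a way id list carries less detail than the node list it was derived from.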

fscottfoti commented 8 years ago

OK, so this is the easiest way for me to add it - there's a new method to return the shortest path between two node ids (I can also imagine doing a dataframe of queries and returning a dataframe of routes, but this is the first step)

https://github.com/UDST/pandana/pull/48

As a reminder, you can get node ids from xys if you need to do that, here:

https://github.com/UDST/pandana/blob/master/pandana/network.py#L340
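For readers who want to see what such a query returns, the following is a minimal sketch of a shortest-path search between two node ids that yields the node list as well as the total impedance. It uses plain Dijkstra with predecessor tracking on a toy graph, not the contraction-hierarchy routine in pandana, and the function name and graph layout are assumptions.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (node_list, total_weight), or (None, inf) if target is unreachable.

    graph: {node: [(neighbor, weight), ...]}
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, []):
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                prev[neighbor] = node
                heapq.heappush(heap, (new_dist, neighbor))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


# Toy network (hypothetical node ids and weights).
graph = {
    1: [(2, 100), (3, 400)],
    2: [(1, 100), (3, 150)],
    3: [(1, 400), (2, 150), (4, 50)],
    4: [(3, 50)],
}

path, length = shortest_path(graph, 1, 4)
```

The direct edge from 1 to 3 costs 400, so the route through node 2 (100 + 150) is preferred before the final hop to node 4.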

nuripurswani commented 8 years ago

Excellent. Are the functions in pull request 48, which you linked, already in a new git branch? I couldn't find them in the source code, so I assume they are new? I'm happy to contribute by creating a function that stores the results in a data frame, if this is of interest.

nuripurswani commented 8 years ago

PS. I apologise for the basic question- I'm new to git branches

fscottfoti commented 8 years ago

No problem. Yes the branch is called single-shortest-path.

Not sure about the dataframe. The problem is that the paths will all be different lengths, but I suppose leaving NaNs in the dataframe and having as many columns as the longest path, or something like that, could make sense. Not sure there's a huge advantage to it, though. I can't think of a dataframe op I'd want to run on it. Maybe it's worth it just for to_csv?
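The padding idea can be sketched without pandas: make every row as wide as the longest path and fill the gaps. This version uses the standard csv module and an empty cell as the filler, where a dataframe would hold NaN. The function name and column labels are assumptions.

```python
import csv
import io


def paths_to_rows(paths, fill=""):
    """Turn {id: [node, ...]} into equal-width rows, padding short paths."""
    width = max((len(p) for p in paths.values()), default=0)
    header = ["id"] + [f"node_{i}" for i in range(1, width + 1)]
    rows = [header]
    for key, path in paths.items():
        rows.append([key] + list(path) + [fill] * (width - len(path)))
    return rows


# Hypothetical routes of different lengths, including an empty one.
paths = {"a": [1, 2, 3, 4], "b": [1, 5], "c": []}
rows = paths_to_rows(paths)

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
```

The table is as wide as the longest route, so a single very long path inflates every row, which is one reason the rectangular layout may only be worth it for export.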


fscottfoti commented 7 years ago

This was fixed a while ago; not sure why I didn't close it.