Define a nearestSegment() function that takes a city and a lat/lng point as inputs, and returns a name and unique id for the nearest street segment to that point, as well as unique ids for the segment's enclosing intersections

Streets-Data-Collaborative / geo-street-talk-global

conversational on-street locations

Issue #2 (Open), opened by dmarulli 6 years ago

dmarulli commented 6 years ago

The Python libraries OSMnx and geopandas will be helpful here.

This method can pull a street grid given the name of a city
This method can output a shapefile for the street grid that can be read in as a geopandas geodataframe for the "find nearest segment" spatial query
The osmid is the unique id of each street segment
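Once the grid is in memory, the "find nearest segment" query is conceptually simple: measure the distance from the point to every segment and keep the closest. A minimal stdlib sketch of that idea, assuming each street segment is a record with the same fields as the geodataframe (`name`, `osmid`, `from`, `to`) plus a hypothetical `coords` list of (lng, lat) vertices. Distances are planar and in degrees, which is roughly what a geopandas `distance` call on unprojected data followed by taking the minimum would give. `nearest_segment`, `polyline_distance`, and `point_segment_distance` are illustrative names, not part of the project.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]  # (longitude, latitude)


def point_segment_distance(p: Coord, a: Coord, b: Coord) -> float:
    """Planar distance from point p to the straight line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polyline_distance(p: Coord, coords: Sequence[Coord]) -> float:
    """Distance from p to a street segment drawn as a polyline of vertices."""
    if len(coords) == 1:
        return math.hypot(p[0] - coords[0][0], p[1] - coords[0][1])
    return min(point_segment_distance(p, a, b) for a, b in zip(coords, coords[1:]))


def nearest_segment(segments: List[Dict], lng: float, lat: float):
    """Return (name, osmid, from_id, to_id) of the segment closest to (lng, lat)."""
    best = min(segments, key=lambda s: polyline_distance((lng, lat), s["coords"]))
    return best["name"], best["osmid"], best["from"], best["to"]


if __name__ == "__main__":
    # Hypothetical toy grid: two segments sharing intersection node 11.
    segments = [
        {"name": "Main Street", "osmid": 1, "from": 10, "to": 11,
         "coords": [(0.0, 0.0), (1.0, 0.0)]},
        {"name": "Oak Avenue", "osmid": 2, "from": 11, "to": 12,
         "coords": [(1.0, 0.0), (1.0, 1.0)]},
    ]
    print(nearest_segment(segments, 0.5, 0.1))  # ('Main Street', 1, 10, 11)
```

This brute-force scan touches every segment per point; for a whole city a spatial index (as geopandas provides) would be the natural next step.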

YukunVVan commented 6 years ago

Should the shapefile download go in this function or in streetTalk()? (Perhaps in streetTalk(), at the beginning of the function.)
The OSMnx package has a function called "graph_from_point", which takes a lat/lon point and a distance as parameters and returns a much smaller graph. Is it worth using this instead of "graph_from_place"? Alternatively, we could download the map for the whole city and check whether it already exists each time streetTalk() runs. (Is there any way to do this without downloading the shapefile?)
When we get the map from OSMnx, which network type should we set: drive, bike, walk, or all of them?

dmarulli commented 6 years ago

Hey Yukun - Good questions.

The downloading routine can go in either.
Definitely feel free to do a little exploratory work (keeping in mind the 01/22 launch), but here is my thinking: the initial intention of the streetTalk() function is to enrich existing datasets of locations. So, at least in the SQUID case, downloading the entire grid once may make more sense, since there will be plenty of situations with many points on the same street segment (and sending off a request for every point would be excessive). One could imagine some clever caching and checking with each request, but that would almost certainly be over-engineering at this point.

After developing this tool for datasets, we can address the best way to scale this down for smaller, more ad hoc queries.

Let's let network type simply be an additional parameter.
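The "download the whole grid once, with network type as a parameter" approach can be sketched as a small loader that remembers what it has already fetched. This is a minimal sketch, assuming a hypothetical `download` callable that stands in for the real OSMnx download plus shapefile/geodataframe step; `make_grid_loader` and `get_street_grid` are illustrative names, not part of the project.

```python
from typing import Any, Callable, Dict, Tuple


def make_grid_loader(download: Callable[[str, str], Any]) -> Callable[..., Any]:
    """Wrap a download function so each (city, network_type) grid is fetched once.

    `download` is a hypothetical stand-in for the real OSMnx download and
    shapefile/geodataframe step.
    """
    cache: Dict[Tuple[str, str], Any] = {}

    def get_street_grid(city: str, network_type: str = "drive") -> Any:
        key = (city, network_type)
        if key not in cache:
            # First request for this city/network type: download and keep it.
            cache[key] = download(city, network_type)
        # Later requests (e.g. many points on the same segment) reuse the grid.
        return cache[key]

    return get_street_grid
```

An explicit dict is used instead of `functools.lru_cache` because `lru_cache` stores `f("Los Angeles")` and `f("Los Angeles", "drive")` under different keys, which would trigger a second download of the same grid. In practice `download` might wrap OSMnx's `graph_from_place`, which accepts a `network_type` argument.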

dmarulli commented 6 years ago

Hey @YukunVVan - just let me know if I addressed your questions, otherwise I can say more.

YukunVVan commented 6 years ago

@dmarulli Thanks for the explanation! One more question: what format should the code be in, .ipynb or .py? One file per function, or one file for all the functions?

dmarulli commented 6 years ago

A single .py file should be good.

YukunVVan commented 6 years ago

For this function, I assume that the inputs are a Point and a geodataframe of the edges of the city map, so the function would look like this:

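```python
def nearestSegment(point, city):
    '''
    point: shapely.geometry.Point(longitude, latitude)
    city: geodataframe of the street segments (edges) in the city
    return: nearest_name, nearest_id, from_id, to_id
    '''
    # Distance from the point to every segment; idxmin() gives the row label
    # of the closest one. With unprojected lon/lat data, distances are in degrees.
    idx = city.geometry.distance(point).idxmin()
    nearest = city.loc[idx]
    nearest_name = nearest['name']   # street name of the segment
    nearest_id = nearest['osmid']    # OSM id of the segment
    from_id = nearest['from']        # node id of one enclosing intersection
    to_id = nearest['to']            # node id of the other enclosing intersection
    return nearest_name, nearest_id, from_id, to_id
```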

What format should the point be in? A string, a shapely.geometry.Point, or two numbers for longitude and latitude?

dmarulli commented 6 years ago

My intuition here is that explicitly taking longitude and latitude as inputs (as opposed to some data structure like shapely.geometry.Point) may be more versatile in the end.
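One detail worth keeping in mind with raw longitude/latitude inputs: distances computed directly on them are in degrees, and a degree of longitude shrinks with latitude, so the nearest segment "in degrees" is not always the nearest in meters. With geopandas the usual route is reprojecting via `to_crs`; as a stdlib illustration, here is a minimal sketch of an equirectangular approximation, assuming city-scale distances where it is reasonably accurate. `to_local_xy` and `local_distance_m` are hypothetical helper names.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters


def to_local_xy(lng: float, lat: float, ref_lat: float) -> tuple:
    """Project (lng, lat) in degrees to approximate planar meters.

    Equirectangular approximation: longitude is scaled by cos(ref_lat),
    so it is only accurate near ref_lat (e.g. within one city).
    """
    x = EARTH_RADIUS_M * math.radians(lng) * math.cos(math.radians(ref_lat))
    y = EARTH_RADIUS_M * math.radians(lat)
    return x, y


def local_distance_m(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """Approximate distance in meters between two nearby lon/lat points."""
    ref_lat = (lat1 + lat2) / 2.0
    x1, y1 = to_local_xy(lng1, lat1, ref_lat)
    x2, y2 = to_local_xy(lng2, lat2, ref_lat)
    return math.hypot(x2 - x1, y2 - y1)
```

At Los Angeles latitudes (about 34°N), 0.001° of longitude comes out to roughly 92 m while 0.001° of latitude is roughly 111 m, which is the distortion a degree-based nearest search ignores.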

I have pushed up a CSV with some test data from Los Angeles. Using the streetTalk() function to add a conversational string to each record may be helpful for testing.
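Adding a conversational string to each record of a test CSV could look like the sketch below. It assumes hypothetical column names (`longitude`, `latitude`) and a hypothetical output column `street_talk`, since the CSV's actual schema is not described here; `describe` stands in for streetTalk(), whose exact signature is not specified in this thread.

```python
import csv
from typing import Callable, TextIO


def add_street_talk(src: TextIO, dst: TextIO,
                    describe: Callable[[float, float], str],
                    lng_col: str = "longitude", lat_col: str = "latitude",
                    out_col: str = "street_talk") -> int:
    """Copy CSV rows from src to dst, appending a conversational location
    column produced by describe(lng, lat). Returns the number of rows written."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + [out_col]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        # Each record keeps its original columns and gains the description.
        row[out_col] = describe(float(row[lng_col]), float(row[lat_col]))
        writer.writerow(row)
        count += 1
    return count
```

Passing `describe` in as a parameter keeps the CSV handling separate from the street lookup, so the same loop works whichever way streetTalk() ends up being called.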

dmarulli commented 6 years ago

How are things coming along, @YukunVVan? Do you need anything from our end?

YukunVVan commented 6 years ago

@dmarulli Hi David, I have roughly completed the coding work, and it passes most of the test data (the .py file and the test process are uploaded for reference). However, during testing I found some special points and roads for which our assumption does not hold.

For example, for the point (-118.188637,34.088978):
Nearest street: Armour Avenue
intersection of from_id: ['Armour Avenue']
intersection of to_id: ['Armour Avenue' 'Cassatt Street' 'Monterey Road']

Here, Armour Avenue and Monterey Road are not straight. How should we handle this point, and how should we describe its location?

dmarulli commented 6 years ago

Until we come up with a more elegant solution, let's set up a rule for these seemingly edge cases that simply prints out the street segment name instead.
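The fallback rule can be sketched as follows, assuming (this is an interpretation, not stated in the thread) that the normal description relies on each end of the segment meeting exactly one other street. The "X between A and B" phrasing and the name `describe_segment` are hypothetical; the inputs mirror the street-name lists printed for the from_id and to_id intersections.

```python
from typing import Iterable


def describe_segment(segment_name: str,
                     from_streets: Iterable[str],
                     to_streets: Iterable[str]) -> str:
    """Build a conversational description of a street segment.

    from_streets / to_streets are the street names meeting at the segment's
    two enclosing intersections. If each end has exactly one cross street
    (and they differ), describe the segment as lying between them; otherwise
    fall back to the segment name alone.
    """
    from_cross = sorted(set(from_streets) - {segment_name})
    to_cross = sorted(set(to_streets) - {segment_name})
    if len(from_cross) == 1 and len(to_cross) == 1 and from_cross != to_cross:
        return f"{segment_name} between {from_cross[0]} and {to_cross[0]}"
    # Edge case (bend, dead end, multi-street junction): just the segment name.
    return segment_name
```

For the Armour Avenue point, the from_id intersection lists only Armour Avenue itself, so the sketch falls back to returning "Armour Avenue".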

patwater commented 6 years ago

Can we close this?