CityofSantaMonica / mds-provider

Python tools for working with MDS Provider data
https://github.com/openmobilityfoundation/mobility-data-specification
MIT License
18 stars 20 forks source link

Keep the fake MDS data generator's trip endpoints inside the boundary #49

Closed krithin closed 5 years ago

krithin commented 5 years ago

As things stand now the fake data generator starts off by placing its simulated vehicles at points inside the provided boundary file; it then simulates them taking trips by letting them roam some distance from their current location.

The problem with that approach is that vehicles can end up traveling quite far from their starting locations: in particular, when overlaid with the boundary of the city of Santa Monica, for instance, there's nothing stopping bikes and scooters from ending up scattered across the ocean. That is rather unrealistic, so this PR attempts to fix that by requiring that the end point of every trip, like its start, stay within the boundary while attempting to keep the distance distribution minimally changed.

thekaveman commented 5 years ago

This is a great idea.

What if, instead of a new method, boundary is an optional parameter to point_nearby? If it is not provided, proceed as normal. If it is provided, your logic takes over.

krithin commented 5 years ago

What if, instead of a new method, boundary is an optional parameter to point_nearby? If it is not provided, proceed as normal. If it is provided, your logic takes over.

I tried this but think it actually turns out to be more confusing than helpful, especially since point_nearby_within itself depends on point_nearby(...bearing=None).

thekaveman commented 5 years ago

Recursive functions are OK, right? 😉

I think it can be as simple as:

def point_nearby(point, dist, bearing=None, boundary=None):

    # setup etc.

    if boundary is None:
        # existing code
    else:
        for _ in range(MAX_TRIES):
            end_point = point_nearby(point, dist, bearing=bearing)
            if boundary.contains(end_point):
                return end_point
        # If we got here it's possible there was no point at that exact distance
        # from our starting point within the boundary; or maybe we were just unlucky.
        # Shrink the distance to the endpoint until we find one inside the boundary.
        assert(boundary.contains(point))
        while not boundary.contains(end_point):
            dist = dist * 0.9
            end_point = point_nearby(point, dist, bearing=bearing)

        return end_point

The other comment/question I have: instead of that delayed assert for the boundary containing the starting, what are your thoughts on raising an exception with a meaningful message (along the lines of... "we tried, but the distance may be too great from the starting point to the boundary"). Or alternatively, return None here... to indicate there was not a point_nearby that starting point, at that distance, inside that boundary.

thekaveman commented 5 years ago

@krithin I would like to get this merged, can we make that minor change to reuse point_nearby?

And if you have any thoughts on my question above?

krithin commented 5 years ago

Hi! Sorry this slipped on my end, will take another look in a day or two!

thekaveman commented 5 years ago

ping @krithin

krithin commented 5 years ago

Thanks for waiting (and for the ping!) I've reorganized this a little along the lines of your comment, and added an exception with a more informative error message in place of the assertion. I didn't want to return None if we failed to find a suitable point though - that would have meant changing more of the logic inside device_trip in provider.py.

thekaveman commented 5 years ago

This is great, thanks for the reorg @krithin.

I cleaned things up a bit, biggest thing was I didn't want to introduce an error class just for this one situation. If it is warranted in the future for other cases, we can revisit.