cuappdev / ithaca-transit-backend

An open-sourced backend service for Ithaca Transit.
MIT License

Analytics: Use selected routes to improve ordering #62

Open nateschickler0 opened 6 years ago

nateschickler0 commented 6 years ago

Currently we only order routes by arrival time, but we know that the fastest route isn't always the most selected. We have found that users prefer fewer transfers, less walking, etc., and will select slower routes because of this. Once we have a solid amount of data logged, we should leverage it to automatically re-order the route suggestions generated by GraphHopper.
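As a minimal sketch of what such re-ordering could look like, the Python snippet below ranks options by a penalized cost instead of raw arrival time. The field names (`arrival_min`, `transfers`, `walk_min`) and the penalty weights are hypothetical and not taken from the backend; in practice the weights would presumably be fitted from the logged selections.

```python
from dataclasses import dataclass


@dataclass
class RouteOption:
    name: str
    arrival_min: float  # minutes from now until arrival (assumed field)
    transfers: int      # number of bus changes (assumed field)
    walk_min: float     # total walking time in minutes (assumed field)


def reorder(options, transfer_penalty=10.0, walk_penalty=1.5):
    """Sort options by arrival time plus penalties for transfers and walking.

    The default weights are placeholders, not measured values.
    """
    def cost(option):
        return (option.arrival_min
                + transfer_penalty * option.transfers
                + walk_penalty * option.walk_min)

    return sorted(options, key=cost)


options = [
    RouteOption("fast_with_transfer", arrival_min=20, transfers=1, walk_min=5),
    RouteOption("direct_but_slower", arrival_min=25, transfers=0, walk_min=3),
]
ranked = reorder(options)
print([o.name for o in ranked])
```

With both penalties set to zero this reduces to the current arrival-time ordering, so the weights can be introduced gradually.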

mattbarker016 commented 6 years ago

We need the front end to record which routes are selected, and we need some infrastructure to communicate this data to the backend. @zeladada, @AAAstorga, and I previously talked about creating a database that would keep track of the index of each route within the array of routes passed as data. We could then use an identifier matching a getRoutes request, together with this index, to pinpoint the route selected in the set. (Would the request URL itself serve as a suitable identifier? Will the same routes always appear if you reload a past getRoutes URL?) Feel free to suggest a better mechanism to pass this data from the front end to the backend!
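One possible shape for this, sketched in Python under stated assumptions: the backend issues a fresh identifier with each getRoutes response and stores the routes it returned, so a later selection only needs to send back the identifier and the index. This sidesteps the question of whether a replayed URL returns the same routes. `SelectionLog` and its methods are hypothetical names, and the in-memory dictionaries stand in for a real database.

```python
import uuid


class SelectionLog:
    """In-memory stand-in for the proposed selection database."""

    def __init__(self):
        self._responses = {}   # request_id -> routes exactly as returned
        self._selections = []  # (request_id, index) pairs

    def record_response(self, routes):
        """Store the returned routes and hand back an id for the client."""
        request_id = uuid.uuid4().hex
        self._responses[request_id] = list(routes)
        return request_id

    def record_selection(self, request_id, index):
        """Log which index the user picked and return that route."""
        routes = self._responses[request_id]  # KeyError if the id is unknown
        if not 0 <= index < len(routes):
            raise IndexError("selected index is outside the returned routes")
        self._selections.append((request_id, index))
        return routes[index]


log = SelectionLog()
rid = log.record_response(["route 30", "route 10 -> route 81"])
chosen = log.record_selection(rid, 1)
print(chosen)
```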

There are a few decisions we need to make to actually process this data. We should try to link routes across multiple time points, but matching them up is not trivial.

Example: Within a minute of each other, User A starts at Sage Hall and wants to go to Panera. User B starts at Statler Hall and wants to go to Chipotle. Ultimately, they will (likely) take the same bus to get there, but the exact details of the route...

... will be incredibly similar but ultimately different.

With this in mind, we need a way to relate bus routes to one another. A prime candidate would be route numbers, or potentially even bus stops. However, location and time are important to an extent too.
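To make the route-number idea concrete, here is a small Python sketch that reduces a route to the ordered sequence of bus route numbers it uses, ignoring walking legs. The leg dictionaries, their keys, and the route number `30` are made up for illustration and do not reflect the backend's actual response format.

```python
def route_signature(legs):
    """Ordered tuple of bus route numbers; walking legs carry no number."""
    return tuple(
        leg["route_number"]
        for leg in legs
        if leg.get("route_number") is not None
    )


# Two hypothetical routes with different walking legs but the same bus.
user_a = [
    {"type": "walk", "route_number": None},
    {"type": "bus", "route_number": 30},
    {"type": "walk", "route_number": None},
]
user_b = [
    {"type": "bus", "route_number": 30},
    {"type": "walk", "route_number": None},
]

print(route_signature(user_a) == route_signature(user_b))
```

A signature like this deliberately discards location and time, so it would likely need to be combined with some notion of where and when the trip happens.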

In the above example, if both users pick the same route option, we'd want to log Bus A -> Bus B as the best way to get from one area to another. However, "area" is purposefully ambiguous: both users are going to the same area, but not the same specific place. How do we define the areas people are trying to get to?
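One simple hypothesis to test first, sketched in Python: snap each coordinate to a fixed grid cell and count selections per (origin cell, destination cell, route) key. The cell size, the coordinates, and the route tuple `(30,)` are illustrative assumptions, not real stop locations or route data.

```python
import math
from collections import Counter


def area_cell(lat, lon, cell_deg=0.005):
    """Map a coordinate to a grid cell; cell_deg is an arbitrary choice."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


selections = Counter()


def log_selection(origin, destination, route_numbers):
    """Count a selected route under its origin and destination cells."""
    key = (area_cell(*origin), area_cell(*destination), route_numbers)
    selections[key] += 1
    return key


# Made-up coordinates: nearby origins and nearby destinations.
key_a = log_selection((42.4451, -76.4832), (42.4301, -76.5082), (30,))
key_b = log_selection((42.4457, -76.4821), (42.4312, -76.5071), (30,))
print(key_a == key_b, selections[key_a])
```

A fixed grid will split points that sit on either side of a cell boundary, so clustering around bus stops may turn out to be a better definition of "area".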

Or is this the wrong question to be asking? My point is, there's a lot of uncertainty here. Whatever we decide, we should start with simple hypotheses and test them accordingly.