TheTransitClock / transitime

TheTransitClock real-time transit information system
GNU General Public License v3.0

A framework to improve dwell time predictions #236

Open simonberrebi opened 3 years ago

simonberrebi commented 3 years ago

Kalman Filter for Dwell Times

This framework describes the steps to improve dwell time prediction accuracy in TheTransitClock. The assumptions and trade-offs built into the Kalman Filter for Dwell Times introduced in Farhan and Shalaby (2004) are discussed. Relevant studies regarding the impact of these decisions on prediction accuracy are cited.

Introduction

Predicting the arrival time of transit vehicles requires estimating the successive travel and dwell times that make up the trip. In the latest version of TheTransitClock, travel times are estimated with a Kalman Filter, which calibrates historical data inputs from the last week with immediate data from the last vehicle. Dwell times, in contrast, are currently predicted with the Historical Average method, which does not consider the headway. However, it is known that on high-frequency routes, boarding times tend to increase with longer headways as more passengers have time to arrive at stops (Levinson, 1983). Farhan and Shalaby (2004) introduced the Kalman Filter for travel and dwell times, which calculates passenger arrival rates in real time based on historical and immediate data.

The flowchart below describes the dwell time component of their method. Using Automated Vehicle Location (AVL) and Automated Passenger Count (APC) data, historical and real-time passenger arrival rates are estimated. The Kalman Filter weighs the two inputs based on the live error to produce a calibrated passenger arrival rate. This rate is multiplied by the vehicle’s headway to obtain a predicted dwell time. Finally, the dwell and travel time estimates are added to predict the arrival time. The assumptions built into each step, from the prediction back to the data, are described below.

[Image: flowchart of the dwell time component of the Farhan and Shalaby (2004) method]
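The last two steps of the flowchart can be sketched in a few lines. This is a minimal illustration, not the TheTransitClock implementation: the function names are hypothetical, and the conversion from expected boardings to seconds uses an assumed per-passenger boarding time that does not come from the text above.

```python
# Assumed constant (not from the source): average seconds per boarding passenger.
SECONDS_PER_BOARDING = 2.5


def predict_dwell_time(arrival_rate_per_s, headway_s,
                       seconds_per_boarding=SECONDS_PER_BOARDING):
    """Expected boardings (rate x headway) converted to seconds of dwell."""
    expected_boardings = arrival_rate_per_s * headway_s
    return expected_boardings * seconds_per_boarding


def predict_arrival_offset(travel_times_s, dwell_times_s):
    """Seconds until arrival: sum of remaining travel and dwell times."""
    return sum(travel_times_s) + sum(dwell_times_s)


if __name__ == "__main__":
    dwell = predict_dwell_time(arrival_rate_per_s=0.01, headway_s=600.0)
    print(predict_arrival_offset([120.0, 90.0], [dwell, 10.0]))
```

A longer headway raises the expected number of waiting passengers and therefore the predicted dwell, which is the effect the Historical Average method cannot capture.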

Prediction

The capacity to improve prediction accuracy with a more sophisticated dwell time model hinges on the modeling assumptions highlighted above. While some studies, such as Hans et al. (2015), have found a significant improvement in prediction accuracy when incorporating a dwell time model, others have not. For example, Cats and Loutos (2016) compared a Historical Average prediction method with the Headway-Based dwell time model. The authors found no improvement in prediction accuracy, possibly due to the assumptions that passenger arrivals at stops were Poisson distributed and that no other factors affected dwell times.

Recommendation - Quantify the proportion of prediction error that is due to dwell times. This new benchmark will then be used to debug and test different model configurations in playback mode.
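One possible way to define this benchmark is the share of total absolute prediction error that comes from the dwell component. The sketch below assumes that predicted and observed travel and dwell times are available per stop in seconds. The record layout and the function name are hypothetical, and the metric itself is only one of several reasonable definitions.

```python
def dwell_error_share(records):
    """Share of total absolute error attributable to dwell time predictions.

    records: iterable of tuples
        (predicted_travel, actual_travel, predicted_dwell, actual_dwell),
    all in seconds. Returns a value between 0 and 1, or 0.0 if there is no error.
    """
    travel_error = 0.0
    dwell_error = 0.0
    for pred_travel, act_travel, pred_dwell, act_dwell in records:
        travel_error += abs(pred_travel - act_travel)
        dwell_error += abs(pred_dwell - act_dwell)
    total_error = travel_error + dwell_error
    return dwell_error / total_error if total_error > 0 else 0.0


if __name__ == "__main__":
    sample = [(100.0, 110.0, 20.0, 30.0), (200.0, 190.0, 15.0, 15.0)]
    print(dwell_error_share(sample))
```

Absolute errors are used so that over- and under-predictions do not cancel out. A squared-error version would weight large misses more heavily.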

Run Time Model

The Run Time Model is composed of a travel time and a dwell time model. Dwell time can be split into several components:

Recommendation - Ensure that the different components of dwell times are accounted for in the prediction algorithm.

Kalman Filter

The Kalman Filter algorithm for travel time trades off the stability of historical data from the last week for the immediacy of real-time data from the last vehicle. When a disruption happens, data from the immediate past can better represent current operating conditions even if based on a sample size of one.
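The trade-off can be illustrated with a generic scalar Kalman update. This is a textbook form, not necessarily the exact recursion used in TheTransitClock or in Farhan and Shalaby (2004). The function name and the way the two variances are obtained are assumptions.

```python
def kalman_blend(historical_mean, historical_var, last_vehicle_obs, obs_var):
    """Blend a historical estimate with the last vehicle's observation.

    historical_var: uncertainty of the historical estimate (assumed known).
    obs_var: uncertainty of a single real-time observation (assumed known).
    Returns (estimate, posterior_variance, gain).
    """
    denominator = historical_var + obs_var
    if denominator <= 0:
        raise ValueError("at least one variance must be positive")
    # Gain near 1 trusts the last vehicle; gain near 0 trusts history.
    gain = historical_var / denominator
    estimate = historical_mean + gain * (last_vehicle_obs - historical_mean)
    posterior_var = (1.0 - gain) * historical_var
    return estimate, posterior_var, gain


if __name__ == "__main__":
    print(kalman_blend(100.0, 25.0, 140.0, 75.0))
```

When the historical data are noisy relative to the real-time observation, the gain rises and the estimate follows the last vehicle more closely.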

In the Kalman Filter for dwell times, an added step consists of dividing the passenger boardings by the headway of the last vehicle at the current stop. If that headway is short, then a few passenger boardings can have an outsize effect on the estimated passenger arrival rate. This is likely to happen due to the inherent variability and potential biases of the Poisson model. Under the assumption of Poisson arrivals at rate λ, the variance of the boarding count over a headway h equals its mean, λh, so the variance of the estimated rate is λ/h and grows as the headway shrinks. Furthermore, if the Poisson assumption is violated, the immediate data may produce a biased estimate. Consider, for example, passengers boarding the second of two bunched vehicles to avoid crowding. The Kalman Filter, which gives more weight to the immediate data when it differs from historical data, could then pass these biases on to the calibration and prediction.
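To make the short-headway instability concrete, the following short derivation holds under the Poisson assumption. The symbols N (boardings counted over one headway), h (headway), and the estimator written with a hat are notation introduced here for illustration.

```latex
N \sim \mathrm{Poisson}(\lambda h), \qquad \hat{\lambda} = \frac{N}{h}

\mathbb{E}[\hat{\lambda}] = \frac{\lambda h}{h} = \lambda, \qquad
\mathrm{Var}[\hat{\lambda}] = \frac{\lambda h}{h^{2}} = \frac{\lambda}{h}

\mathrm{CV}[\hat{\lambda}] = \frac{\sqrt{\lambda / h}}{\lambda} = \frac{1}{\sqrt{\lambda h}}
```

For example, with λ = 1 passenger per minute, a 2-minute headway gives a coefficient of variation of about 0.71, while a 10-minute headway gives about 0.32.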

Recommendation - Instead of estimating the passenger arrival rates based on a single headway, the total passenger boardings could be divided by a longer horizon encompassing several successive headways. Analysis of autocorrelation in passenger rates would be required to find a horizon making the optimal trade-off. A horizon too short may be unstable while a horizon too long may no longer represent current operating conditions.
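A pooled estimator along these lines could look like the sketch below. The function name and the choice to express the horizon as a number of vehicles (rather than a time window) are assumptions made for illustration.

```python
def pooled_arrival_rate(boardings, headways_s, horizon):
    """Arrival rate over the last `horizon` vehicles at a stop.

    boardings: boardings per vehicle, oldest first.
    headways_s: headway in seconds preceding each vehicle, same order.
    horizon: number of most recent vehicles to pool (1 = single headway).
    Returns passengers per second.
    """
    if len(boardings) != len(headways_s):
        raise ValueError("boardings and headways must have the same length")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    total_headway = sum(headways_s[-horizon:])
    if total_headway <= 0:
        raise ValueError("total headway must be positive")
    return sum(boardings[-horizon:]) / total_headway


if __name__ == "__main__":
    boardings = [6, 5, 7, 4]
    headways_s = [600, 600, 1080, 120]  # last vehicle is bunched
    print(pooled_arrival_rate(boardings, headways_s, horizon=1))
    print(pooled_arrival_rate(boardings, headways_s, horizon=4))
```

In this made-up example the last vehicle follows two minutes behind its leader. The single-headway rate is 4/120 passengers per second, more than three times the pooled rate of 22/2400, which shows how one short headway can distort the estimate.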

Arrival Rate

In Farhan and Shalaby (2004), passenger arrival rates at a particular stop are calculated as the number of passenger boardings divided by the headway. The method assumes passengers arrive at stops independently of the schedule according to a Poisson process. This assumption was found to hold by Fan and Machemehl (2009) when headways are less than 12 minutes. Farhan and Shalaby (2004) tested their method during the peak hour on a route with 12-minute headways. However, on low-frequency routes and during off-peak hours, when passengers tend to coordinate their arrivals with the schedule, the method may not be well suited. Furthermore, with the availability of real-time information, passengers are even less likely to arrive randomly at stops (Watkins et al., 2011). Empirical studies have shown that on routes with longer headways, passenger arrivals tend to follow a beta distribution (Ingvardson et al., 2018).

Recommendation - Test different models of passenger arrival rates based on headways and times-of-day.
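As a first screening step before comparing models, the Poisson assumption can be checked with the index of dispersion (variance divided by mean) of boarding counts. The sketch below assumes the counts were collected over headways of equal length at one stop and time-of-day. The function name is hypothetical, and this check is a suggestion, not part of the cited method.

```python
from statistics import mean, variance


def index_of_dispersion(counts):
    """Sample variance of the counts divided by their mean.

    Values near 1 are consistent with Poisson arrivals; values well above
    or below 1 suggest over- or under-dispersion. Needs at least two counts.
    """
    average = mean(counts)
    if average == 0:
        raise ValueError("mean count is zero; dispersion is undefined")
    return variance(counts) / average


if __name__ == "__main__":
    print(index_of_dispersion([2.0, 4.0, 4.0, 6.0]))
```

A ratio close to 1 does not prove that arrivals are Poisson, but a ratio far from 1 on a given route and time-of-day is a signal that a different arrival model may be worth testing there.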

Data

Recommendation - The components described in this framework can be implemented sequentially. In the first stage, the dwell time for the previous vehicle could be used to infer the passenger arrival rate. The components and interfaces built in the first stage, including the caching system and the models for dwell time and passenger arrival rates, could be leveraged in the second stage once real-time APC data become available.
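The first stage could be sketched as follows. Without APC data, boardings are not observed, so this sketch backs them out of the previous vehicle's dwell time using two assumed constants (a fixed door overhead and a per-passenger boarding time). Neither value comes from the text above, and the function name is hypothetical.

```python
# Assumed constants (not from the source); both would need calibration.
DOOR_OVERHEAD_S = 4.0        # time to open and close doors
SECONDS_PER_BOARDING = 2.5   # average time per boarding passenger


def arrival_rate_from_dwell(dwell_s, headway_s,
                            door_overhead_s=DOOR_OVERHEAD_S,
                            seconds_per_boarding=SECONDS_PER_BOARDING):
    """Infer passengers per second from the previous vehicle's dwell time."""
    if headway_s <= 0:
        raise ValueError("headway must be positive")
    # Dwell shorter than the door overhead implies no boardings.
    inferred_boardings = max(0.0, dwell_s - door_overhead_s) / seconds_per_boarding
    return inferred_boardings / headway_s


if __name__ == "__main__":
    print(arrival_rate_from_dwell(dwell_s=19.0, headway_s=600.0))
```

Keeping the same function signature in the second stage would let observed APC boardings replace the inferred ones without changing downstream components.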

References

scrudden commented 3 years ago

A benchmark tool that records the proportion of prediction error due to dwell times is implemented in pull request #226.