cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License

Configuration Diagnostics (Direction Switching) #224

Open kengodleskidot opened 3 weeks ago

kengodleskidot commented 3 weeks ago

At the previous technical committee meeting (5/30/24), one of the members brought up this topic, which is discussed in more detail below.

The main purpose of this configuration diagnostic is to identify detectors whose direction is configured incorrectly. In a number of cases the detector is actually on the opposite side of the freeway, which conflicts with the configuration information provided by the district. The algorithm described below attempts to identify such detectors automatically. It may also be possible to perform this diagnostic with geospatial functions, so I recommend discussing which geospatial tools are available before building a model based on the algorithm described below:

Each detector in the system is assigned a score between 0 and 100, with zero meaning that it's highly likely that this detector is on the opposite side of the freeway. Since a high number indicates that the detector is very similar to its neighbors in the same direction, this score is usually referred to as Neighbor Affinity.

Conceptually we are looking for detectors that have an hourly flow pattern over a day that doesn't look like the hourly flow pattern from detectors that are upstream and downstream in the same direction. If a pattern is found for a particular detector that more closely matches the detectors on the opposite side of the freeway then a low score is assigned. The details of assigning a score to one particular "target" detector are as follows:

The algorithm is run on a weekday when the flow profiles have strong AM and PM peak direction patterns. Only data during the hours of 5am until 10pm is used (at 3am the flow is similar everywhere). For the target detector all of its neighboring detectors on both sides of the freeway are selected for 5 miles in each direction. Only detectors diagnosed as GOOD are used. For the purposes of this calculation, in order for a detector to be GOOD we require that

  1. More than 75% of the individual lane detectors are judged good by the daily diagnostic routines,
  2. None of the 5-minute flows for any of the lanes during the time period are purely imputed, and
  3. At least 80% of the resulting hourly aggregate data points are good (not imputed).
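The three GOOD criteria above can be sketched as a single predicate. This is a minimal illustration, not the PeMS implementation: the `DetectorDay` record and all of its field names are hypothetical, and it assumes the counts have already been restricted to the 5am to 10pm window.

```python
from dataclasses import dataclass


@dataclass
class DetectorDay:
    """Hypothetical one-day summary for a detector (all field names are assumptions)."""
    lanes_total: int           # individual lane detectors at this station
    lanes_good: int            # lanes judged good by the daily diagnostics
    purely_imputed_5min: int   # 5-minute lane flows in the window that were purely imputed
    hourly_points_total: int   # hourly aggregate points in the window
    hourly_points_good: int    # hourly points that are observed (not imputed)


def is_good(d: DetectorDay) -> bool:
    """Return True if the detector meets all three criteria listed above."""
    if d.lanes_total == 0 or d.hourly_points_total == 0:
        return False  # nothing to evaluate
    # Criterion 1: strictly more than 75% of lanes are good (integer form of good/total > 3/4).
    lanes_ok = d.lanes_good * 4 > d.lanes_total * 3
    # Criterion 2: no purely imputed 5-minute flows in any lane.
    no_pure_imputation = d.purely_imputed_5min == 0
    # Criterion 3: at least 80% of hourly points are good (integer form of good/total >= 4/5).
    hourly_ok = d.hourly_points_good * 5 >= d.hourly_points_total * 4
    return lanes_ok and no_pure_imputation and hourly_ok
```

The comparisons use integer arithmetic so that the boundary cases (exactly 75% of lanes, exactly 80% of hourly points) are not affected by floating-point rounding.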

For each neighbor that has been judged GOOD, a distance between the target and the neighbor is computed. This is not a physical distance: it is the mean of the absolute differences between the normalized hourly flow patterns of the two detectors for the day. Note that this is done on the aggregate flow across all lanes and that this test is only performed for mainline (ML) detectors.
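A minimal sketch of this distance follows. The description does not say how the hourly flow patterns are normalized, so the sketch assumes each profile is divided by its total flow over the window (so it sums to 1). The function names are illustrative only.

```python
from typing import List, Sequence


def normalize(hourly_flow: Sequence[float]) -> List[float]:
    """Scale an hourly flow profile so it sums to 1 (assumed normalization)."""
    total = sum(hourly_flow)
    if total <= 0:
        raise ValueError("flow profile must have positive total flow")
    return [v / total for v in hourly_flow]


def profile_distance(target: Sequence[float], neighbor: Sequence[float]) -> float:
    """Mean absolute difference between two normalized hourly flow profiles."""
    if len(target) != len(neighbor):
        raise ValueError("profiles must cover the same hours")
    a, b = normalize(target), normalize(neighbor)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

Under this normalization, two detectors with the same daily shape but different volumes have a distance of zero, while an AM-peaked profile compared with a PM-peaked one yields a larger value.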

Among the 10 closest neighbors, as ranked by the distance in the preceding step, we count the number of detectors that are in the same direction as the target and call that K. The resulting score for this detector is K / 10, which corresponds to 10 × K on the 0 to 100 scale described above. If the 10 closest neighbors are all on the other side of the freeway, the resulting score is zero, which is a good indication that the detector is switched.

If we don't have 10 neighbors with good data then we use whatever we have. In some cases we might not have any neighbors with good data, and in that case we don't compute a score at all. Currently this direction switching algorithm is run in PeMS every week on Tuesday night. If the data feed is down for that particular day, then it's possible that we won't be able to compute any scores for any detectors in the system.
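The scoring step, including the fallback for fewer than 10 neighbors, might look like the sketch below. Two things here are assumptions rather than documented PeMS behavior: when fewer than 10 GOOD neighbors exist the denominator is the number of neighbors actually used, and the score is returned as a fraction between 0 and 1 (multiply by 100 for a 0 to 100 scale). The function name and input shape are hypothetical.

```python
from typing import List, Optional, Tuple


def neighbor_affinity(
    neighbors: List[Tuple[float, bool]], k: int = 10
) -> Optional[float]:
    """Score a target detector from its GOOD neighbors.

    Each neighbor is a (distance, same_direction) pair, where distance is the
    flow-profile distance to the target and same_direction is True when the
    neighbor is configured in the same direction as the target.
    Returns None when there are no neighbors, so no score is computed.
    """
    if not neighbors:
        return None
    # Keep the k closest neighbors by flow-profile distance (fewer if unavailable).
    closest = sorted(neighbors, key=lambda n: n[0])[:k]
    same_direction_count = sum(1 for _, same_dir in closest if same_dir)
    # Assumption: divide by the number of neighbors actually used.
    return same_direction_count / len(closest)
```

A target whose closest neighbors are all on the opposite side gets a score of zero, which flags it as a candidate for a switched direction.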

kengodleskidot commented 2 weeks ago

The GIS data mentioned in #233 could offer a solution to handling this configuration diagnostic issue using geospatial tools.