cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License

Imputation: cluster medians #172

Closed ian-r-rose closed 3 months ago

ian-r-rose commented 6 months ago

As described in the PeMS User Guide:

[image: excerpt from the PeMS User Guide]

I have no idea what this means. Let's discuss, @kengodleskidot.

kengodleskidot commented 5 months ago

@ian-r-rose here is some additional information from the website for reference (https://pems.dot.ca.gov/?dnode=Help&content=help_calc#impute):

Cluster medians. This is the last method of imputation. A cluster is a group of detector stations that have the same macroscopic flow and occupancy patterns over the course of a typical week. For example, a possible cluster could be all of the detectors that are monitoring a freeway headed into the downtown area of a city. These detectors would all have similar macroscopic patterns in that the AM and PM peak directions would be similar from commuting traffic. The detectors on the opposite side of the freeway would presumably belong to a different cluster.

We create a number of clusters (8-10) in each district using hierarchical clustering. We use the joint vector of aggregate hourly flow and occupancy of each detector station as input. We only use detectors that are reporting good data for the input. Once the clusters have been created, we calculate the median time-of-day and day-of-week profile for the aggregate flow, occupancy and speed. All of the broken detectors that weren't included in the clustering algorithm, or new detectors that are added to the system after the algorithm has been run, are assigned to the cluster of their nearest spatial neighbor.
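To make the clustering step above more concrete, here is a minimal sketch in plain Python. It is not the PeMS implementation: the station IDs, the toy four-element vectors, the average-linkage rule, and the helper names (`agglomerative_clusters`, `cluster_median_profiles`) are all assumptions for illustration. A real input would be the full weekly vector of hourly flow and occupancy per station, and flow and occupancy would likely need to be scaled before computing distances.

```python
from statistics import median


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerative_clusters(vectors, n_clusters):
    """Average-linkage agglomerative (hierarchical) clustering.

    vectors: dict mapping station ID -> joint flow/occupancy vector.
    Returns a dict mapping station ID -> cluster label.
    """
    # Start with every station in its own cluster.
    clusters = [[sid] for sid in sorted(vectors)]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [
                    euclidean(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return {sid: label for label, members in enumerate(clusters) for sid in members}


def cluster_median_profiles(vectors, labels):
    """Element-wise median of the member vectors in each cluster."""
    profiles = {}
    for label in set(labels.values()):
        members = [vectors[s] for s, lab in labels.items() if lab == label]
        profiles[label] = [median(col) for col in zip(*members)]
    return profiles


# Toy vectors: [AM flow, PM flow, AM occupancy, PM occupancy] (hypothetical values).
vectors = {
    "A": [900, 300, 0.30, 0.10],   # AM-peak station
    "B": [1000, 350, 0.32, 0.12],  # AM-peak station
    "C": [300, 950, 0.10, 0.31],   # PM-peak station
    "D": [320, 1000, 0.11, 0.33],  # PM-peak station
}

labels = agglomerative_clusters(vectors, n_clusters=2)
profiles = cluster_median_profiles(vectors, labels)
```

In this toy example the two AM-peak stations end up in one cluster and the two PM-peak stations in the other, which mirrors the inbound/outbound example in the quoted text.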

If all of the other imputation methods fail, then we apply the cluster median algorithm. We look up the cluster ID of the detector for which we are attempting to impute data, pull the quantity profiles for the cluster out of the database, and apply the correct value for this day of week and time of day.
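The lookup described above might look roughly like the sketch below. The in-memory dictionaries stand in for database tables, and their names, the sample values, and the hourly granularity of the profile key are assumptions for illustration, not the PeMS schema.

```python
from datetime import datetime

# Hypothetical mapping of station ID -> cluster ID.
station_cluster = {"S1": 0, "S2": 1}

# Hypothetical median profiles keyed by (cluster ID, day of week, hour of day).
# Day of week follows datetime.weekday(): Monday is 0.
cluster_profiles = {
    (0, 0, 8): {"flow": 950.0, "occupancy": 0.31, "speed": 52.0},
    (1, 0, 8): {"flow": 310.0, "occupancy": 0.10, "speed": 66.0},
}


def impute_from_cluster(station_id, timestamp):
    """Return the cluster median values for this station and timestamp.

    Returns None when no profile exists for that cluster, day, and hour.
    """
    cluster_id = station_cluster[station_id]
    key = (cluster_id, timestamp.weekday(), timestamp.hour)
    return cluster_profiles.get(key)


# 2024-01-01 was a Monday, so this reads the (0, 0, 8) profile.
imputed = impute_from_cluster("S1", datetime(2024, 1, 1, 8, 30))
```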

In very rare circumstances, we can run across a detector that can't be assigned to a cluster of a spatial neighbor. This only happens when the detector is new and there are no other detectors on the same freeway. In this case we can't assign the detector to the cluster of its spatial neighbor. We then use a default cluster that is the median of the other clusters. This is similar to the Global Median routine that we used to use in PeMS 5.0 and before.
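As a rough illustration of the neighbor assignment and the default-cluster fallback, here is a small sketch. Measuring "nearest" by postmile distance along the same freeway is an assumption (the quoted text does not say how spatial distance is computed), and the function names, freeway labels, and values are hypothetical.

```python
from statistics import median


def assign_cluster(freeway, postmile, clustered_stations):
    """Return the cluster ID of the nearest clustered station on the same freeway.

    clustered_stations: list of (freeway, postmile, cluster_id) tuples.
    Returns None when the freeway has no clustered stations, which is the
    case where the default cluster would be used instead.
    """
    candidates = [
        (abs(pm - postmile), cluster_id)
        for fwy, pm, cluster_id in clustered_stations
        if fwy == freeway
    ]
    if not candidates:
        return None
    return min(candidates)[1]


def default_profile(cluster_profiles):
    """Element-wise median across the per-cluster median profiles."""
    return [median(col) for col in zip(*cluster_profiles.values())]


# Hypothetical clustered stations and two-element profiles.
clustered = [("I-80", 10.0, 0), ("I-80", 25.0, 1), ("US-50", 3.0, 2)]
profiles = {0: [950.0, 325.0], 1: [310.0, 975.0], 2: [600.0, 600.0]}

neighbor_cluster = assign_cluster("I-80", 12.5, clustered)  # nearest is postmile 10.0
orphan_cluster = assign_cluster("SR-99", 1.0, clustered)    # no stations on this freeway
fallback = default_profile(profiles)
```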

Let me know when you have time to discuss this.

kengodleskidot commented 4 months ago

I do not think we will be exploring this type of imputation. Do you see any issue with closing this topic, @ian-r-rose @britt-allen @mmmiah?

britt-allen commented 3 months ago

I see no issue with closing it if that's the case!