cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License

Imputation: meta issue #173

Open ian-r-rose opened 2 months ago

ian-r-rose commented 2 months ago

The goal is to roughly reproduce the imputation schemes described in the PeMS User Guide. The regression-based scheme is described in more detail in this research paper.
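As a rough sketch of what a regression-based scheme could look like (our assumption, to be checked against the paper): for a detector $i$ with a missing five-minute value $q_i(t)$ and a neighbor $j$ that did report, fit a per-pair linear model. Then combine the estimates from all neighbors $N_i(t)$ that reported in that period, here by taking the median (one reasonable choice, not necessarily the paper's).

```latex
\hat{q}_i^{(j)}(t) = \alpha_0^{(ij)} + \alpha_1^{(ij)}\, q_j(t),
\qquad
\hat{q}_i(t) = \operatorname*{median}_{j \in N_i(t)} \hat{q}_i^{(j)}(t)
```

Here $q$ can stand for flow or occupancy. The coefficients $\alpha_0^{(ij)}$ and $\alpha_1^{(ij)}$ would be fit on historical periods when both detectors reported good data, so they do not depend on $t$.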

kengodleskidot commented 2 months ago

Thanks for starting this discussion; I have a few thoughts. I think the route type should be considered in the imputation logic. For example, imputed values for an on-ramp should probably not be influenced by freeway-to-freeway connector or HOV mainline lane values. I have included a list of detector and station types in PeMS for your reference.

I also like the idea of using associated attributes of the same or similar routes (route type, direction, number of lanes, flows, time of day, day of week, holidays, etc.) to determine imputation values. Ultimately, I think data such as weather, lane closures, CHP incidents (maybe), and construction should also influence imputation values, but I am not sure we have time to explore all of these parameters.

Attachment: PeMSType.csv
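One way the time-of-day and day-of-week idea could be sketched: build a historical profile per detector by taking the median of its good values in each (weekday, five-minute slot) bucket, and use that as a fallback estimate. This is a hypothetical illustration, not how PeMS does it; the function names are made up, and holidays, weather, lane closures, and the other attributes above are not modeled.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical profile key: (day of week 0=Monday..6=Sunday, five-minute slot 0..287)
ProfileKey = Tuple[int, int]


def profile_key(ts: datetime) -> ProfileKey:
    """Bucket a timestamp by day of week and five-minute slot of the day."""
    return ts.weekday(), (ts.hour * 60 + ts.minute) // 5


def build_profile(history: Iterable[Tuple[datetime, float]]) -> Dict[ProfileKey, float]:
    """Median of one detector's historical good values in each (weekday, slot) bucket."""
    buckets: Dict[ProfileKey, List[float]] = defaultdict(list)
    for ts, value in history:
        buckets[profile_key(ts)].append(value)
    return {key: median(values) for key, values in buckets.items()}


def profile_estimate(profile: Dict[ProfileKey, float], ts: datetime) -> Optional[float]:
    """Typical value for this detector at this weekday and time of day, if known."""
    return profile.get(profile_key(ts))


# Usage: three Monday-morning observations that fall in the same 08:00-08:05 slot.
history = [
    (datetime(2024, 1, 1, 8, 0), 100.0),
    (datetime(2024, 1, 8, 8, 2), 120.0),
    (datetime(2024, 1, 15, 8, 4), 110.0),
]
profile = build_profile(history)
print(profile_estimate(profile, datetime(2024, 1, 22, 8, 3)))  # 110.0
```

A median rather than a mean keeps a single bad historical reading from skewing the profile.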

ian-r-rose commented 2 months ago

Yes, agree 100% that we should limit regressions to detectors of the same type so that we don't look for correlations between e.g. ramps and mainline.

kengodleskidot commented 2 months ago

Just to clarify, I think the additional parameters would only be needed if there is no detector data in close proximity to correlate with 😊
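The priority order described here could be sketched as follows, assuming the neighbor-based estimates and the historical-profile value have already been computed elsewhere. The function name and the "unimputed" label are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence, Tuple


def choose_imputation(
    neighbor_estimates: Sequence[float],
    profile_value: Optional[float],
) -> Tuple[Optional[float], str]:
    """Return (imputed value, method used).

    Estimates from nearby same-type detectors take priority; the broader
    parameters (time of day, day of week, ...) are used only when no
    neighbor produced an estimate for this five-minute period.
    """
    if neighbor_estimates:
        return median(neighbor_estimates), "neighbors"
    if profile_value is not None:
        return profile_value, "profile"
    return None, "unimputed"


print(choose_imputation([98.0, 104.0, 101.0], 90.0))  # (101.0, 'neighbors')
print(choose_imputation([], 90.0))                    # (90.0, 'profile')
```

Recording which method produced each value would let us later compare the accuracy of neighbor-based and profile-based imputation separately.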

ian-r-rose commented 2 months ago

Agreed!

kengodleskidot commented 2 months ago

@ian-r-rose FYI list of station types in PeMS:

- CD: Coll/Dist
- FF: Fwy-Fwy
- HV: HOV
- FR: Off Ramp
- OR: On Ramp
- ML: Mainline
- CH: Conventional Highway
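Restricting regressions to detectors of the same type, as agreed earlier in the thread, could look like the filter below. It is a sketch: the `Station` fields (`freeway`, `direction`, `abs_postmile`) and the 2-mile default radius are assumptions for illustration, not the actual PeMS metadata schema.

```python
from dataclasses import dataclass
from typing import List

# Station type codes as listed in the comment above.
STATION_TYPES = {
    "CD": "Coll/Dist",
    "FF": "Fwy-Fwy",
    "HV": "HOV",
    "FR": "Off Ramp",
    "OR": "On Ramp",
    "ML": "Mainline",
    "CH": "Conventional Highway",
}


@dataclass(frozen=True)
class Station:
    """Hypothetical station record; field names are illustrative."""
    station_id: int
    station_type: str     # one of the STATION_TYPES codes
    freeway: int
    direction: str        # e.g. "N", "S", "E", "W"
    abs_postmile: float   # position along the freeway, in miles


def regression_candidates(
    target: Station, stations: List[Station], max_distance_mi: float = 2.0
) -> List[Station]:
    """Stations eligible as regression neighbors for `target`: same type,
    same freeway and direction, within `max_distance_mi`, excluding itself."""
    if target.station_type not in STATION_TYPES:
        raise ValueError(f"unknown station type: {target.station_type}")
    return [
        s for s in stations
        if s.station_id != target.station_id
        and s.station_type == target.station_type
        and s.freeway == target.freeway
        and s.direction == target.direction
        and abs(s.abs_postmile - target.abs_postmile) <= max_distance_mi
    ]


target = Station(1, "ML", 5, "N", 10.0)
stations = [
    target,
    Station(2, "ML", 5, "N", 10.5),  # eligible
    Station(3, "OR", 5, "N", 10.4),  # on-ramp: different type
    Station(4, "ML", 5, "S", 10.2),  # opposite direction
    Station(5, "ML", 5, "N", 15.0),  # too far away
]
print([s.station_id for s in regression_candidates(target, stations)])  # [2]
```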

kengodleskidot commented 1 month ago

I've had a chance to review the imputation paper and the associated imputation tables in the PeMS data warehouse. I would like to have a meeting with the group to go over the assumptions in the paper and review the imputation tables in PeMS. Based on the paper and the data tables, we should create a model that stores flow, occupancy, and speed coefficients associated with a station and its neighbor, independent of time; see the screenshot below:
[Screenshot]

I believe these coefficients are applied in a formula that uses the actual values reported by the neighbors for each five-minute period, which is why the coefficients themselves can be independent of time. The version of the paper I have is missing many of its symbols, so hopefully there is a cleaner copy out there we can use. I believe the imputation coefficients were updated yearly from 2012 to 2017 but have not been updated since. I will follow up with an email but wanted to document this here for future reference.
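A minimal sketch of the "time-independent coefficients plus current neighbor values" idea, assuming a simple linear model per station/neighbor pair fit by ordinary least squares and a median to combine neighbors. Both choices are our assumptions, not confirmed from the paper or the PeMS tables; in practice we would store one coefficient pair per (station, neighbor, measure) for flow, occupancy, and speed.

```python
from statistics import median
from typing import Dict, Optional, Sequence, Tuple

Coeffs = Tuple[float, float]  # (alpha0, alpha1): intercept and slope


def fit_pair(target_hist: Sequence[float], neighbor_hist: Sequence[float]) -> Coeffs:
    """OLS fit of target = alpha0 + alpha1 * neighbor over historical
    five-minute periods where both detectors reported good data."""
    n = len(target_hist)
    if n != len(neighbor_hist) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(neighbor_hist) / n
    mean_y = sum(target_hist) / n
    sxx = sum((x - mean_x) ** 2 for x in neighbor_hist)
    if sxx == 0:
        raise ValueError("neighbor values are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(neighbor_hist, target_hist))
    alpha1 = sxy / sxx
    alpha0 = mean_y - alpha1 * mean_x
    return alpha0, alpha1


def impute_from_neighbors(coeffs: Dict[int, Coeffs], current: Dict[int, float]) -> Optional[float]:
    """Apply stored coefficients (keyed by neighbor id) to the values the
    neighbors reported in the current five-minute period; take the median."""
    estimates = [a0 + a1 * current[j] for j, (a0, a1) in coeffs.items() if j in current]
    return median(estimates) if estimates else None


# Coefficients for one target station, fit once on history and stored.
coeffs = {
    101: fit_pair([10, 20, 30], [5, 10, 15]),   # target = 0 + 2 * neighbor
    102: fit_pair([10, 20, 30], [12, 22, 32]),  # target = -2 + 1 * neighbor
}
# Values reported in the current period; neighbor 103 has no stored coefficients.
current = {101: 12.0, 102: 25.0, 103: 40.0}
print(impute_from_neighbors(coeffs, current))  # 23.5
```

Because the coefficients are fit once and stored, refreshing them periodically (e.g. yearly, as PeMS apparently did through 2017) is a separate batch job from the per-period imputation.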