cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License

Imputation Data Holes #404

Closed by jkarpen 2 weeks ago

jkarpen commented 1 month ago

We have used five different imputation techniques (local regression, regional regression, global regression, local average, and regional average) to fill the missing data from the good/bad detectors. This imputation model was built on 'int_performance__detector_metrics_agg_five_minutes'. After all data holes are filled with either observed or imputed data, the imputation data table should contain the correct number of rows per detector on a daily basis. For QC, let's build a model that tracks the daily sample size for each detector, both in the data source and in the imputation model output. The difference between the two counts should be zero for every detector. Set up this model for 7 days from the current date, so that we can trace any data quality issues from the downstream model back to the upstream model.
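A minimal sketch of the check described above, in plain Python rather than as a model in this repo. The function names, the `(detector_id, sample_date)` row shape, and the reading of "7 days from current date" as the 7 days ending today (inclusive) are all assumptions made for illustration; none of them come from the project code.

```python
from collections import Counter
from datetime import date, timedelta


def daily_sample_diff(source_rows, imputed_rows, today, window_days=7):
    """Compare daily row counts per detector between two row sets.

    Each row is a (detector_id, sample_date) tuple, one per five-minute
    sample. Returns {(detector_id, sample_date): source_count - imputed_count}
    for dates in the window. The window is assumed to be the `window_days`
    days ending on `today`, inclusive.
    """
    start = today - timedelta(days=window_days - 1)

    def count(rows):
        # Count rows per (detector, day), keeping only dates inside the window.
        return Counter(
            (detector_id, sample_date)
            for detector_id, sample_date in rows
            if start <= sample_date <= today
        )

    src = count(source_rows)
    imp = count(imputed_rows)
    # Take the union of keys so a detector-day missing from one side still shows up.
    return {key: src[key] - imp[key] for key in sorted(set(src) | set(imp))}


def mismatches(diff):
    """Return only the detector-days whose row counts differ."""
    return {key: value for key, value in diff.items() if value != 0}
```

Any entry returned by `mismatches` is a detector-day where the source and the imputation output disagree on row count, which is the condition this issue asks to track.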

jkarpen commented 1 month ago

This is dependent on #397 (clearinghouse QA issue) so marking blocked until that one is completed.

jkarpen commented 2 weeks ago

Per @mmmiah this is covered by Ken's recent PR so this is considered complete.