cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License

Create imputation method for local and regional median/average #272

Closed: ian-r-rose closed this issue 1 week ago

ian-r-rose commented 2 months ago

The local, regional, and global regression imputation schemes all rely on the current detector having been active during one of our regression periods. That assumption does not hold for some stations that have been unreliable for a long time.

One way to further fill in these gaps could be to just use averages or medians of values from local/regional stations. This would be much closer to a "nearest-neighbor" type of imputation scheme. A nice feature of this is that it would be pretty straightforward to implement: we are already collecting information from station neighbors and aggregating over it in the regression methods, and adding a new aggregation is not too difficult.
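A minimal sketch of the aggregation step described above, assuming we already have the observed values from a station's neighbors for a single time slot (with `None` marking a missing observation). `impute_from_neighbors` and its `method` parameter are hypothetical names for illustration, not code from this repository.

```python
from statistics import mean, median
from typing import List, Optional


def impute_from_neighbors(
    neighbor_values: List[Optional[float]],
    method: str = "median",
) -> Optional[float]:
    """Impute one time slot from neighboring stations' observed values.

    Hypothetical helper: missing neighbor observations (None) are skipped.
    Returns None when no neighbor has an observation, so that a later
    fallback (or no imputation at all) can take over.
    """
    observed = [v for v in neighbor_values if v is not None]
    if not observed:
        return None
    if method == "median":
        return median(observed)
    if method == "mean":
        return mean(observed)
    raise ValueError(f"unknown method: {method!r}")
```

For example, `impute_from_neighbors([120.0, None, 95.0, 400.0])` gives `120.0`, while `method="mean"` gives `205.0`. The median is less sensitive to a single misbehaving neighbor, which may matter given that these are exactly the stations where data quality is already poor.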

Now, we probably wouldn't consider this method to be super accurate, and downstream metrics could decide whether it's actually a good idea to include this method, but I think this could be a decent way to implement something similar to #172 or #171 in a way that we understand and can defend.

Thoughts @kengodleskidot?

kengodleskidot commented 2 months ago

@ian-r-rose I think this would be a solid alternative to #172 and #171, and it could serve as our fallback for imputed data after all other methods (local, regional, global) fail. For this method we could drop route/station type from the selection criteria (which I believe are used in the local/regional models) and instead select stations by station type/number of lanes within a reasonable buffer, taking the average or median of their values as the imputed value. It would be the least accurate imputation method, but it would be a catch-all for any remaining stations that do not have enough observed data to use the other imputation methods. It would also stay consistent with our other imputation methodologies and be easier to understand.
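A rough sketch of the catch-all fallback as proposed, assuming each station has projected `x`/`y` coordinates in miles and one observed value per time slot. `Station`, `select_candidates`, `median_fallback`, `impute_with_fallbacks`, and the `buffer_miles` default are all hypothetical; the real selection criteria and buffer size would need to be decided in the project's own models.

```python
import math
from dataclasses import dataclass
from statistics import median
from typing import Callable, List, Optional, Sequence


@dataclass
class Station:
    station_id: str
    station_type: str       # hypothetical type label, e.g. "ML"
    lanes: int
    x: float                # hypothetical projected coordinate, miles
    y: float                # hypothetical projected coordinate, miles
    value: Optional[float]  # observed value for this time slot, None if missing


def select_candidates(
    target: Station, stations: Sequence[Station], buffer_miles: float
) -> List[Station]:
    """Same station type and lane count within the buffer; route is ignored."""
    return [
        s
        for s in stations
        if s.station_id != target.station_id
        and s.station_type == target.station_type
        and s.lanes == target.lanes
        and s.value is not None
        and math.hypot(s.x - target.x, s.y - target.y) <= buffer_miles
    ]


def median_fallback(
    target: Station, stations: Sequence[Station], buffer_miles: float = 5.0
) -> Optional[float]:
    """Median of candidate station values, or None if there are no candidates."""
    candidates = select_candidates(target, stations, buffer_miles)
    if not candidates:
        return None
    return median(s.value for s in candidates)


# An imputer returns a value, or None when it cannot impute this station.
Imputer = Callable[[Station], Optional[float]]


def impute_with_fallbacks(
    target: Station, imputers: Sequence[Imputer]
) -> Optional[float]:
    """Try imputers in order (e.g. local, regional, global, then median)."""
    for imputer in imputers:
        result = imputer(target)
        if result is not None:
            return result
    return None
```

Ordering the imputers from most to least accurate means the median fallback only fires for stations that none of the regression methods can handle, which matches its intended role as a catch-all.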

jkarpen commented 3 weeks ago

@ian-r-rose and @mmmiah will meet for knowledge transfer on this before Ian goes out.

jkarpen commented 3 weeks ago

@mmmiah did you and Ian have a chance to meet and discuss this one? Will you be taking this over?

mmmiah commented 3 weeks ago

@jkarpen, we did not get a chance to meet. I will look into this and start working on it from where Ian left off. Thank you for checking, I appreciate it!