ian-r-rose opened this issue 4 months ago
Below are my thoughts:
Agree that we should be consistent and use volume over flow.
Makes sense to me.
I believe we can drop the aggregation type from the name, but I believe there is value in indicating whether a value is observed, imputed, or normalized, along with the method of imputation. There are a variety of use cases where users want to see the difference between observed (non-imputed), imputed, and normalized values. Below are some screenshots from the current PeMS showing some associated reports:
This is a QA/QC step that we should validate. For imputed speed, I do not believe we are aggregating at the 5-minute detector level, but the higher-level aggregations (hourly, daily, etc.) and the station level should be confirmed.
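One way to run that validation is to recompute an hourly speed from the 5-minute detector rows and compare it with the stored hourly value. The sketch below assumes that "weighted" speed means a volume-weighted mean. The row layout, the function names `hourly_weighted_speed` and `check_hourly_speed`, and the tolerance are all hypothetical, not taken from the project.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical 5-minute detector row: (timestamp, volume, speed). Speed may be missing.
Row = Tuple[datetime, float, Optional[float]]


def hourly_weighted_speed(rows: Iterable[Row]) -> Dict[datetime, Optional[float]]:
    """Roll 5-minute rows up to an hourly, volume-weighted mean speed.

    Assumes "weighted" means weighted by volume. Rows without a speed are
    skipped; hours whose counted volume is zero map to None.
    """
    total_volume: Dict[datetime, float] = defaultdict(float)
    volume_times_speed: Dict[datetime, float] = defaultdict(float)
    for ts, volume, speed in rows:
        if speed is None:
            continue
        hour = ts.replace(minute=0, second=0, microsecond=0)
        total_volume[hour] += volume
        volume_times_speed[hour] += volume * speed
    return {
        hour: (volume_times_speed[hour] / vol if vol > 0 else None)
        for hour, vol in total_volume.items()
    }


def check_hourly_speed(
    rows: Iterable[Row], stored_hourly: Dict[datetime, float], tol: float = 0.5
) -> List[datetime]:
    """Return the hours whose stored speed disagrees with the recomputed one.

    `tol` is in the same units as speed; 0.5 is an arbitrary placeholder.
    """
    recomputed = hourly_weighted_speed(rows)
    mismatches = []
    for hour, stored in sorted(stored_hourly.items()):
        value = recomputed.get(hour)
        if value is None or abs(value - stored) > tol:
            mismatches.append(hour)
    return mismatches
```

The same comparison could be repeated at the daily and station levels. The weighting rule should be confirmed against how the models actually compute speed before relying on the check.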
I see the convenience of using the same names (occupancy, volume, and speed) across multiple models, but there is potential for misusing these values depending on the level of aggregation. This is primarily a concern for me on the reporting side: ensuring the correct value is used for the level of aggregation a report displays (e.g., not using the 5-minute speed in an hourly aggregated report). Any best practices for minimizing misuse of values that share a name but differ in aggregation would be helpful.
@ian-r-rose @mmmiah Please add your thoughts when you have the chance.
We are currently not very consistent about column names for volume, occupancy, and especially speed. A few things I see in the current project:
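- In some models, volume is named flow.
- Occupancy has more than one name across models. We've mostly standardized on occupancy, but we should validate that we are doing that consistently.
- In some models we use occupancy, and in others occupancy_avg.
- In some models we use speed, in others speed_five_mins, and in others speed_weighted.
- In some models we use volume, and in others volume_sum.
- In some models we include the aggregation period in the column name (e.g., weekly_volume), in others we do not.

Proposal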
I propose the following conventions:
Basically, the above amounts to always using the simplest names (occupancy, volume, and speed) rather than trying to encode more information about the aggregation in the column names.

Thoughts?
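If the simple names are adopted, a small lint pass could flag the variants listed above so they don't creep back in. This is a minimal sketch under assumptions: the rename table is built only from the names mentioned in this issue, the set of period prefixes is a guess, and the model-to-columns input is a placeholder for wherever the project's schema is actually read from.

```python
import re
from typing import Dict, List, Tuple

# Names seen in this issue, mapped to the simple names proposed.
CANONICAL = {
    "flow": "volume",
    "volume_sum": "volume",
    "occupancy_avg": "occupancy",
    "speed_five_mins": "speed",
    "speed_weighted": "speed",
}

# Aggregation period encoded as a prefix, e.g. "weekly_volume" (prefix list is an assumption).
PERIOD_PREFIX = re.compile(r"^(hourly|daily|weekly|monthly)_(volume|occupancy|speed)$")


def lint_columns(model_columns: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (model, column, suggested_name) for every non-canonical column."""
    problems = []
    for model, columns in model_columns.items():
        for col in columns:
            if col in CANONICAL:
                problems.append((model, col, CANONICAL[col]))
            else:
                match = PERIOD_PREFIX.match(col)
                if match:
                    problems.append((model, col, match.group(2)))
    return problems
```

A blanket rename such as speed_weighted to speed drops the weighting method from the name. That information would then need to be documented or carried elsewhere.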