cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License
7 stars 0 forks source link

Update Bottleneck Models with Integrated Delay Metrics #409

Closed thehanggit closed 1 month ago

thehanggit commented 1 month ago

Updates were made to the bottleneck models by integrating delay into the final table. The current model can calculate all metrics displayed in the current PeMS.

thehanggit commented 1 month ago

Hi @JamesSLogan @ian-r-rose, could you please help to check the code efficiency when considering special routes? In subqueries calcs, congestion, congestion_events, congestion_length, and agg_spatial_delay, they generate columns by sorting the absolute_postmile, in which we need to use case condition to differentiate the normal routes and special routes mentioned by @kengodleskidot. Wondering if we can simplify the condition code?

JamesSLogan commented 1 month ago

Hi @JamesSLogan @ian-r-rose, could you please help to check the code efficiency when considering special routes? In subqueries calcs, congestion, congestion_events, congestion_length, and agg_spatial_delay, they generate columns by sorting the absolute_postmile, in which we need to use case condition to differentiate the normal routes and special routes mentioned by @kengodleskidot. Wondering if we can simplify the condition code?

I don't foresee any performance issues with the conditions as written! When building the model, the bulk of time spent is in the window functions themselves, not checking the CASE branches. For the sake of maintainability, you could save the condition in the first CTE:

with station_five_minute as (
  select
    freeway,
    freeway in {{ var("special_routes") }} as is_special_route, -- maybe "is_backwards_route"?
    direction,
)
, cte as (
    case when is_special_route ...
)  
...
...
...
select * exclude is_special_route from bottleneck_delay
thehanggit commented 1 month ago

Hi @JamesSLogan @ian-r-rose, could you please help to check the code efficiency when considering special routes? In subqueries calcs, congestion, congestion_events, congestion_length, and agg_spatial_delay, they generate columns by sorting the absolute_postmile, in which we need to use case condition to differentiate the normal routes and special routes mentioned by @kengodleskidot. Wondering if we can simplify the condition code?

I don't foresee any performance issues with the conditions as written! When building the model, the bulk of time spent is in the window functions themselves, not checking the CASE branches. For the sake of maintainability, you could save the condition in the first CTE:

with station_five_minute as (
  select
    freeway,
    freeway in {{ var("special_routes") }} as is_special_route, -- maybe "is_backwards_route"?
    direction,
)
, cte as (
    case when is_special_route ...
)  
...
...
...
select * exclude is_special_route from bottleneck_delay

@JamesSLogan Thank you James! I have made these changes and built up the hourly, daily, monthly model. @kengodleskidot Please take a look.

thehanggit commented 1 month ago

Good morning @thehanggit and I had some minor comments about adding some additional metadata (district, county, state postmile) but other than that it looks good. Let me know once the PR is merged so I can spot check and perform some QC.

Thanks Ken! The source table (station metrics 5 minutes aggregation) doesn't include CA postmile information but only absolute postmile. Do you want me to only include district and county?

kengodleskidot commented 1 month ago

Good morning @thehanggit and I had some minor comments about adding some additional metadata (district, county, state postmile) but other than that it looks good. Let me know once the PR is merged so I can spot check and perform some QC.

Thanks Ken! The source table (station metrics 5 minutes aggregation) doesn't include CA postmile information but only absolute postmile. Do you want me to only include district and county?

Yes, district and county should be added if they are available. We can discuss with the group about the best way of adding the state postmile column.