CityofToronto / bdit_data-sources

Data sources used by the Big Data Innovation Team
https://github.com/orgs/CityofToronto/teams/bigdatainnovationteam
GNU General Public License v3.0
39 stars 8 forks source link

There are anomalies between Miovision bike approach (classification_uid = 2) and bike tmc (classification_uid = 10) volumes #798

Open gabrielwol opened 11 months ago

gabrielwol commented 11 months ago

The docs say:

Approach level bicycle counts should be used for the large majority of applications as the data is considered more accurate.

However, for most intersection-approach combinations, bike entrances (classification_uid = 10 AND movement_uid = 7), are fewer in number than bike tmc (classification_uid = 2):

"category" "approach_count" "distinct_intersection_count"
"bike_entrances < bike_tmc" 175 52
"bike_entrances >= bike_tmc" 13 10
"bike_entrances IS NULL AND bike_tmc IS NOT NULL" 30 11
"bike_tmc IS NULL" 13 9
SQL used below. Can remove the last aggregation to see individual intersection/approach combos. ```sql WITH bike_summary AS ( SELECT intersection_uid, leg AS approach, SUM(volume) FILTER (WHERE classification_uid = 2) AS bike_tmc, SUM(volume) FILTER (WHERE classification_uid = 10 AND movement_uid = 7) AS bike_entrances FROM miovision_api.volumes WHERE datetime_bin >= '2023-12-01'::date AND datetime_bin < '2023-12-12'::date GROUP BY intersection_uid, leg ), categories AS ( SELECT intersection_uid, approach, CASE WHEN bike_entrances < bike_tmc THEN 'bike_entrances < bike_tmc' WHEN bike_entrances IS NULL AND bike_tmc IS NOT NULL THEN 'bike_entrances IS NULL AND bike_tmc IS NOT NULL' WHEN bike_tmc IS NULL THEN 'bike_tmc IS NULL' WHEN bike_entrances >= bike_tmc THEN 'bike_entrances >= bike_tmc' END AS category, bike_entrances, bike_tmc FROM bike_summary ) SELECT category, COUNT(*) AS approach_count, COUNT(DISTINCT intersection_uid) AS distinct_intersection_count FROM categories GROUP BY category ```
gabrielwol commented 11 months ago

@radumas what you asked me to look in to. This seems to far exceed a level where we would want to be alerted about individual cases. Attaching the list of individual approach/intersections here in case we want to see to Miovision: bike_approach_vs_tmc_202312.csv

Nate-Wessel commented 11 months ago

I looked at this somewhat systematically a while back. Here are some (all) plots showing the differences between the approach (blue) and TMC (green) counts per intersection. In general they are very different numbers. Usually the TMCs are systematically much higher than the approaches, though by how much seems to vary by intersection.

Sometimes on rare occasions, they're well aligned!

This is why the volumes index considers both methods where/when both are available.

radumas commented 11 months ago

Thanks Nate! Meant to coordinate with you before Gabe started on this but here we are.

@gabrielwol could you change the output to include intersection names?

gabrielwol commented 11 months ago

@radumas Here's that same output, with names and sorted by intersection. Now for two weeks 2023-12-01 -- 2023-12-14: bike_approach_vs_tmc_202315.csv

gabrielwol commented 11 months ago

As requested, results now including bikes in crosswalk: miovision_bike_approach_vs_tmc_vs_crosswalk_20231220.csv

WITH bike_summary AS (
    SELECT
        intersection_uid,
        leg AS approach,
        SUM(volume) FILTER (WHERE classification_uid = 2) AS bike_tmc,
        SUM(volume) FILTER (WHERE classification_uid = 10 AND movement_uid = 7) AS bike_entrances,
        SUM(volume) FILTER (WHERE classification_uid = 7) AS bike_in_crosswalk
    FROM miovision_api.volumes
    WHERE
        datetime_bin >= '2023-12-01'::date
        AND datetime_bin < '2023-12-19'::date
    GROUP BY
        intersection_uid,
        leg
)

SELECT
    intersection_uid,
    intersection_name,
    approach,
    CASE
        WHEN bike_entrances < bike_tmc THEN 'bike_entrances < bike_tmc'
        WHEN bike_entrances IS NULL AND bike_tmc IS NOT NULL THEN 'bike_entrances IS NULL AND bike_tmc IS NOT NULL'
        WHEN bike_tmc IS NULL THEN 'bike_tmc IS NULL'
        WHEN bike_entrances >= bike_tmc THEN 'bike_entrances >= bike_tmc'
    END AS category,
    bike_entrances,
    bike_tmc,
    bike_in_crosswalk
FROM bike_summary
LEFT JOIN miovision_api.intersections USING (intersection_uid)