CityofToronto / bdit_data-sources

Data sources used by the Big Data Innovation Team
https://github.com/orgs/CityofToronto/teams/bigdatainnovationteam
GNU General Public License v3.0
38 stars 8 forks source link

Need to monitor intersection_movements for missing high volume movements #855

Open gabrielwol opened 7 months ago

gabrielwol commented 7 months ago

None of the new intersections have volumes for classification_uid 10 (bike approaches). We'll need to confirm whether this is an algorithm change or a configuration issue.

4 of the new intersection are missing any volumes for classification_uid 6 (pedestrians): intersection_uids IN (69, 71, 72, 90)

_Originally posted by @gabrielwol in https://github.com/CityofToronto/bdit_data-sources/issues/850#issuecomment-1919838271_

Because there was no data for these classifications when we added the intersections, the movements didn't get added to miovision_api.intersection_movements (common intersection movements which get zero-padded in aggregate volumes_15min_mvt table). If these classifications eventually get added, we will need to go back and add the movements and backfill the aggregate tables.

gabrielwol commented 6 months ago

This issue also applies more generally speaking; with long running intersection installations, we risk not aggregating movements to volumes_15min_mvt + volumes_15min because they were not common at the time of configuration. The example above, where an entire classification is not present upon configuration is a good one. We have a warning about "# of invalid movements" but it fails silently.

Broadly speaking, the fraction of volumes in v15_mvt has dropped to a low of about 98% over time, pointing to a need to more closely monitor intersection_movements: image

gabrielwol commented 6 months ago

Here's some sql I used to find common movements missing from intersection_movements. Top of the list are a few invalid auto movements and a lot of valid ped movements with > 100 volume per day.

--Total query runtime: 2 min. 921 rows affected.
SELECT
    intersection_uid, classification_uid, leg, movement_uid, SUM(volume), MIN(datetime_bin), MAX(datetime_bin),
    SUM(volume) / EXTRACT(epoch FROM MAX(datetime_bin) - MIN(datetime_bin)) * 86400 AS avg_daily
FROM miovision_api.volumes
WHERE
    volume_15min_mvt_uid IS NULL --unprocessed
    AND movement_uid <> 8 --we intentionally exlcude bike exits from aggregations
GROUP BY intersection_uid, classification_uid, leg, movement_uid
HAVING
    --prevent divide by zero
    MAX(datetime_bin) > MIN(datetime_bin)
    --remove smaller issues
    AND SUM(volume) > 1000
ORDER BY avg_daily DESC
gabrielwol commented 5 months ago

Related: we should improve the documentation around the limitations of volumes_15min_mvt - that it excludes certain movements based on intersection_movements table.