Closed mkewins closed 3 years ago
Haven't looked into it yet, but some quick thoughts.
One very likely culprit is our conflate_artery_centreline
process in Airflow. Each FLOW study is attached to an "arterycode", which describes the location / direction of the study. To power the centreline-focused interactions in MOVE, we attempt to tie each such arterycode to an intersection or midblock in the centreline. This process isn't 100% accurate, however; an error there would mean that the arterycode is tied to the wrong centreline location, which would result in the behaviour described here.
We do have a set of manual overrides borrowed from previous work by @aharpalaniTO and others, but that's not likely to be exhaustive either.
Other than that, I'm not sure what else would cause this; when I get back to this (probably after more of the multi-location stuff), I'll check to see if it's a conflate_artery_centreline
issue first. If it is, I'll probably add it to our manual overrides set for now. If not, I'll have to dig deeper.
I also don't know how we would assess consistency across FLOW and MOVE more systemically: other than one-off bug tickets, I don't have a good way right now to determine the whole set of reports available in FLOW, let alone compare that (or port it over) to MOVE. Definitely welcome any thoughts from @aharpalaniTO @radumas @jwrcoleman and others who have worked with these datasets / applications longer than I have!
Adrian Hetherington reported another intersection where the data is in Flow, but is missing from MOVE. Intersection is Wilmington Ave and Finch Ave West.
I can't reproduce it though -- I'm seeing studies from the same date in both Flow and MOVE. What I am seeing is different count totals.
@mkewins - the Wilmington and Finch count total mismatch might have been solved by #666 , which fixed a general bug with TMC reports at T-intersections.
The other parts of this look like a potential conflation issue with intersections. There's quite a large number of "psuedo-intersections" around Yorkdale Rd, mostly where the road meets various expressway ramps:
select count(*), elevatio9, elevatio10 from gis.centreline_intersection where intersec5 like '%Yorkdale Rd%' group by elevatio9, elevatio10;
count | elevatio9 | elevatio10
-------+-----------+------------
9 | 501100 | Expressway
7 | 501300 | Minor
1 | 504000 | Pedatraian
1 | 509100 | Cul de sac
36 | 509200 | Pseudo
We exclude all intersections where elevatio9 > 501700
from both centreline vector tiles and location search, mostly to limit visual / search clutter from trailheads and railway crossings and such.
However, we do include all non-trail intersections when conflating counts against the centreline. Even with that, though, we still fail to conflate ~300 intersections:
select count(*) from "TRAFFIC"."ARTERYDATA" ad join counts.arteries_intersection ai on ad."ARTERYCODE" = ai.arterycode wh
ere ai.int_id is null;
count
-------
287
(1 row)
This unfortunately includes PX 1476:
select * from "TRAFFIC"."ARTERYDATA" ad join counts.arteries_intersection ai on ad."ARTERYCODE" = ai.arterycode where ai.
int_id is null and ad."LOCATION" like '%YORKDALE RD%';
ARTERYCODE | GEOMCODE | STREET1 | STREET1TYPE | STREET1DIR | STREET2 | STREET2TYPE | STREET2DIR | STREET3 | STREET3TYPE | STREET3DIR | STAT_CODE | COUNT_TYPE | LOCATION | APPRDIR | SIDEOFINT | LINKID | SEQ_ORDER | GEO_ID | arterycode | match_on_case | int_id | geom
------------+----------+----------+-------------+------------+------------------------+-------------+------------+--------------------------------+-------------+------------+------------+------------+-------------------------------------------------------------------+---------+-----------+----------+-----------+--------+------------+---------------+--------+------
4960 | 0 | YORKDALE | RD | | YORKDALE N PARKING LOT | | | PX 1476 335M E OF DUFFERIN ST | | | 0015000025 | R | YORKDALE RD AT YORKDALE N PRKG LOT & 335M E OF DUFFERIN (PX 1476) | | | 15000025 | | | 4960 | 3 | |
(1 row)
This is a complicated one: from the point of view of the centreline, it's on a midblock, because there's no City road to intersect with. From the counts / studies point of view, it's at an intersection where one of the roads happens to be a private parking lot entrance.
Part of the complexity here lies in how location is represented in FLOW data:
(X, Y)
coordinates using an old NAD projection that doesn't line up completely with modern projections.This isn't great, and we should very much try to move away from it when we tackle data pipelines. (For instance: if these counts were tagged directly with centreline IDs plus lat-long location, that would be great! Even reliable lat-long location alone would help a great deal.)
Going to suggest the following course of action:
Not sure what the priority on any of this should be, though - might make more sense to dive in during data pipeline phases of the project?
One thought would just be to make sure that all signalized intersections are included in the conflation, otherwise defer to data pipeline phase (with suitable warnings applied etc.). I think you've hit on what will be a common problem, where not all traffic signals are at centreline_intersection
locations, specifically signals at private driveways or parking lots. See below (white circles are intersections, red circles are traffic signals):
Looking at the traffic_signal
layer, there are 206 records where midblock_route is not Null
(our traffic_signal
layer has not been updated since October 2019 so this might miss a couple new ones)
Another case reported from longitudinal study feedback:
From the bug report and a bit of digging, these studies should have centreline_id = 30026352
, but are instead conflated to centreline_id = 13465353
.
Looking deeper, this is the result of conflation for arterycode 11309:
flashcrow=> select * from counts.arteries_intersection where arterycode=11309;
arterycode | match_on_case | int_id | geom
------------+---------------+----------+----------------------------------------------------
11309 | 2 | 13465353 | 0101000020E610000057E12CAB04D753C0E8D43A1640D44540
However, further exploration reveals a conflicting picture: centreline_id = 13465353
is a better spatial match for this arterycode, but centreline_id = 30026352
matches up better with the FILE_NAME
that describes this as being at Sumach and Shuter! (Welcome to the fun, fun world of legacy data.)
Because of this, it's unclear how to proceed. Do we do nothing? Do we enter a manual override, which also risks mislabelling this arterycode? How do we make decisions like this in general, and how do we explain those decisions to our users?
We first query to get at the arterycode:
flashcrow=> select * from "TRAFFIC"."COUNTINFOMICS" where "COUNT_INFO_ID" in (select count_group_id from counts.studies where centreline_id = 13465353);
COUNT_INFO_ID | ARTERYCODE | COUNT_TYPE | COUNT_DATE | DAY_NO | COMMENT_ | FILE_NAME | SOURCE1 | SOURCE2 | LOAD_DATE | TRANSFER_REC | CATEGORY_ID
---------------+------------+------------+---------------------+--------+----------+----------------------------------------+---------+---------+---------------------+--------------+-------------
32659 | 11309 | R | 2015-06-22 00:00:00 | 2 | | Crunch..EF06.23 & EF06.23 & EF07.07.15 | | | | | 5
36777 | 11309 | R | 2018-02-27 00:00:00 | 3 | | 88186_SUMACH_ST_and_SHUTER_ST_BIKE.hex | | | | | 5
8463 | 11309 | R | 2000-05-30 00:00:00 | 2 | | YY05.06 | | | 2000-07-12 08:40:53 | -1 | 5
6496 | 11309 | R | 1997-12-01 07:45:00 | 1 | | | | | | 0 | 5
match_on_case = 2
here means this is matching using counts.arteries_intersection_int_id
, which in turn uses "TRAFFIC"."NODE"
:
flashcrow=> select * from counts.nodes_intersection where link_id = (select "LINKID"::bigint from "TRAFFIC"."ARTERYDATA" where "ARTER
YCODE" = 11309);
link_id | match_on_case | int_id | geom
----------+---------------+----------+----------------------------------------------------
13465353 | 1 | 13465353 | 0101000020E610000057E12CAB04D753C0E8D43A1640D44540
match_on_case = 1
here means it's matched the link_id
directly to an intersection ID (which makes sense, they're the same value.)
However, even if we did perform a spatial match instead of matching link_id
against the centreline, results here suggest that int_id = 13465353
is in fact a much better match:
SELECT nc.link_id, 2 AS match_on_case, ci.int_id, ci.geom, ci.geom_dist
FROM counts.nodes_corrected nc
JOIN LATERAL (
SELECT
int_id,
geom,
ST_Distance(ST_Transform(nc.geom, 2952), ST_Transform(geom, 2952)) AS geom_dist
FROM counts.centreline_intersection
WHERE ST_DWithin(ST_Transform(nc.geom, 2952), ST_Transform(geom, 2952), 30)
ORDER BY geom_dist ASC
) ci ON true
WHERE nc.link_id = 13465353;
link_id | match_on_case | int_id | geom | geom_dist
----------+---------------+----------+----------------------------------------------------+-------------------
13465353 | 2 | 13465353 | 0101000020E610000057E12CAB04D753C0E8D43A1640D44540 | 0.120069462801905
13465353 | 2 | 30026352 | 0101000020E61000006F07342202D753C0B1AD253843D44540 | 16.5067114852413
Another "we need midblock TMCs" report, near Morningside and Beath:
Could not find TMC at Morningside and West Hill Collegiate Access, but available on Flow
Centreline-wise this is a midblock, but in reality there's an intersection there with a private driveway to West Hill Collegiate Institute.
Another "we need midblock TMCs" report, near Bayview and Blythwood:
I just noticed that the attached tmc at the driveway to 1929 Bayview is in Flow but not in Move.
A report of a TMC that was conflated to the wrong intersection:
A turning movement count (TMC) for ""VAUGHAN RD AT MOIR AVE AND WINONA DR (PX 802)"" shows up in FLOW, the date is 2018-02-15.
The TMC report doesn't show up for that location in MOVE. In fact, by looking at the studies for the surrounding midblocks / intersections, it appears that the TMC has been assigned to the intersection directly to the Northwest: https://move.intra.qa-toronto.ca/view/location/s1:AMQsmB/POINTS/reports/TMC
Searching for px:802
jumps to the expected location, where (as mentioned) there are no TMCs. This looks like a mis-conflation of the underlying arterycode:
flashcrow=> select * from "TRAFFIC"."COUNTINFOMICS" where "COUNT_INFO_ID" = 36723;
COUNT_INFO_ID | ARTERYCODE | COUNT_TYPE | COUNT_DATE | DAY_NO | COMMENT_ | FILE_NAME | SOURCE1 | SOURCE2 | LOAD_DATE | TRANSFER_REC | CATEGORY_ID
---------------+------------+------------+---------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------+---------+---------+-----------+--------------+-------------
36723 | 5865 | R | 2018-02-15 00:00:00 | 5 | NBL: Moir Ave to WB Vaughan Rd, NBT: Moir Ave to EB Vaughan Rd, NBR: Moir Ave to SB Winina Dr, SOTH: Moir Ave to NB Winona Dr, EOTH: WB Vaughan Rd to Moir Ave, WOTH: EB Vaughan Rd to Moir Ave | 88205_VAUGHAN_RD_and_WINONA_DR_BIKE.hex | | | | | 5
(1 row)
flashcrow=> select * from "TRAFFIC"."ARTERYDATA" where "ARTERYCODE" = 5865;
ARTERYCODE | GEOMCODE | STREET1 | STREET1TYPE | STREET1DIR | STREET2 | STREET2TYPE | STREET2DIR | STREET3 | STREET3TYPE | STREET3DIR | STAT_CODE | COUNT_TYPE | LOCATION | APPRDIR | SIDEOFINT | LINKID | SEQ_ORDER | GEO_ID
------------+----------+---------+-------------+------------+---------+-------------+------------+---------+-------------+------------+------------+------------+-----------------------------------------------+---------+-----------+----------+-----------+--------
5865 | 0 | VAUGHAN | RD | | WINONA | DR | | PX 802 | | | 0013460056 | R | VAUGHAN RD AT MOIR AVE AND WINONA DR (PX 802) | | | 13460056 | |
(1 row)
flashcrow=> select * from counts.arteries_centreline where arterycode = 5865;
arterycode | centreline_type | centreline_id | direction | geom
------------+-----------------+---------------+-----------+----------------------------------------------------
5865 | 2 | 13459974 | | 0101000020E6100000B34523B6F3DB53C04C9994B55FD84540
(1 row)
flashcrow=> select * from gis.centreline_intersection where int_id = 13459974;
gid | int_id | elev_id | intersec5 | classifi6 | classifi7 | num_elev | elevatio9 | elevatio10 | elev_level | elevation | elevatio13 | height_r14 | height_r15 | x | y | longitude | latitude | objectid | geom
---------+----------+---------+--------------------------+-----------+--------------------+----------+-----------+------------+------------+-----------+------------+------------+------------+-----------+-------------+---------------+--------------+----------+----------------------------------------------------
2380937 | 13459974 | 2400 | Vaughan Rd / Jesmond Ave | MNRSL | Minor-Single Level | 1 | 501300 | Minor | 0 | 0 | | 0 | | 309883.05 | 4838721.236 | -79.436749968 | 43.690420816 | 17452 | 0101000020E6100000B34523B6F3DB53C04C9994B55FD84540
(1 row)
This conflation has match_on_case = 1
, corresponding to the counts.arteries_intersection_manual_corr
view. Looking deeper into the view query results, the correct conflation is in fact a possible value:
arterycode | int_id | ranking
------------+----------+----------
5865 | 13459974 | 13459974
5865 | 13460056 | 13460056
However, ranking is just using int_id
, which results in a preference for the incorrect conflation here. We could use the LOCATION
field of "TRAFFIC"."ARTERYDATA"
, as this contains PX 802
(and could be used to match).
The above push seems to address the cases mentioned here - closing for now, and we can re-open one-off tasks as needed.
What Happened A couple folks at Traffic Signal Ops reported that certain traffic count reports are available in MOVE, but not Flow, and vice versa.
What Should Happen Reports should be consistently available between MOVE and Flow.
To Reproduce Steps to reproduce the bug:
Screenshots TCS 1346
TCS 1476
Additional Notes The TSO folks found two cases:
Unsure if these two cases are caused by the same underlying problem or something different.