CityofToronto / bdit_congestion

Determining most congested streets and days in Toronto
https://github.com/orgs/CityofToronto/teams/bigdatainnovationteam
GNU General Public License v3.0
0 stars 0 forks source link

Retired segments in 22_2 link lookup table #76

Closed chmnata closed 4 months ago

chmnata commented 4 months ago

https://github.com/CityofToronto/bdit_tt_request_app/issues/127 @Nate-Wessel found duplicated link_dirs in the congestion.network_links_22_2

segment_id: 1344 image segment_id: 7076 image

chmnata commented 4 months ago

Turns out 1344 is an outdated segment from 21_1, not included in network_segments

select * from congestion.network_segments
where  segment_id IN (1344,7076)

There is also a record of it in network_segments_retired , where it has been replaced by {7074,7076}, valid from 2017-09-01 to 2022-08-27

select * from congestion.network_segments_retired
where segment_id = 1344
chmnata commented 4 months ago

Checked that 7074 and 7076 does cover 1344 image

chmnata commented 4 months ago

Delete outdated segment in congestion.network_links_22_2

delete from congestion.network_links_22_2
where segment_id = 1344

As well as records in congestion.network_segments_daily There is actually no records in the daily table, likely because the query calls a join to network_segments table, which doesn't have the outdated segment_id.

select * from congestion.network_segments_daily
where segment_id = 1344
chmnata commented 4 months ago

@Nate-Wessel has discovered more duplicated link_dirs

SELECT
    link_dir,
    ARRAY_AGG(DISTINCT segment_id) AS segments
FROM congestion.network_links_22_2
GROUP BY link_dir
HAVING COUNT(DISTINCT segment_id) > 1

There are 68 segments with dup links

chmnata commented 4 months ago

Some of them are the same case as above

with temp as (SELECT
    link_dir,
    ARRAY_AGG(DISTINCT segment_id) AS segments
FROM congestion.network_links_22_2
GROUP BY link_dir
HAVING COUNT(DISTINCT segment_id) > 1)
, temps as (
    select distinct unnest(segments) segment_id
from temp)
select distinct segment_id 
from temps
inner join congestion.network_segments_retired using (segment_id)
inner join congestion.network_segments using (segment_id)

Delete them:


delete from congestion.network_links_22_2
where segment_id in 
(1096,1200,1294,1367,1865,2044,2085,2297,2299,2701,2915,3237,4225,
4236,4338,4369,4371,4491,4492,5201,5246,5872,6062,6152)
chmnata commented 4 months ago

Returns 0 now

SELECT
    link_dir,
    ARRAY_AGG(DISTINCT segment_id) AS segments
FROM congestion.network_links_22_2
GROUP BY link_dir
HAVING COUNT(DISTINCT segment_id) > 1