CityofToronto / bdit_data-sources

Data sources used by the Big Data Innovation Team
https://github.com/orgs/CityofToronto/teams/bigdatainnovationteam
GNU General Public License v3.0
38 stars 8 forks source link

Reconcile Bluetooth routes that have changed over time #105

Open radumas opened 6 years ago

radumas commented 6 years ago

Offshoot of #62

Three things to consider:

  1. When route's analysis_id changed but the geometry did not
  2. When route's analysis_id changed and the geometry did
  3. Something else

    • [x] For 1 there should be a lookup table for deprecated analysis_ids linking to the new segment_name. This should silently span those report changes for the historical Open Data release.
    • [x] bluetooth.report_active_dates has been created to show start and end dates of routes.

There's a puzzle remaining with the routes on College, where new routes were created on 2017-09-17, but the old routes were never decommissioned. I thought the routes were the same, but there seems to be subtle differences in the 5-minute aggregation (with our filtering out of WiFi observations).

WITH old_routes AS (SELECT analysis_id, datetime_bin, segment_name, tt, obs 
      FROM bluetooth.aggr_5min  
      INNER JOIN bluetooth.segments USING (analysis_id)
WHERE segment_name IN ('CO_BA_CO_UN','CO_UN_CO_PA','CO_PA_CO_UN','CO_UN_CO_BA') AND duplicate)
, new_routes AS (SELECT analysis_id, datetime_bin, segment_name, tt, obs 
      FROM bluetooth.aggr_5min  
      INNER JOIN bluetooth.segments USING (analysis_id)
WHERE segment_name IN ('CO_BA_CO_UN','CO_UN_CO_PA','CO_PA_CO_UN','CO_UN_CO_BA') AND NOT duplicate)

SELECT Date_trunc('month', datetime_bin)::DATE AS mnth, segment_name, COUNT(1)
FROM old_routes
INNER JOIN new_routes USING (segment_name, datetime_bin)
WHERE new_routes.obs != old_routes.obs OR new_routes.tt != old_routes.tt
GROUP BY mnth, segment_name
ORDER BY mnth

The results of the above query, not aggregated by month are below

segment_name count
CO_BA_CO_UN 5835
CO_PA_CO_UN 1486
CO_UN_CO_BA 6255
CO_UN_CO_PA 1421

I thought this might be due to some changing in processing over time (see #103, or since move_raw_data() was created), but the differences persist until today.

radumas commented 6 years ago

I'm going to remove observations for the old duplicate College routes from bluetooth.observations and park them in a duplicate_route_observations table and then delete the aggregates based on them.


SELECT * 
INTO bluetooth.duplicate_route_observations
FROM bluetooth.observations 
INNER JOIN bluetooth.segments USING (analysis_id)
WHERE duplicate AND measured_timestamp >= '2017-09-14';
DELETE FROM bluetooth.observations 
USING bluetooth.segments 
WHERE segments.analysis_id = observations.analysis_id AND duplicate AND measured_timestamp >= '2017-09-14';

DELETE FROM bluetooth.aggr_5min  aggr
USING bluetooth.segments 
WHERE segments.analysis_id = aggr.analysis_id AND duplicate AND datetime_bin >= '2017-09-14';

DELETE FROM bluetooth.aggr_5min_alldevices   aggr
USING bluetooth.segments 
WHERE segments.analysis_id = aggr.analysis_id AND duplicate AND datetime_bin >= '2017-09-14';

DELETE FROM bluetooth.aggr_15min aggr
USING bluetooth.segments 
WHERE segments.analysis_id = aggr.analysis_id AND duplicate AND datetime_bin >= '2017-09-14';

DELETE FROM bluetooth.aggr_15min_alldevices aggr
USING bluetooth.segments 
WHERE segments.analysis_id = aggr.analysis_id AND duplicate AND datetime_bin >= '2017-09-14';

DELETE FROM bluetooth.aggr_15min_temp aggr
USING bluetooth.segments 
WHERE segments.analysis_id = aggr.analysis_id AND duplicate AND datetime_bin >= '2017-09-14';

UPDATE bluetooth.all_analyses a
   SET pull_data=FALSE
FROM bluetooth.segments s
 WHERE duplicate AND a.analysis_id = s.analysis_id;
radumas commented 6 years ago

Dundas routes

The western end of Dundas is surprisingly confusing. My understanding based on the below table, which you can produce with the query at the bottom of this comment. The westernmost detector was at Roncesvalles, installed 2015-12-03. At the same time a detector was installed, from West to East at Sterling, Dovercourt, and Spadina. On 2017-09-19 detectors were installed at Dufferin (between Sterling and Dovercourt) and Bathurst (between Dovercourt and Spadina) On 2017-12-17 the detector at Roncesvalles was removed. This has left this strange situation where Sterling-Dovercourt and Dovercourt-Spadina overlap with Dufferin-Bathurst and Bathurst-Spadina (see map below table).

analysis_id direction start_crossstreet end_crossstreet report_name start_date end_date
1412879 EB Dovercourt Spadina DT- Dundas - AD3 to AD4 2015-12-03
1453262 EB Dufferin Bathurst DT-0006. Dundas-EB_Dufferin-to-Bathurst 2017-09-19
1453239 EB Roncesvalles Dufferin DT-0005. Dundas-EB_Roncesvalles-to-Dufferin 2017-10-05 2017-12-17
1412641 EB Roncesvalles Sterling DT- Dundas - AD1 to AD2 2015-12-03 2017-12-17
1412818 EB Sterling Dovercourt DT- Dundas - AD2 to AD3 2015-12-03

image

SELECT  segment_name, analysis_id, direction, start_crossstreet, 
       end_crossstreet, a.report_name, start_date, end_date, length, bluetooth, wifi
FROM bluetooth.segments_dir_corrected segs
INNER JOIN bluetooth.all_analyses a USING (analysis_id)
JOIN bluetooth.report_active_dates USING(analysis_id)
WHERE street = 'Dundas'
ORDER BY direction, start_crossstreet
radumas commented 6 years ago

The below query identifies the analyses for which we've stopped receiving data and when.

SELECT a.analysis_id, a.report_name, segment_name,
       start_date, end_date
  FROM bluetooth.all_analyses a
LEFT OUTER JOIN bluetooth.segments seg USING(analysis_id )
JOIN bluetooth.report_active_dates USING(analysis_id)
  WHERE pull_data
  ORDER BY end_date, report_name, start_date
analysis_id report_name segment_name start_date end_date
1419415 GTA - Lawrence AL1-AL2 EB 2016-04-09 2017-04-19
1419433 GTA - Lawrence AL2-AL1 WB 2016-04-09 2017-04-19
1419444 GTA - Sheppard AS1-AS2 EB 2016-04-09 2017-07-05
1419450 GTA - Sheppard AS2-AS1 WB 2016-04-09 2017-07-06
1428103 Shep Heron Hill to Pharmacy 2017-04-19 2017-07-06
1428150 Shep Pharmacy to 401 2017-04-19 2017-07-06
1428139 Shep Pharmacy to Clydesdale 2017-04-19 2017-07-06
1428120 Shep Pharmacy to Heron Hill 2017-04-19 2017-07-06
1427570 VP 401 to Pharmacy 2017-04-19 2017-07-06
1428092 VP Clydesdale to Pharmacy 2017-04-19 2017-07-06
1432987 DTB - Adelaide to Richmond at Duncan and Simcoe 2017-04-19 2017-07-21
1432982 DTB - Richmond -BR2-BR3 RM_SB_RM_DN 2016-10-01 2017-07-21
1432978 DTB - Richmond -BR3-BR4 RM_DC_RM_BA 2016-10-01 2017-07-21
1432992 DTB - Richmond to Adelaide at Duncan and Simcoe 2017-04-19 2017-07-21
1453239 DT-0005. Dundas-EB_Roncesvalles-to-Dufferin DU_RO_DU_DF 2017-10-05 2017-12-17
1453507 DT-0018. Dundas-WB_Dufferin-to-Roncesvalles DU_DF_DU_RO 2017-10-05 2017-12-17
1454832 DT-0083. Roncesvalles-SB_Dundas-to-Queen DU_RO_QU_RO 2017-09-21 2017-12-17
1454853 DT-0084. Roncesvalles-NB_Queen-to-Dundas QU_RO_DU_RO 2017-09-21 2017-12-17
1412641 DT- Dundas - AD1 to AD2 DU_RO_DU_SL 2015-12-03 2017-12-17
1412684 DT- Dundas - AD2 to AD1 DU_SL_DU_RO 2015-12-03 2017-12-17
1448452 Miln Milner to 401Morn 2017-08-16 2017-12-27
1448469 Morn 401Morn to McLevin 2017-08-15 2017-12-27
1448474 Morn 401Morn to Meadowvale 2017-08-15 2017-12-27
1448479 Morn 401Morn to Milner 2017-08-16 2017-12-27
1448484 Morn 401Morn to Morningside 2017-08-15 2017-12-27
1448489 Morn 401Morn to Neilson 2017-08-15 2017-12-27
1448494 Morn 401Morn to Sheppard 2017-08-15 2017-12-27
1448499 Morn McLevin to 401Morn 2017-08-15 2017-12-27
1448531 Morn Morningside to 401Morn 2017-08-15 2017-12-27
1448567 Morn Sheppard to 401Morn 2017-10-30 2017-12-27
1448665 SE Meadowvale to 401Morn 2017-08-16 2017-12-27
1448715 SE Neilson to 401Morn 2017-08-16 2017-12-27
1453667 DT-0027. Queen-EB_Parliament-to-Broadview QU_PA_QU_BV 2017-10-02 2018-02-15
1453680 DT-0028. Queen-WB_Broadview-to-Parliament QU_BV_QU_PA 2017-10-02 2018-02-15
1454293 DT-0057. King-EB_Parliament-to-Broadview KN_PA_KN_BV 2017-10-02 2018-02-15
1454311 DT-0058. King-WB_Broadview-to-Parliament KN_BV_KN_PA 2017-10-02 2018-02-15
1455710 DT-0139. Broadview-SB_Queen-to-Eastern QU_BV_EA_BV 2017-10-02 2018-02-15
1455724 DT-0140. Broadview-NB_Eastern-to-Queen EA_BV_QU_BV 2017-10-02 2018-02-15
radumas commented 5 years ago

Removing observations from the Sherbourne-Parliament segments of Richmond and Adelaide because they are wholly contained by the newer King Street Pilot Jarvis-Parliament segments, which are also longer.

I'm leaving the other Richmond and Adelaide routes alone because, while they overlap in time, they are very staggered in space.

WITH ,insertion as (INSERT INTO bluetooth.duplicate_route_observations(
    analysis_id, id, user_id, measured_time, measured_time_no_filter, startpoint_number, startpoint_name, endpoint_number, endpoint_name, measured_timestamp, outlier_level, cod, device_class)
    SELECT analysis_id, id, user_id, measured_time, measured_time_no_filter, startpoint_number, startpoint_name, endpoint_number, endpoint_name, measured_timestamp, outlier_level, cod, device_class 
    FROM bluetooth.observations
    WHERE analysis_id IN (1427970, 1427959) AND measured_timestamp >= '2017-10-02'
    RETURNING id)
    DELETE FROM bluetooth.observations
    USING insertion
    WHERE observations.id = insertion.id;
    DELETE FROM bluetooth.observations
    USING insertion
    WHERE observations.id = insertion.id;
rickl4 commented 5 years ago

The reader at College/Bathurst was last operational on October 10th, however it's still marked as operational among multiple sources. We should fix this in our map, spreadsheet and database tables, and find out if other readers, marked as operational, aren't.