CityofToronto / bdit_data-sources

Data sources used by the Big Data Innovation Team
https://github.com/orgs/CityofToronto/teams/bigdatainnovationteam
GNU General Public License v3.0
38 stars 8 forks source link

Miovision Bike Approach Volumes #403

Closed cczhu closed 3 years ago

cczhu commented 3 years ago

As mentioned in #374 (currently hacked to be ignored by #375), Miovision began sending bicycle approach volumes to us in December. Bike turning-movement counts continue to be sent to us as of 2021-05-21, but will eventually be phased out in favour of only providing approach volumes. Vehicle TMCs and pedestrian crosswalk movements will be unaffected.

To do:

Questions:

cczhu commented 3 years ago

Miovision returns bike approach volumes as:

{'timestamp': '2021-05-04T12:01:00.000-04:00',
 'class': 'Bicycle',
 'entrance': 'E',
 'exit': 'UNDEFINED',
 'qty': 1}

and departure volumes as:

{'timestamp': '2021-05-04T12:06:00.000-04:00',
 'class': 'Bicycle',
 'entrance': 'UNDEFINED',
 'exit': 'N',
 'qty': 1}

Currently in the API, the leg is defined solely by item['entrance'], so we at least need to generalize extracting leg and movement so that item['exit'] is used if item['entrance'] is undefined. This suggests to me a more general function that returns both leg and movement:

from collections import namedtuple
import datetime
import sys

EEPair = namedtuple('EEPair', ['entrance', 'exit'])

intersection_mvmts_no_uturn = {
    EEPair(entrance='N', exit='S'): '1',
    EEPair(entrance='S', exit='N'): '1',
    EEPair(entrance='W', exit='E'): '1',
    EEPair(entrance='E', exit='W'): '1',
    EEPair(entrance='S', exit='W'): '2',
    EEPair(entrance='N', exit='E'): '2',
    EEPair(entrance='W', exit='N'): '2',
    EEPair(entrance='E', exit='S'): '2',
    EEPair(entrance='S', exit='E'): '3',
    EEPair(entrance='E', exit='N'): '3',
    EEPair(entrance='N', exit='W'): '3',
    EEPair(entrance='W', exit='S'): '3',    
}

def get_leg_and_movement(entrance, exit):
    if exit == 'UNDEFINED':
        return (entrance, '7')
    elif entrance == 'UNDEFINED':
        return (exit, '8')
    elif entrance == exit:
        return (entrance, '4')
    movement = intersection_mvmts_no_uturn[
        EEPair(entrance=entrance, exit=exit)]
    return (entrance, movement)

where I've also replaced the long if-then block with a more Pythonic dictionary. A call to this function would replace lines 184 and 185 here. Movement 7 is approach, and 8 is departure. Assigning these new movements has precedence with bikes and peds at crosswalks.

Assigning class (L182) also must be done differently, since it depends on both the vehicle class and type of movement (before, we had a separate API hook for crosswalk data so did not need to be worried about this). L180-192 should be partly rewritten to accommodate this.

cczhu commented 3 years ago

Proposal, replace get_intersection_tmc and get_crosswalk_tmc (lines 166-256 and associated functions (like get_movement) with this class:

from collections import namedtuple
import datetime
import sys

# Entrance-exit pairs.
EEPair = namedtuple('EEPair', ['entrance', 'exit'])

class MiovTMCPuller:

    header = {'Content-Type': 'application/json',
              'Authorization': ''}

    tmc_template = itmc.url + "{int_id1}" + itmc.tmc_endpoint
    ped_template = itmc.url + "{int_id1}" + itmc.ped_endpoint

    roaduser_class = {
        'Light': '1',
        'BicycleTMC': '2',
        'Bus': '3',
        'SingleUnitTruck': '4',
        'ArticulatedTruck': '5',
        'WorkVan': '8',
        'MotorizedVehicle': '9',
        'Bicycle': '10'
    }

    intersection_mvmts_no_uturn = {
        EEPair(entrance='N', exit='S'): '1',
        EEPair(entrance='S', exit='N'): '1',
        EEPair(entrance='W', exit='E'): '1',
        EEPair(entrance='E', exit='W'): '1',
        EEPair(entrance='S', exit='W'): '2',
        EEPair(entrance='N', exit='E'): '2',
        EEPair(entrance='W', exit='N'): '2',
        EEPair(entrance='E', exit='S'): '2',
        EEPair(entrance='S', exit='E'): '3',
        EEPair(entrance='E', exit='N'): '3',
        EEPair(entrance='N', exit='W'): '3',
        EEPair(entrance='W', exit='S'): '3',    
    }

    crosswalkuser_class = {
        'Pedestrian': '6',
        'Bicycle': '7'
    }

    crosswalk_mvmts = {
        'CW': '5',
        'CCW': '6'
    }

    def __init__(self, session, logger, int_id1, int_uid, key):
        self._session = session
        self._logger = logger
        self.headers['Authorization'] = key
        # Verbose design, but this is so eventually we can migrate
        # get_intersection_info into this and run the entire loop
        # of intersections inside this class.
        self.int_id1
        self.intersection_uid = int_uid

    def get_response(self, apitype, start_time, end_iteration_time):

        params = {'endTime': (end_iteration_time
                              - datetime.timedelta(milliseconds=1)),
                  'startTime': start_time}

        if apitype == 'crosswalk':
            request_url = self.ped_template.format(int_id1=self.int_id1)
        elif apitype == 'tmc':
            request_url = self.tmc_template.format(int_id1=self.int_id1)    
        else:
            raise ValueError("apitype must be either 'tmc' or 'crosswalk'!")

        response = self._session.get(
            request_url, params=params, headers=self.headers,
            proxies=self._session.proxies)

        # Only return if we get a success response code.
        if response.status_code == 200:
            return response
        elif response.status_code == 404:
            error = json.loads(response.content.decode('utf-8'))
            self._logger.error("Problem with ped call for intersection %s",
                               intersection_id1)
            self._logger.error(error['error'])
            raise NotFoundError
        elif response.status_code == 400:
            self._logger.critical(("Bad request error when pulling ped data for "
                                   "intersection {0} from {1} until {2}")
                                  .format(intersection_id1, start_time,
                                          end_iteration_time))
            error = json.loads(response.content.decode('utf-8'))
            self._logger.critical(error['error'])
            sys.exit(5)
        elif response.status_code == 504:
            raise itmc.TimeoutException('Error' + str(response.status_code))
        elif response.status_code == 500:
            raise itmc.ServerException('Error' + str(response.status_code))

        # If code isn't handled in the block above, throw a general error.
        self._logger.critical('Unknown error pulling ped data for intersection %s',
                              intersection_id1)
        raise itmc.MiovisionAPIException('Error' + str(response.status_code))

    def get_road_class(self, row):
        is_approach = ((row['entrance'] == 'UNDEFINED')
                       or (row['exit'] == 'UNDEFINED'))
        # Second check isn't strictly necessary but better safe than sorry.
        if not is_approach and (row['class'] == 'Bicycle'):
            ru_class = 'BicycleTMC'
        else:
            ru_class = row['class']

        try:
            return self.roaduser_class[ru_class]
        except KeyError:
            raise ValueError("vehicle class {0} not recognized!"
                             .format(veh_class))

    def get_road_leg_and_movement(self, row):
        if row['exit'] == 'UNDEFINED':
            return (row['entrance'], '7')
        elif row['entrance'] == 'UNDEFINED':
            return (row['exit'], '8')
        elif row['entrance'] == row['exit']:
            return (row['entrance'], '4')
        movement = self.intersection_mvmts_no_uturn[
            EEPair(entrance=row['entrance'], exit=row['exit'])]
        return (row['entrance'], movement)

    def get_crosswalk_class(self, row):
        try:
            return self.crosswalkuser_class[row['class']]
        except KeyError:
            raise ValueError("crosswalk class {0} not recognized!"
                             .format(row['class']))

    def get_crosswalk_movement(self, row):
        try:
            return self.crosswalk_mvmts[row['direction']]
        except KeyError:
            raise ValueError("crosswalk movement {0} not recognized!"
                             .format(row['direction']))

    def process_tmc_row(self, row):

        classification = self.get_road_class(row)
        (leg, movement) = self.get_road_leg_and_movement(row)

        # Return time, classification_uid, leg, movement, volume.
        return (row['timestamp'], classification, leg, movement, row['qty'])

    def process_crosswalk_row(self, row):

        classification = self.get_crosswalk_class(row)
        movement = self.get_crosswalk_movement(row)

        # Return time, classification_uid, leg, movement, volume.
        return (row['timestamp'], classification, row['crosswalkSide'], movement, row['qty'])

    def process_response(self, response, apitype):
        data = json.loads(response.content.decode('utf-8'))
        if apitype = 'crosswalk':
            return [(self.intersection_uid, ) + self.process_ped_row(row) for row in data]
        return [(self.intersection_uid, ) + self.process_tmc_row(row) for row in data]

    def get_intersection(self, start_time, end_iteration_time):
        response_tmc = self.get_response('tmc', start_time, end_iteration_time)
        table_veh = self.process_response(response_tmc, 'tmc')
        response_crosswalk = self.get_response('crosswalk', start_time, end_iteration_time)
        table_ped = self.process_response(response_crosswalk, 'crosswalk')

        return table_veh, table_ped

We can then modify L411-417 with a class instantiation and single method call.

Advantages of doing something like this (the class itself is just a prototype and probably needs some refactoring):

cczhu commented 3 years ago

Once bike approach counts are in miovision_api.volumes, how should we aggregate them?

15-minute aggregation and handling of unacceptable gaps is done through aggregate_15_min_tmc. We don't want to duplicate the generation of zero-padded 15-minute movement bins and aggregating 1-minute volumes to these bins, so we can split lines 23 - 52 to the following:

This only works if movements 7 and 8 are also added to miovision_api.intersection_movements for all intersections.

Now that we have a 15-minute TMC table and a 15-minute bikes only ATR table, we use aggregate-volumes_15min to create a combined 15-minute ATR-style dataset. We can keep the first CTE as is (or explicitly remove class 2 counts from it, depending on whether we think having both class 2 and 10 counts in this table will be very confusing for users). We then create a new CTE that transforms the contents of miovision_api.volumes_15min_bikeatr to the same format as miovision_api.volumes_15min. Finally, we union this CTE's output with transformed, and insert the result using insert_atr as before.

The end result is the introduction of a new classification_uid, 10, after 2020-12-15 or so. Once class 2 is no longer transmitted, we may need to alter the query to make sure we don't store a bunch of zeros in the 15-minute bins.

cczhu commented 3 years ago

To summarize the previous 3 comments, here are the changes to make to the pipeline to incorporate bike ATR counts.

  1. Create a new class of road user, classification_uid = 10, to indicate bike approach volumes (similar to how we store bikes at crosswalks using class 7, though it's very difficult to double count the same bike as both class 2 and 7).
  2. Following discussion here, modify get_intersection_tmc to distinguish class 2 and 10 apart. For class 10, insert approach volumes with movement_uid = 7, and departing volumes with 8. I think it's a good time to refactor part of the intersection_tmc.py into a class, but at minimum we'll need to change the functions dealing with class and movement, and revise get_intersection_tmc.
  3. Insert relevant rows in miovision_api.classifications, miovision_api.movements, and miovision_api.intersection_movements to include class 10 and movements 7 and 8.
  4. Rewrite (as described here) the 15 minute TMC aggregation to insert class 10 movements into miovision_api.volumes_15min_bikeatr and all other movements into miovision_api.volumes_15min_tmc. Since we can't convert approach volumes back into TMCs, it's best not to store the bike approach volumes in the same aggregate table (though I'm okay with storing it in the raw table since we store crosswalk and turning movements in the same table).
  5. Rewrite the 15-minute ATR aggregation to union the transformed version of miovision_api.volumes_15min_tmc with that of miovision_api.volumes_15min_bikeatr.

We still need to answer the following:

cczhu commented 3 years ago

In discussion with @radumas and @aharpalaniTO

We should:

In general, design defensively against any users from aggregating turning and approach volumes together without understanding they are different (eg. to prevent before-after studies from having spurious results).

cczhu commented 3 years ago

Annoyance:

image

Seems like there may be some intersections with no class 10 counts at all. Not sure if this means they aren't set up yet or not. I ran a quick check to see how many cyclist TMC (class 2) and approach (10) counts there are between 16 - 19:00 on 2021-05-25 for existing intersections in a Jupyter Notebook (to be committed soon). Got:

Intersection 1 (Adelaide / Bathurst) has 220 class 2 and 0 class 10 counts
Intersection 2 (Adelaide / Spadina) has 107 class 2 and 501 class 10 counts
Intersection 3 (Adelaide / Bay) has 260 class 2 and 232 class 10 counts
Intersection 4 (Adelaide / Jarvis) has 219 class 2 and 0 class 10 counts
Intersection 5 (Front / Bathurst) has 245 class 2 and 0 class 10 counts
Intersection 6 (Front / Spadina) has 158 class 2 and 62 class 10 counts
Intersection 7 (Front / Bay) has 307 class 2 and 165 class 10 counts
Intersection 8 (Front / Jarvis) has 254 class 2 and 230 class 10 counts
Intersection 9 (King / Strachan) has 0 class 2 and 0 class 10 counts
Intersection 10 (King / Bathurst) has 537 class 2 and 194 class 10 counts
Intersection 11 (King / Portland) has 0 class 2 and 0 class 10 counts
Intersection 12 (King / Spadina) has 508 class 2 and 236 class 10 counts
Intersection 13 (King / Peter) has 0 class 2 and 0 class 10 counts
Intersection 14 (King / Simcoe) has 0 class 2 and 0 class 10 counts
Intersection 15 (King / University) has 346 class 2 and 172 class 10 counts
Intersection 16 (King / York) has 0 class 2 and 0 class 10 counts
Intersection 17 (King / Bay) has 352 class 2 and 407 class 10 counts
Intersection 18 (King / Yonge) has 387 class 2 and 513 class 10 counts
Intersection 19 (King / Church) has 0 class 2 and 0 class 10 counts
Intersection 20 (King / Jarvis) has 308 class 2 and 151 class 10 counts
Intersection 21 (King / Sherbourne) has 431 class 2 and 472 class 10 counts
Intersection 22 (Queen / Bathurst) has 526 class 2 and 481 class 10 counts
Intersection 23 (Queen / Spadina) has 630 class 2 and 290 class 10 counts
Intersection 24 (Queen / Bay) has 469 class 2 and 226 class 10 counts
Intersection 25 (Queen / Jarvis) has 298 class 2 and 203 class 10 counts
Intersection 26 (Richmond / Bathurst) has 299 class 2 and 559 class 10 counts
Intersection 27 (Richmond / Spadina) has 495 class 2 and 376 class 10 counts
Intersection 28 (Richmond / Bay) has 374 class 2 and 589 class 10 counts
Intersection 29 (Richmond / Jarvis) has 216 class 2 and 146 class 10 counts
Intersection 30 (Wellington / Blue Jays) has 0 class 2 and 0 class 10 counts
Intersection 31 (Wellington / Bay) has 0 class 2 and 0 class 10 counts
Intersection 32 (King / John) has 0 class 2 and 0 class 10 counts
Intersection 33 (Bloor / Avenue) has 583 class 2 and 250 class 10 counts
Intersection 34 (Bloor / Sherbourne) has 544 class 2 and 118 class 10 counts
Intersection 35 (Bloor / Shaw) has 809 class 2 and 225 class 10 counts
Intersection 36 (Brimley / Eglinton) has 0 class 2 and 0 class 10 counts
Intersection 37 (Danforth / Jones) has 421 class 2 and 321 class 10 counts
Intersection 38 (Danforth / Woodbine) has 433 class 2 and 586 class 10 counts
Intersection 39 (Danforth / Dawes) has 251 class 2 and 183 class 10 counts
Intersection 40 (Dundas / River) has 397 class 2 and 117 class 10 counts
Intersection 41 (Bloor / Bay) has 518 class 2 and 382 class 10 counts
Intersection 42 (College / University) has 746 class 2 and 412 class 10 counts
Intersection 43 (Danforth / Logan) has 383 class 2 and 0 class 10 counts
Intersection 44 (Dundas / Bloor) has 442 class 2 and 284 class 10 counts
Intersection 45 (Eglinton / Kingston) has 11 class 2 and 8 class 10 counts
Intersection 46 (Ellesmere / Morningside) has 0 class 2 and 2 class 10 counts
Intersection 47 (Greenwood / Danforth) has 491 class 2 and 336 class 10 counts
Intersection 48 (Huntingwood / Warden) has 48 class 2 and 16 class 10 counts
Intersection 49 (Lawrence / Brimley) has 3 class 2 and 15 class 10 counts
Intersection 50 (Morningside / Kingston) has 18 class 2 and 3 class 10 counts
Intersection 51 (Runnymede / Bloor) has 299 class 2 and 209 class 10 counts
Intersection 52 (Sheppard / Wilmington) has 0 class 2 and 30 class 10 counts
Intersection 53 (St. Clair / Brimley) has 17 class 2 and 8 class 10 counts
Intersection 54 (Thorncliffe Park / Overlea) has 65 class 2 and 16 class 10 counts
Intersection 55 (Danforth / Broadview) has 507 class 2 and 416 class 10 counts
Intersection 56 (Don Mills / Overlea) has 117 class 2 and 16 class 10 counts
Intersection 57 (Queen / University) has 458 class 2 and 208 class 10 counts

So that's:

Intersection 1 (Adelaide / Bathurst) has 220 class 2 and 0 class 10 counts
Intersection 4 (Adelaide / Jarvis) has 219 class 2 and 0 class 10 counts
Intersection 5 (Front / Bathurst) has 245 class 2 and 0 class 10 counts
Intersection 43 (Danforth / Logan) has 383 class 2 and 0 class 10 counts

that have hundreds of class 2 but zero class 10 counts. @aharpalaniTO will be asking Miovision about when bike approach counts came online for different intersections.

cczhu commented 3 years ago

Quick comparison of TMC bikes vs. approach volume bikes (divided by 2 so as not to double-count):

Bloor / Bay

image

Queen / Bathurst

image

There's a pretty big deficit in approach volumes compared to turning movement ones!

I double checked the raw Bloor / Bay numbers by directly pulling from the API and reading the output into Pandas, and got the same total numbers, suggesting that this is not an API code issue, but an actual data issue from Miovision.

cczhu commented 3 years ago

Based on recent discussion with @radumas and @aharpalaniTO , we revised the plan from above:

  1. Create a new class of road user, classification_uid = 10, to indicate bike approach volumes (similar to how we store bikes at crosswalks using class 7, though it's very difficult to double count the same bike as both class 2 and 7).
  2. Following discussion here, modify get_intersection_tmc to distinguish class 2 and 10 apart. For class 10, insert approach volumes with movement_uid = 7, and departing volumes with 8. I think it's a good time to refactor part of the intersection_tmc.py into a class, but at minimum we'll need to change the functions dealing with class and movement, and revise get_intersection_tmc.
  3. Insert relevant rows in miovision_api.classifications, miovision_api.movements, and miovision_api.intersection_movements to include class 10 and movements 7 and 8.
  4. Change the name of volumes_15min_tmc to volumes_15min_mvt, which now includes all movements including bike approaches, bikes at crosswalks and pedestrians. Store all volumes in that table, but create a new miovision_api.volumes_15min_tmc view of that table that only includes TMCs.
cczhu commented 3 years ago

Successfully renamed all the _tmc_ tables, columns, keys, indexes, etc. in czhu to _mvt_ using pgAdmin to see if it can be done.

Now ready for review.

radumas commented 3 years ago

In light of the recommendation not to use approach exit volumes: should we be excluding approach exit volumes from the final 15 minute table? Or from a reporting table/view (if those are still used)?

cczhu commented 3 years ago

Agreed that we should just remove (and thoroughly document why we removed) movement_uid = 8 for now. I don't think it changes any of the aggregation function methods.

I'm sort of leaning toward not including movement_uid = 8 even in the 1-minute aggregation table, just in case someone six months down the line suddenly realizes we haven't been aggregating those volumes and backfills them. It might depend on whether Miovision will ever support departure volumes.

radumas commented 3 years ago

I think if there's a concise list of what we aggregate and why in the README (and this is linked in table comments) we should avoid future confusion.

Removing things from the "raw" table makes it harder to do retrospective analyses like identifying movements that should be aggregated.

cczhu commented 3 years ago

I suppose we don't bother spurious detections of other kinds (cars doing impossible manoeuvres, vehicle approach volumes when only bike approach volumes should exist, etc.), so agreed - keep it in the raw tables, don't aggregate.

cczhu commented 3 years ago

Now that the pipeline has been merged and is in production, a few sensibility checks of the 2021-07-11 data:

SELECT movement_uid,
       COUNT(*) n_samples
FROM miovision_api.volumes
WHERE datetime_bin BETWEEN '2021-07-11 00:00:00' AND '2021-07-11 23:59:00'
AND classification_uid = 10
GROUP BY movement_uid;

image

Good, we have on average around 400 bike approach detections per intersection in the last day.

SELECT movement_uid,
       COUNT(*) n_samples
FROM miovision_api.volumes_15min_mvt
WHERE datetime_bin BETWEEN '2021-07-11 00:00:00' AND '2021-07-11 23:59:00'
AND classification_uid = 10
GROUP BY movement_uid;

image

Only movement 7 (approach) volumes are available at the 15-minute movement aggregation level.

SELECT leg,
       dir,
       count(*) n_samples
FROM miovision_api.volumes_15min
WHERE datetime_bin BETWEEN '2021-07-11 00:00:00' AND '2021-07-11 23:59:00'
AND classification_uid = 10
GROUP BY leg, dir;

image

Only approach volumes are listed here (as expected), and they sum up to 20056.

Have a look at a single 15-minute bin:

SELECT *
FROM miovision_api.volumes_15min_mvt
WHERE volume_15min_mvt_uid = 349162657;

image

SELECT *
FROM miovision_api.volumes
WHERE volume_15min_mvt_uid = 349162701;

image

The rows of volume in miovision_api.volumes add up to the 15-minute aggregated row in miovision_api.volumes_15min_mvt with the same mvt_uid.

Since even in volumes_15min_mvt the bike approach volumes are approach volumes, summing up volumes by intersection and leg should yield identical numbers from both volumes_15min_mvt and volumes_15min. To that end,

WITH mvt_vols AS (
    SELECT intersection_uid,
           leg,
           SUM(volume) total_vol
    FROM miovision_api.volumes_15min_mvt
    WHERE datetime_bin BETWEEN '2021-07-11 00:00:00' AND '2021-07-11 23:59:00'
         AND classification_uid = 10
    GROUP BY intersection_uid, leg
), vols AS (
    SELECT intersection_uid,
           leg,
           SUM(volume) total_vol
    FROM miovision_api.volumes_15min
    WHERE datetime_bin BETWEEN '2021-07-11 00:00:00' AND '2021-07-11 23:59:00'
         AND classification_uid = 10
    GROUP BY intersection_uid, leg
)
SELECT *
FROM mvt_vols a
FULL OUTER JOIN vols b USING (intersection_uid, leg)
WHERE a.total_vol != b.total_vol;

returns no rows.