Road-map - Githubissues

dpzhang commented 7 years ago

Research Question: Are there any differences in driving patterns between high-income taxi drivers and low-income drivers before and after 2016?

How to define high-income and low-income drivers?
- classify them by average daily income
How to quantify driving pattern?
- aggregate all trips by census tract
- conditioning on pick-up and drop-off location (census tracts) and on time (hourly), compare the actual distance driven (and velocity?)
- how long does it take, on average, to pick up a passenger, and does that differ between high-income and low-income drivers?
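
One possible way to measure the pick-up question above is the gap between a taxi's drop-off and its next pick-up. The sketch below is only an illustration: the function name `mean_wait_minutes` and the `(taxi_id, pickup, dropoff)` tuple layout are assumptions, not the project's actual schema.

```python
from collections import defaultdict
from datetime import datetime

def mean_wait_minutes(trips):
    """Mean minutes between a drop-off and the next pick-up, per taxi.

    trips: iterable of (taxi_id, pickup_datetime, dropoff_datetime);
    this layout is a hypothetical stand-in for the real trip table.
    """
    by_taxi = defaultdict(list)
    for taxi_id, pickup, dropoff in trips:
        by_taxi[taxi_id].append((pickup, dropoff))

    result = {}
    for taxi_id, rows in by_taxi.items():
        rows.sort()  # order trips by pick-up time
        gaps = [
            (nxt[0] - cur[1]).total_seconds() / 60
            for cur, nxt in zip(rows, rows[1:])
            if nxt[0] >= cur[1]  # skip overlapping (bad) records
        ]
        if gaps:  # taxis with a single trip have no gap to report
            result[taxi_id] = sum(gaps) / len(gaps)
    return result

sample = [
    ("A", datetime(2016, 3, 1, 8, 0), datetime(2016, 3, 1, 8, 20)),
    ("A", datetime(2016, 3, 1, 8, 30), datetime(2016, 3, 1, 9, 0)),
    ("A", datetime(2016, 3, 1, 9, 20), datetime(2016, 3, 1, 9, 40)),
    ("B", datetime(2016, 3, 1, 10, 0), datetime(2016, 3, 1, 10, 15)),
]
waits = mean_wait_minutes(sample)
```

Gaps that span a shift break or an overnight stop would inflate the mean, so a real analysis would probably cap the gap length; the cap is left out here.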

What Prof suggested:
- Case-by-case analysis: Are high-income drivers able to make extra money from similar trips at the same time of day?
- Do high-income drivers earn more on every trip, or only on trips from specific locations?
  - hypothesis: high-income drivers are more willing to go to bad neighborhoods? Do they take advantage of people from bad neighborhoods?
- Airport: more chance to get ripped off?
- Southern neighborhoods might be more attractive?

dpzhang commented 7 years ago

Classify high-income and low-income drivers
- Based on medallion ID, we can plot a distribution of average yearly income
  - for each unique taxi ID, sum all trip fares year by year
  - divide the total by the number of years that taxi operated between 2013 and 2017
    - Why weighted annual income? Some drivers make more trips than others within the same time span, so looking at the weighted annual income could be less biased.
  - plot average daily income to get a sense of the distribution
  - classify and label high-income and low-income drivers from that distribution
- Useful statistic: weighted average fare a driver earned per year
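
The steps above can be sketched as follows. This is a minimal illustration, not the project's code: the `(taxi_id, year, fare)` tuple layout is assumed, and the median cutoff in `label_by_median` is a placeholder choice, since the thread only says the labels should come from the plotted distribution.

```python
from collections import defaultdict
from statistics import median

def average_annual_income(trips):
    """Total fares per taxi divided by the number of years it operated.

    trips: iterable of (taxi_id, year, fare); a hypothetical layout.
    """
    totals = defaultdict(float)
    years = defaultdict(set)
    for taxi_id, year, fare in trips:
        totals[taxi_id] += fare
        years[taxi_id].add(year)
    return {t: totals[t] / len(years[t]) for t in totals}

def label_by_median(income):
    """Label each taxi 'high' or 'low'. The median cutoff is an assumption."""
    cutoff = median(income.values())
    return {t: ("high" if v > cutoff else "low") for t, v in income.items()}

sample = [
    ("A", 2013, 100.0), ("A", 2014, 300.0),
    ("B", 2013, 50.0),
    ("C", 2015, 500.0), ("C", 2016, 700.0),
]
income = average_annual_income(sample)
labels = label_by_median(income)
```

Dividing by the number of active years keeps a taxi that only operated for part of 2013-2017 from being labeled low-income just because it has fewer years of fares.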

dpzhang commented 7 years ago

How to quantify driving pattern in general?
- Categorize dates into three levels: weekdays, weekends, and holidays
- Aggregate all taxi trips across the 801 Chicago census tracts
  - Taxi Trip Dataset: Pickup Census Tract and Drop-off Census Tract
  - Census Tract Boundary File: statefp10+countyfp10+tractce10
  - Some census tracts might not have any taxi pickups in the four years; those census tracts need to be removed.
- Conditioning all trips by time
  - each temporal unit to be 3 hrs (6-9, 9-12, 12-15, 15-18, 18-21, 21-0, 0-3, 3-6)
- Conditioning all trip flows by 9 regions:
  - North to East, South, West
  - South to North, East, West
  - East to North, South, West
  - West to North, South, East
- After conditioning each trip by spatial unit and temporal unit, we need to study the flow from CT1 to CT2 or from CT2 to CT1
  - Comparing driving distance from CT to CT
    - Challenge: census tracts vary in geographical size, so trips from CT1 to CT2 can differ widely in distance for that reason alone; we need a way to standardize
    - Solution: compute a statistic by dividing actual trip miles by distance on map
    - Interpretation: for every mile of map distance, how many miles does the driver actually drive, i.e. how many extra miles does the driver take, and how does that differ between high-income and low-income drivers?
  - Comparing driving velocity from CT to CT
    - Question: Using the statistic we computed for distance, we want to combine it with velocity.
    - Why?: For example, suppose high-income drivers tend to drive a longer distance from CT1 to CT2 than low-income drivers. Is it because high-income drivers take a longer route to avoid congested roads, and the higher velocity on that route compensates for the extra distance?
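
A minimal sketch of the distance statistic described above. It assumes "distance on map" means the great-circle (straight-line) distance between pick-up and drop-off coordinates, which the thread does not confirm; `haversine_miles` and `detour_ratio` are illustrative names.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def detour_ratio(trip_miles, pickup, dropoff):
    """Actual trip miles divided by straight-line miles.

    pickup, dropoff: (lat, lon) tuples. Returns None when both points
    coincide, because the ratio is undefined there.
    """
    straight = haversine_miles(*pickup, *dropoff)
    if straight == 0:
        return None
    return trip_miles / straight

pickup, dropoff = (41.0, -87.6), (42.0, -87.6)
straight = haversine_miles(*pickup, *dropoff)
ratio = detour_ratio(2 * straight, pickup, dropoff)
```

A ratio of 1 would mean the driver followed the straight line exactly; street grids push the ratio above 1 for every driver, so the comparison of interest is the difference in ratios between the two driver groups for the same tract pair and time slot.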

dpzhang commented 7 years ago

More detailed and specific study of driving pattern?
- Hypothesis: do low-income drivers assume they will earn more by staying downtown or in more populous areas, while high-income drivers are willing to go to outlying neighborhoods where trips are more likely to be longer?
- By looking at the pick-up and drop-off locations of high-income drivers, we want to get a sense of which census tracts these drivers typically visit. If so, do those neighborhoods have any characteristics in common?
- How do we quantify "good" or "bad" neighborhoods?
  - crime rate?
  - average income?
  - black/hispanic population?

dpzhang commented 7 years ago

8 variables to add to the raw dataset:

Region:
- In the census tract shapefile, each unique census tract is mapped to a community ID.
- Chicago's 77 community areas are also grouped into 9 regions:
  - classify community by regions
  - classify census tracts by community region
Absolute Distance from pickup coordinate to drop-off coordinate
Ratio of real path length over shortest path length (RRSL)
Absolute Velocity: Absolute Distance / Trip Duration
Relative Velocity: Relative Distance / Trip Duration
Ratio of real velocity over relative velocity (RRVV)
Time Period: 8 levels as classified above
Day: Indication of if weekday, weekend, or holiday
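
The Time Period and Day variables could be derived from a trip timestamp roughly as below. This is a sketch under assumptions: the `HOLIDAYS` set is a hypothetical placeholder (a real run would need a full holiday calendar), and the bucket labels simply mirror the 3-hour slots listed earlier in the thread.

```python
from datetime import date, datetime

# Hypothetical placeholder; not a complete holiday calendar.
HOLIDAYS = {date(2016, 1, 1), date(2016, 7, 4)}

def time_period(ts):
    """Return the 3-hour bucket for a timestamp, e.g. '6-9' or '21-0'."""
    start = (ts.hour // 3) * 3
    end = (start + 3) % 24
    return f"{start}-{end}"

def day_type(ts, holidays=HOLIDAYS):
    """Return 'holiday', 'weekend', or 'weekday'. Holidays take precedence."""
    if ts.date() in holidays:
        return "holiday"
    return "weekend" if ts.weekday() >= 5 else "weekday"

trip_start = datetime(2016, 7, 4, 22, 15)
period = time_period(trip_start)
kind = day_type(trip_start)
```

Checking the holiday set before the weekday test means a holiday that falls on a Saturday is still labeled a holiday, which seems closer to the three-level split the thread describes.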

ningyin-xu commented 7 years ago

Feedback: What if some particular trips are made only by high-income drivers and not by low-income drivers?

People who work for tips: Is the tip variable correlated with passengers from low-income neighborhoods?

dpzhang / Project_AHDA

Road-map #1