High OTP Transit Trip Walk Distance and Travel Times

alex-mucci commented 3 years ago

After creating the statistics for the OTP transit trip data, I found that the max values seemed high, with a walk distance of 3.5 miles and a travel time of 7 hours. I have dug into the data and found that only a few census tracts have these high values. The zones are shown below with a basemap. The basemap shows that some of the census tract centroids fall within an area that transit cannot serve, such as a body of water. Other centroids are simply in areas where there are no bus stops nearby.

alex-mucci commented 3 years ago

Initially, I suggest leaving the data as they are. I think the high walk distances and travel times reflect the zones being poorly served by transit. I could move the centroid of the one zone from the body of water to the street where the transit stops are, but I don't think it will change things much.

alex-mucci commented 3 years ago

After digging more into the data, certain months have much longer travel times than others. The travel times increase by a factor of 2 on average, so I do not think this is because of service changes. I think the CTA rail and METRA rail transit stops are being left out of the calculation somehow. I will be digging more into this later.

alex-mucci commented 3 years ago

There is something going wrong with the aggregation of the OTP transit data. There are two records for specific OD pairs for TODs in certain months. As mentioned above, there is also an issue with the average travel times between OD pairs being unreasonably different from month to month.
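
A minimal sketch of how the duplicate records could be flagged, assuming each aggregated record is keyed by origin tract, destination tract, TOD, and month. The field names and sample values are hypothetical, not taken from the actual OTP output:

```python
from collections import Counter

# Hypothetical aggregated records; field names and values are illustrative only.
records = [
    {"origin": "A", "destination": "B", "tod": "AM", "month": "2019-01", "travel_time": 32.0},
    {"origin": "A", "destination": "B", "tod": "AM", "month": "2019-01", "travel_time": 61.5},
    {"origin": "A", "destination": "C", "tod": "AM", "month": "2019-01", "travel_time": 18.0},
]


def find_duplicate_keys(rows, key_fields=("origin", "destination", "tod", "month")):
    """Return every key that appears more than once, mapped to its record count."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


duplicates = find_duplicate_keys(records)
for key, n in duplicates.items():
    print(key, "appears", n, "times")
```

Any key returned here should have exactly one record after aggregation, so a non-empty result points at the aggregation step rather than at the OTP routing itself.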

alex-mucci commented 3 years ago

The ridehailing travel times tend to be longer than the OTP auto travel times (free flow travel time). OTP transit travel times tend to be the longest. I feel comfortable using the OTP auto travel times to represent the "cost" of traveling between two zones, because in the cases where RH travel times are much higher or lower than the OTP auto travel time, the OTP auto travel times are closer to the Google Maps estimates. I also need a "cost" variable that is open-source and available for other cities.

alex-mucci commented 3 years ago

The RMSE of the difference between ridehailing travel times and OTP auto travel times is 7.24 minutes and the percent RMSE is 25%.
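
For reference, a small sketch of how these two statistics can be computed. The thread does not say how the percent RMSE is normalized, so this assumes it is the RMSE divided by the mean observed (ridehailing) travel time. The sample values are made up and do not reproduce the 7.24 minute result:

```python
import math


def rmse(observed, estimated):
    """Root mean squared error between paired observed and estimated values."""
    if not observed or len(observed) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_rmse(observed, estimated):
    """RMSE as a percentage of the mean observed value (assumed normalization)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, estimated) / mean_observed


# Made-up travel times in minutes for three OD pairs.
rh_minutes = [10.0, 20.0, 30.0]        # observed ridehailing travel times
otp_auto_minutes = [8.0, 22.0, 27.0]   # OTP auto (free flow) travel times

print(rmse(rh_minutes, otp_auto_minutes))
print(percent_rmse(rh_minutes, otp_auto_minutes))
```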

alex-mucci commented 3 years ago

The following "cases" will be checked to understand the reasonableness of the data:

Trips with estimated zero minute travel times (OTP calculation)
Trips with observed zero minute travel times (Chicago RH data)
Trips with a longer free flow travel time than RH travel time
Trips with significantly longer RH travel time than free flow travel time

alex-mucci commented 3 years ago

Trips with estimated zero minute travel times (OTP Calculation):

The trips that have a zero OTP automobile travel time, or free flow travel time, are the trips that stay within a single census tract. I use one point (the centroid) to represent the entire census tract, so when a ridehail trip starts and ends in the same census tract, the OTP calculation travels from and to the same point, giving a travel time of zero. The snip below shows that the records with zero travel times have the same origin and destination census tract.

gregerhardt commented 3 years ago

The standard method for intra-zonals is to use half the travel time to the nearest neighbor.
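
A rough sketch of that rule, assuming the inter-zonal times are held in a dictionary keyed by (origin, destination) and that the "nearest neighbor" is the zone with the smallest travel time from the origin. The function name and sample zones are hypothetical:

```python
def add_intrazonal_times(travel_times):
    """Return a copy of travel_times with (zone, zone) entries set to half
    the travel time from that zone to its nearest neighbor.

    travel_times maps (origin, destination) -> minutes. Existing
    intra-zonal entries (origin == destination) are ignored and overwritten.
    """
    nearest = {}
    for (origin, destination), minutes in travel_times.items():
        if origin == destination:
            continue  # skip existing zero-time intra-zonal records
        if origin not in nearest or minutes < nearest[origin]:
            nearest[origin] = minutes

    result = dict(travel_times)
    for zone, minutes in nearest.items():
        result[(zone, zone)] = 0.5 * minutes
    return result


# Made-up OTP auto travel times in minutes between three tracts.
otp_times = {
    ("A", "A"): 0.0,
    ("A", "B"): 10.0, ("A", "C"): 6.0,
    ("B", "A"): 12.0, ("B", "C"): 8.0,
    ("C", "A"): 7.0, ("C", "B"): 9.0,
}
with_intrazonals = add_intrazonal_times(otp_times)
```

Nearest neighbor could instead be defined by centroid distance; using travel time keeps the sketch independent of the tract geometry.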

alex-mucci commented 3 years ago

Trips with observed zero minute travel times (Chicago Ridehailing Dataset):

There are 1,222 trips that have a travel time of zero in the Chicago ridehailing dataset. I cannot confidently say what is going on here. My best guesses are that either the trip was canceled or something went wrong when the data were masked.

Some of the trips even have a trip length but 0 travel time. Others have a trip length but no travel time. There is also an issue with the fare being zero when the trip length and travel time are not. My plan is to drop all of the trips that have either a travel time of 0, a trip length of 0, or a fare of 0.

I'm not as comfortable dropping out the trips with a fare of 0. There is a possibility of the fares being zero because of the 10th ride being free (or whatever the promotion is), but there is not a way to confirm this. The promotions will also make it harder to predict fares for other cities. What are your thoughts on dropping them out and making sure I note the drop and explain in my paper that the reason behind the drop is to filter out promotional trips?

The drop will have the following effect:

Dropping trips with a travel time of zero = 1,222 trips dropped
Dropping trips with a trip length of zero = 17,701 trips dropped
Dropping trips with a fare of zero = 1.1 million trips dropped

There are a total of 147 million trips in the dataset, and the counts above may overlap because some trips have both a travel time of 0 and a trip length of 0.

gregerhardt commented 3 years ago

I agree with your assessment. Drop the 0 travel time and zero distance trips. Keep the 0 fare trips.
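
A minimal sketch of that filter, which also counts the overlap between the two drop conditions. The field names (`trip_seconds`, `trip_miles`, `fare`) are placeholders rather than the dataset's actual column names, and treating a missing value the same as zero is an assumption:

```python
def is_zero_or_missing(value):
    """Treat None the same as 0 (assumption: missing values are unusable)."""
    return value is None or value == 0


def drop_invalid_trips(trips):
    """Drop trips with zero travel time or zero trip length; keep zero-fare trips.

    Returns the kept trips and counts for each drop condition, including
    how many trips meet both conditions.
    """
    kept = []
    counts = {"zero_time": 0, "zero_length": 0, "both": 0}
    for trip in trips:
        zero_time = is_zero_or_missing(trip["trip_seconds"])
        zero_length = is_zero_or_missing(trip["trip_miles"])
        if zero_time:
            counts["zero_time"] += 1
        if zero_length:
            counts["zero_length"] += 1
        if zero_time and zero_length:
            counts["both"] += 1
        if not (zero_time or zero_length):
            kept.append(trip)  # fare is deliberately not checked
    return kept, counts


# Made-up trips for illustration.
sample_trips = [
    {"trip_seconds": 600, "trip_miles": 2.5, "fare": 12.0},
    {"trip_seconds": 0, "trip_miles": 1.2, "fare": 8.0},
    {"trip_seconds": 0, "trip_miles": 0.0, "fare": 0.0},
    {"trip_seconds": 480, "trip_miles": 0.0, "fare": 7.5},
    {"trip_seconds": 900, "trip_miles": 4.1, "fare": 0.0},
]
kept_trips, drop_counts = drop_invalid_trips(sample_trips)
total_dropped = drop_counts["zero_time"] + drop_counts["zero_length"] - drop_counts["both"]
```

Subtracting the "both" count avoids double counting trips that have a zero travel time and a zero trip length when reporting the total number dropped.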

alex-mucci / TNC-Demand-Model

High OTP Transit Trip Walk Distance and Travel Times #13