Open alex-mucci opened 4 years ago
I am randomly assigning the suppressed trips to the census tracts within the community area. The random assignment makes sure the census tracts with low RH demand are included and assigned some of the trips.
After talking with a researcher from Northwestern who used the Chicago data, I am not sure the random assignment strategy is the best. I did not like assigning the suppressed trips based on non-suppressed trips because the suppressed trips represent low-use, not high-use. I was originally thinking of the use being low in a specific area, but there is also a temporal component.
The overnight trips are likely suppressed because use is lower during that TOD, and it is reasonable to think that they would follow a spatial pattern similar to that of the non-suppressed trips. Nearly a third of the suppressed trips occur during the overnight period.
Does it make sense to assign the overnight TOD suppressed trips based on the non-suppressed overnight trip totals?
Then the non-overnight suppressed trips will still be randomly assigned because I feel there is a higher likelihood of the trips being in low demand areas during those TODs.
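A minimal sketch of the hybrid idea above: overnight suppressed trips are drawn in proportion to the unsuppressed overnight totals, and all other suppressed trips are drawn uniformly. The tract IDs, trip counts, and the `assign_trips` helper are hypothetical and only illustrate the logic; this is not the code used for the estimation file.

```python
import random
from collections import Counter

def assign_trips(n_trips, tracts, weights=None, rng=None):
    """Distribute n_trips across tracts.

    With weights (e.g. unsuppressed overnight totals), tracts are drawn in
    proportion to them. Without weights, or if all weights are zero, every
    tract is equally likely.
    """
    rng = rng or random.Random()
    if weights is not None and sum(weights) > 0:
        picks = rng.choices(tracts, weights=weights, k=n_trips)
    else:
        picks = rng.choices(tracts, k=n_trips)
    counts = Counter(picks)
    return {t: counts[t] for t in tracts}

# Hypothetical census tracts inside one community area.
tracts = ["tract_A", "tract_B", "tract_C"]
# Hypothetical unsuppressed overnight trip totals for the same tracts.
overnight_unsuppressed = [90, 10, 0]

rng = random.Random(0)  # fixed seed so the assignment is reproducible
overnight = assign_trips(20, tracts, weights=overnight_unsuppressed, rng=rng)
daytime = assign_trips(20, tracts, rng=rng)
```

One consequence of the proportional draw is that a tract with zero unsuppressed overnight trips (`tract_C` here) can never receive a suppressed overnight trip, so this approach trades the "every tract gets a chance" property of the random assignment for a spatial pattern closer to the observed one.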
@gregerhardt
I will also submit a FOIA request to get the unsuppressed trips. The researcher from Northwestern said he was unsuccessful, but it is worth a try.
Below are maps of the average weekday pickups/dropoffs that are suppressed. This is the total number of suppressed pickups/dropoffs to/from each census tract throughout the study period (Nov 2018 - Mar 2020). I can make more disaggregate maps, but I think these already show that the suppressed trips are concentrated within certain areas.
I don't follow. I thought the suppressed trips were only available for community areas? Don't we need other trips to compare against? Why is the loop so prominent?
Below is a monthly breakdown of suppressed trips. Nothing significant here because the share of trips being suppressed stays around 25% for all months. These trip totals do not match the graphs I showed in my defense ppt because the aggregation is different. The graphs rolled up average weekday trips between census tracts. This table rolls up average weekday trips between community areas because the suppressed trips are not assigned census tracts.
Below is a map of the average weekday suppressed pickups for November 2018 at a community area resolution. There is more randomness to the map but there still seems to be a concentration of suppressed trips near the downtown area. I will email you a link to the folder that contains the maps for the other months. Things do change slightly between months. I will also create maps visualizing the percent of trips that are suppressed. The totals are a starting point to see if they follow the same spatial pattern as the unsuppressed pickups/dropoffs.
Below is the hourly breakdown of the suppressed trips. It looks like the higher shares of suppressed trips are concentrated during the overnight TOD.
Below are the maps visualizing the share of all trips that are suppressed.
Should the External Trips Be Included?
Currently the model estimation file only includes internal unsuppressed and suppressed trips. I leave off all of the internal-external and external-external trips. The internal-external trips make up 14% of all trips and the external-external trips are less than 1%. I feel confident that dropping the external-external trips is okay, but dropping the 14% of internal-external trips strikes me as naïve.
The unsuppressed internal-external trips have known pickup and drop-off census tract locations. All of the data variables except OTP transit travel time can be assigned currently. OTP transit travel times were only imputed for Chicago census tracts, and adding the census tracts from the dataset that are outside of Chicago would take a good amount of time. The suppressed internal-external trips do not have any spatial geography for the external trip end, so data can't be assigned to that trip end. The current structure of the model estimation file forces the suppressed internal-external trips to drop out. Roughly half of the internal-external trips are unsuppressed, so excluding them would cause about 7% of all trips (half of the 14%) to drop out.
A trip record is suppressed from a census tract to a community area when there are only 2 trips within a 15-minute window for a given O-D census tract pair. The table below shows a breakdown of the ride-hailing data: 20% of the data is suppressed and will need to be assigned. Another 12% of trip records have one trip end outside of Chicago and will drop out because the model will only include intra-Chicago ride-hailing trips.
The suppressed trips will need to be assigned to census tracts. I first thought to assign them based on the ride-hailing data that is not suppressed, but that would assign the majority of trips to census tracts with high ridership. The suppressed records are more likely to be to/from census tracts with low ridership because they are suppressed when ridership is low enough to cause privacy issues. I then thought to use something like pop, emp, or area to assign the trips, but those will be included as variables in the model and would bias it. There is also the issue that O-D pairs with missing data would have no chance of being assigned the trips. I have chosen to randomly assign the trips to census tracts within the community area O-D pair. The random number generator will keep the model unbiased and will give the low-ridership O-D pairs an equal chance of being assigned the trip totals.
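The random assignment described above can be sketched roughly as follows. This is only an illustration, assuming each suppressed record carries an origin and destination community area and that the list of census tracts inside each community area is known; the tract IDs and the `assign_od_uniform` helper are hypothetical.

```python
import random
from collections import Counter

def assign_od_uniform(n_trips, origin_tracts, dest_tracts, rng):
    """Assign suppressed trips for one community-area O-D pair.

    Each trip gets an origin tract and a destination tract drawn uniformly
    at random, so every tract-level O-D pair has an equal chance.
    Returns a Counter keyed by (origin_tract, dest_tract).
    """
    counts = Counter()
    for _ in range(n_trips):
        o = rng.choice(origin_tracts)
        d = rng.choice(dest_tracts)
        counts[(o, d)] += 1
    return counts

# Hypothetical tracts inside the origin and destination community areas.
origin_tracts = ["o_tract_1", "o_tract_2"]
dest_tracts = ["d_tract_1", "d_tract_2", "d_tract_3"]

rng = random.Random(2018)  # fixed seed so the assignment can be reproduced
od_counts = assign_od_uniform(12, origin_tracts, dest_tracts, rng)
```

Fixing the seed makes the assignment repeatable, which helps when the estimation file has to be rebuilt and compared against an earlier run.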
Summary of Ridehailing Data for O-D Location Issues.xlsx