Open alex-mucci opened 3 years ago
Option 1 is not a good option because the figures below show drastic differences between shared and private trips in how ridership and fares change. These models would not be able to pick up on the changes because a FARE variable cannot be included in the model, since fare data is unavailable nationwide. Pooling shared and private trips changes the trend in the dependent variable without including the independent variable that is most likely causing the change.
Shared Fare:
Private Fare:
Combined Fare:
Shared Trips:
Private Trips:
Combined Trips:
Also, consider multiple regression, i.e., regression with 2 dependent variables.
I found something called multivariate regression, but it sounds equivalent to estimating separate models. Would it be able to estimate two coefficients (one per dependent variable) for specific independent variables, while applying a single coefficient to both dependent variables for the other independent variables?
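One way to get that mix of shared and separate coefficients, sketched here as an assumption rather than something already tried in this project, is to stack the two dependent variables into one long regression and interact a mode dummy with only the variables that should differ. Everything below is synthetic and hypothetical (`x_shared`, `x_specific`, and the true coefficient values are made up for illustration), and it uses plain least squares via NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # synthetic observations per mode (hypothetical data)

# Hypothetical regressors: x_shared gets one pooled coefficient,
# x_specific gets a separate coefficient for each mode.
x_shared = rng.normal(size=2 * n)
x_specific = rng.normal(size=2 * n)
is_shared = np.repeat([1.0, 0.0], n)  # first n rows are shared trips, last n are private trips
is_private = 1.0 - is_shared

# True values below exist only to generate the synthetic dependent variable.
y = (
    1.0 * is_shared + 2.0 * is_private   # mode-specific intercepts
    + 0.5 * x_shared                     # one coefficient applied to both modes
    - 1.0 * x_specific * is_shared       # coefficient for shared rows only
    + 3.0 * x_specific * is_private      # coefficient for private rows only
    + rng.normal(scale=0.1, size=2 * n)
)

# Stacked design matrix: the interaction columns are zero on the other mode's rows.
X = np.column_stack([
    is_shared,
    is_private,
    x_shared,
    x_specific * is_shared,
    x_specific * is_private,
])
names = ["const_shared", "const_private", "x_shared",
         "x_specific_shared", "x_specific_private"]

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
estimates = dict(zip(names, coef))

for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

Truly separate models would be the special case where every variable is interacted with the mode dummy. This sketch only recovers point estimates; standard errors would need more care if the two modes have different error variances.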
By 2020, shared trips make up only 12.5% of all ridehailing trips in Chicago. Whether that 12.5% share is significant enough to keep determines whether option 3 is better than option 2.
Change the structure of the estimation file to be mode-specific. Each row is the total number of shared or private trips occurring between a census tract zone pair during a month. The entity of the panel structure is origin-destination-mode and the time period is month-year. Mode-specific variables are then included in the model. For example, shared travel time is the average travel time of shared trips between a given census tract zone pair during a given month for the shared entities, and it is zero for the private entities.
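A minimal sketch of that file layout in pandas, assuming made-up column names and values (`origin`, `destination`, `mode`, `month`, `trips`, `avg_travel_time` are all hypothetical, not the real estimation file). The `private_travel_time` column is an assumed analog of the shared travel time variable described above.

```python
import pandas as pd

# Hypothetical input: one row per origin-destination-mode-month.
trips = pd.DataFrame({
    "origin": ["A", "A", "A", "A"],
    "destination": ["B", "B", "B", "B"],
    "mode": ["shared", "private", "shared", "private"],
    "month": ["2019-01", "2019-01", "2019-02", "2019-02"],
    "trips": [10, 90, 8, 95],
    "avg_travel_time": [22.0, 15.0, 24.0, 14.5],
})

# Mode-specific variables: the mode's own average travel time on its rows,
# zero on the other mode's rows.
for mode in ["shared", "private"]:
    trips[f"{mode}_travel_time"] = trips["avg_travel_time"].where(
        trips["mode"] == mode, 0.0
    )

# Panel entity is origin-destination-mode; time period is month-year.
trips["entity"] = trips["origin"] + "_" + trips["destination"] + "_" + trips["mode"]
panel = trips.set_index(["entity", "month"]).sort_index()

print(panel[["trips", "shared_travel_time", "private_travel_time"]])
```

With this layout, each mode-specific column can carry its own coefficient in the model while any column left un-split is shared across modes.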
I have the following options: