Open alex-mucci opened 3 years ago
I think I should estimate a model with the most disaggregate data first and see what the results look like. The model might do a good job even with the high number of zeros. - Alex to do
After reading the email from Dr. Erhardt below, I should start with a Poisson model because ride-hailing use is count data. I have emailed him to get a license for Stata and will test the data for overdispersion. The model structure will likely need to be tweaked because the data are skewed toward zero, but I will test for that skewness and cross that bridge when I get there.
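As a rough first look before fitting anything in Stata, the overdispersion and excess-zero concerns can be screened with a few summary statistics. The sketch below is only an unconditional check (it ignores covariates, so it is not a formal test), and the function name `dispersion_summary` and the example counts are hypothetical, not taken from the ride-hail data. A Poisson variable has variance equal to its mean and a zero share of exp(-mean), so a variance/mean ratio well above 1 or many more zeros than exp(-mean) suggests the plain Poisson may need tweaking.

```python
import math


def dispersion_summary(counts):
    """Summarize a list of non-negative counts for a quick Poisson sanity check."""
    n = len(counts)
    mean = sum(counts) / n
    # Sample variance (n - 1 in the denominator).
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    observed_zero_share = sum(1 for c in counts if c == 0) / n
    # Share of zeros a Poisson distribution with this mean would produce.
    poisson_zero_share = math.exp(-mean)
    return {
        "mean": mean,
        "variance": variance,
        "dispersion_ratio": variance / mean,
        "observed_zero_share": observed_zero_share,
        "poisson_zero_share": poisson_zero_share,
    }


# Hypothetical trip counts for ten o-d pairs (illustrative only, not real data).
example_counts = [0, 0, 0, 0, 0, 0, 1, 2, 0, 15]
summary = dispersion_summary(example_counts)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

If the ratio is far above 1, a negative binomial model is the usual next step; if the zero share is far above the Poisson share, a zero-inflated or hurdle structure may be worth testing.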
I think this applies to several of you. Vedant in particular, I recommend this for your SF crash models. My general guidance is this.
More details are below. Please bookmark these, and if you’re using this in your thesis/dissertation, be able to explain and defend the choice. (For me, “The guy who wrote the econometrics textbook says it’s fine” is good enough, but you may need to say something more meaningful when you write.)
There is an issue with the o-d pairs that do not have ride-hail data. I can set the trip total to zero for those pairs, but it does not make sense to set variables like travel time to zero. I can use travel times for the same o-d pair from other months, but some o-d pairs have no ride-hail data in any month.
Should I drop the o-d pairs that have no ride-hail data in any month, and fill in the average travel time across all months when it is missing for a single month? Or should I use the OTP free-flow travel time instead of the observed travel time?
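To make the first option concrete, here is a minimal sketch of the drop-and-fill approach. It assumes a hypothetical record layout (dicts with `od`, `month`, `trips`, and `travel_time`, where `travel_time` is `None` when missing); the real data set will look different. Pairs with no observed travel time in any month are dropped, and a missing month is filled with that pair's mean across its observed months.

```python
from collections import defaultdict


def fill_travel_times(records):
    """Drop o-d pairs never observed; fill missing months with the pair's mean."""
    # Collect the observed travel times for each o-d pair across all months.
    observed = defaultdict(list)
    for r in records:
        if r["travel_time"] is not None:
            observed[r["od"]].append(r["travel_time"])

    filled = []
    for r in records:
        if r["od"] not in observed:
            continue  # no travel time in any month: drop the pair
        travel_time = r["travel_time"]
        if travel_time is None:
            times = observed[r["od"]]
            travel_time = sum(times) / len(times)
        filled.append({**r, "travel_time": travel_time})
    return filled


# Hypothetical records (illustrative only, not real data).
records = [
    {"od": ("A", "B"), "month": 1, "trips": 3, "travel_time": 10.0},
    {"od": ("A", "B"), "month": 2, "trips": 0, "travel_time": None},
    {"od": ("A", "B"), "month": 3, "trips": 1, "travel_time": 14.0},
    {"od": ("C", "D"), "month": 1, "trips": 0, "travel_time": None},
]
filled = fill_travel_times(records)
for row in filled:
    print(row)
```

One caveat with this option: dropping the never-observed pairs removes exactly the records that would be zeros in the dependent variable, which may bias the model toward pairs where ride-hailing already occurs.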
Here are some facts about the data:
There are two options for the tax model:
Drop the o-d pairs that do not have RH data.
Build cost and travel time models for the records missing data.
The level of aggregation makes a big difference in the number of zeros in the dependent variable. I’m afraid that the high number of zeros could cause the model to find some weird relationships.