Open egreich opened 1 year ago
Honestly, maybe we should also just drop unitDuration as a predictor variable: our data is so skewed towards instantaneous that I'm skeptical of it. Dropping it would make our three most important predictor variables all numerical.
Multinomial categorical response variables are hard to interpret. Boosted regression trees are supposed to be more robust for categorical predictor variables, but are harder to interpret. GLMs are easier to interpret, but harder to apply to the type of data we are using in question 1.
Question 1: The family distribution needs to be multinomial, which the glm function does not support. The multinom function from the nnet package (https://stats.oarc.ucla.edu/r/dae/multinomial-logistic-regression/) seems to work OK.
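For reference when interpreting Question 1, here is a sketch of the model that multinom fits, assuming a response with K categories where category 1 is the reference level (multinom uses the first factor level). The symbols x, K, and the β terms are notation introduced here for illustration, not names from our data:

$$
\log \frac{P(Y = k \mid x)}{P(Y = 1 \mid x)} = \beta_{k0} + \beta_k^{\top} x, \qquad k = 2, \dots, K
$$

$$
P(Y = k \mid x) = \frac{\exp(\beta_{k0} + \beta_k^{\top} x)}{1 + \sum_{j=2}^{K} \exp(\beta_{j0} + \beta_j^{\top} x)}, \qquad
P(Y = 1 \mid x) = \frac{1}{1 + \sum_{j=2}^{K} \exp(\beta_{j0} + \beta_j^{\top} x)}
$$

Each non-reference category gets its own coefficient vector, so a single predictor has K - 1 coefficients. Exponentiating one of them gives the multiplicative change in the odds of category k versus the reference for a one-unit increase in that predictor, holding the other predictors fixed. That is probably a large part of why the output is hard to read.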
Questions 2 and 3: The response variables are numeric, so either a boosted regression tree or a glm with a gaussian distribution is fine.
We need to decide which analysis to use, and then make figures as described in the figures folder README. Note that regardless of the method used, the results are the same, so we should choose whichever is most convenient for interpretability.