Ogle-lab / Pulse-meta-analysis

Pulse precipitation meta-analysis project

Boosted regression vs. glm for post-analysis #62

Open egreich opened 1 year ago

egreich commented 1 year ago

Multinomial categorical response variables are hard to interpret. Boosted regression trees are supposed to be more robust for categorical predictor variables, but are harder to interpret. GLMs are easier to interpret, but harder to apply to the type of data we are using in question 1.

Question 1: The family distribution needs to be multinomial, which the glm function does not support. The multinom function (https://stats.oarc.ucla.edu/r/dae/multinomial-logistic-regression/) seems to work OK.
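For reference, a minimal sketch of what a multinom fit could look like, assuming the nnet package (where multinom lives). The data frame and the column names pulseAmount, preCondition, and responseType are invented for illustration and are not the project's actual variables.

```r
library(nnet)  # recommended package shipped with standard R installs

set.seed(1)

# Hypothetical toy data: names and values are invented for illustration only.
n <- 150
toy <- data.frame(
  pulseAmount = runif(n, 1, 50),
  preCondition = rnorm(n)
)

# Three-level categorical response, loosely tied to pulseAmount
toy$responseType <- cut(
  toy$pulseAmount + rnorm(n, sd = 10),
  breaks = c(-Inf, 15, 35, Inf),
  labels = c("none", "linear", "peaked")
)

# Multinomial logistic regression; the first factor level ("none") is the baseline
fit <- multinom(responseType ~ pulseAmount + preCondition,
                data = toy, trace = FALSE)

# One row of coefficients per non-baseline level (log-odds relative to "none")
coefs <- coef(fit)

# Predicted class probabilities, one column per response level
probs <- predict(fit, newdata = toy, type = "probs")
```

The coefficients are log-odds against the baseline level, which is part of why they are hard to read. Plotting the predicted probabilities in probs against a predictor is often easier to interpret.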

Questions 2 and 3: Numeric response variables, so either a boosted regression tree or a glm with a Gaussian distribution is fine.
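A rough sketch of the Gaussian glm option, again on invented data (pulseAmount, preCondition, and peakResponse are placeholder names, not the project's columns). With the default identity link, a Gaussian glm gives the same coefficient estimates as lm, so the comparison is included as a sanity check.

```r
set.seed(2)

# Hypothetical toy data; column names are placeholders, not the project's variables.
n <- 100
toy <- data.frame(
  pulseAmount = runif(n, 1, 50),
  preCondition = rnorm(n)
)
toy$peakResponse <- 2 + 0.1 * toy$pulseAmount + rnorm(n, sd = 0.5)

# Gaussian GLM with the default identity link
fit_glm <- glm(peakResponse ~ pulseAmount + preCondition,
               family = gaussian(), data = toy)

# The same model fitted by ordinary least squares
fit_lm <- lm(peakResponse ~ pulseAmount + preCondition, data = toy)

# Coefficient table: estimates, standard errors, t values, p values
coef_table <- summary(fit_glm)$coefficients
```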

We need to decide which analysis to use, and then make figures as described in the figures folder README. Note that the results are the same regardless of the method used, so we should use whatever is most convenient for interpretability.

egreich commented 1 year ago

Honestly, maybe we should also just get rid of unitDuration as a predictor variable. Our data is so skewed towards instantaneous that I'm skeptical of it. This would make our three most important predictor variables numerical.