Closed wcornwell closed 6 years ago
Ah I see it's a naming thing. It's supposed to create reports/non_phylo_report.pdf and it makes reports/non_phylo_model.pdf
Gotcha. I can fix it.
BTW we have some co-linearity issues with variables like brain size and body size.
It would make our statistical lives a lot easier to pick a smaller set to start with. Much easier statistically, and also easier to write about in the end.
See #27, but I'm sold on dropping brain size. Just wrote it into the methods already!!
Also, I added weights to the linear model. Tell me if you think this is crazy. I'm not entirely sure how the math works behind the scenes, but essentially I think we are just weighting the species with the most records and most unique localities more heavily, since those median measures should be the most accurate. Seems intuitive, but I'm not sure whether I specified it correctly.
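A minimal sketch of what that weighting looks like, on simulated data (the data frame and column names here are hypothetical stand-ins, not the project's actual objects). `lm()` treats `weights` as inverse-variance weights, so species whose medians rest on more records pull the fit harder:

```r
# Hypothetical per-species data: response, one trait, and a record count.
set.seed(1)
birds <- data.frame(
  urbanness = rnorm(50),
  body_size = rnorm(50),
  n_records = sample(5:500, 50)  # stand-in for records/unique localities
)

# weights = n_records down-weights species whose medians come from few records
fit <- lm(urbanness ~ body_size, data = birds, weights = n_records)
summary(fit)
```

Whether raw record counts are the right weights (vs., say, sqrt of counts) is a judgment call worth checking against the meta-analysis link above.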
i like the idea of weighting (more about weights here: http://environmentalcomputing.net/meta-analysis-2/ )
Am still worried about co-linearity in the predictors (esp. the categorical one). Might need to plan the analysis steps a bit more (rather than putting them all into one big model--the model will definitely be unstable).
Also, already have a ton of results so have to organize the analysis so that the interpretation is not too crazy complicated--for both us and the readers.
what about starting simple with phylo/non-phylo and just 6 predictors:
Then we use the other yes/no variables in a subsequent step to help interpret the results of the first analysis?
I'm not actually worried about multicollinearity... unless I'm doing something wrong, it looks like the VIFs are all <2, which seems pretty good! This is using car::vif and looking at the generalized VIF column.
From the help page: "If any terms in an unweighted linear model have more than 1 df, then generalized variance-inflation factors (Fox and Monette, 1992) are calculated. These are interpretable as the inflation in size of the confidence ellipse or ellipsoid for the coefficients of the term in comparison with what would be obtained for orthogonal data.
The generalized vifs are invariant with respect to the coding of the terms in the model (as long as the subspace of the columns of the model matrix pertaining to each term is invariant). To adjust for the dimension of the confidence ellipsoid, the function also prints GVIF^[1/(2*df)] where df is the degrees of freedom associated with the term.
Through a further generalization, the implementation here is applicable as well to other sorts of models, in particular weighted linear models, generalized linear models, and mixed-effects models."
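For intuition on what those numbers mean, here's a hand-rolled VIF check in base R on simulated predictors (hypothetical data, not ours): for terms with 1 df this matches what car::vif reports, i.e. regress each predictor on the others and compute 1/(1-R²). Values under ~2 indicate little collinearity:

```r
# Simulated predictors: x2 is mildly correlated with x1, x3 is independent.
set.seed(2)
d <- data.frame(x1 = rnorm(100))
d$x2 <- 0.3 * d$x1 + rnorm(100)
d$x3 <- rnorm(100)

# VIF for one predictor = 1 / (1 - R^2) from regressing it on the rest
vif_one <- function(var, data) {
  r2 <- summary(lm(reformulate(setdiff(names(data), var), var), data))$r.squared
  1 / (1 - r2)
}
vifs <- sapply(names(d), vif_one, data = d)
round(vifs, 2)
```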
That being said, I do agree that we have to think how to best present the results in an intuitive manner. I'm happy with putting stuff in supp files (seems to be all the rage these days...).
I am still leaning towards some sort of model selection procedure, and thereby saying which predictors were the most important. I have been playing with leaps::regsubsets and it looks like the results match up really closely with MuMIn::step.
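For comparison, a minimal stepwise-selection sketch on simulated data using base R's `step()` (hypothetical predictors; the real model would use the trait set). Backward elimination by AIC should keep only the predictor with real signal:

```r
# Simulated data where only x1 truly predicts y.
set.seed(3)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1.5 * d$x1 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = d)
best <- step(full, trace = 0)  # AIC-based backward elimination
formula(best)
```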
Also, theoretically, I'm semi-opposed to picking a smaller subset. I think our 'edge' on this paper (i.e., improvement over past studies) is multi-faceted.
1.) We have heaps more bird observation data
2.) Because of this we are using a continuous index
3.) Lastly, because of all this accessible data, we are able to look at all traits that have been looked at previously in the literature (or at least some variant of them). Thanks to people pre-compiling all the trait data.
While I agree some probably are more important than others, based on previous studies, I would find it difficult to choose which traits may be more important and thus automatically included in our initial set of predictors. What if the one study that looked at x was actually the best study but just lacked power? Essentially, I think one advantage we have is to say we treated each trait as having the same probability of being the best predictor of urbanness. But, if we can't do it statistically, then I'm happy to change my tune.
One thing we could do (I think, based on my reading of various model selection techniques) would be to force a number of variables into the models, so that brain size or body size are always included, etc.?
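Yes, that's doable. One base-R way to force predictors into every candidate model is to give `step()` a lower `scope` that always contains them (leaps::regsubsets has a `force.in` argument for the same idea). Variable names here are hypothetical:

```r
# Simulated data with no real signal at all.
set.seed(4)
n <- 150
d <- data.frame(body_size = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rnorm(n)

full <- lm(y ~ body_size + x2 + x3, data = d)

# lower = ~ body_size means selection can never drop body_size
kept <- step(full, scope = list(lower = ~ body_size), trace = 0)
formula(kept)
```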
ok just let me know where you get to with the non-phylo analysis, and we'll figure out how to do the phylo stuff...kinda busy week for me.
seems like this is done?
I got this weird error when running remake::make(), but it seems to work alright and the pdf is created.