The data prep currently calculates 19 bioclim variables from monthly temperature and precipitation, then runs PCA to collapse those 19 variables into however many PCs explain 85% of the variance in the data.
1) I'm not sure PCA is appropriate
2) I'm not sure PCA is necessary (i.e. does it actually improve the random forest model?)
3) Rather than arbitrarily choosing a threshold of 85%, I could tune() this value or tune() the number of components.
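
A rough sketch of how points 2 and 3 could be checked together with tidymodels. It compares a random forest on PCA scores (with the number of components tuned) against the same model on the raw variables, under the same resamples. Everything about the data here is a stand-in: the simulated `bio1`..`bio19` columns, the response `y`, the regression mode, the grid values and the `ranger` engine are assumptions for illustration, not the project's actual setup.

```r
library(tidymodels)  # assumes tidymodels and ranger are installed

set.seed(1)

# Hypothetical stand-in for the real data: 19 correlated predictors
# driven by 4 latent factors, plus a numeric response `y`.
n <- 200
latent <- matrix(rnorm(n * 4), n, 4)
bio <- latent %*% matrix(rnorm(4 * 19), 4, 19) +
  matrix(rnorm(n * 19, sd = 0.3), n, 19)
colnames(bio) <- paste0("bio", 1:19)
dat <- as_tibble(bio) %>%
  mutate(y = latent[, 1] + rnorm(n, sd = 0.5))

folds <- vfold_cv(dat, v = 5)

rf_spec <- rand_forest(trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("regression")

# Recipe with PCA, leaving the number of components to be tuned.
rec_pca <- recipe(y ~ ., data = dat) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = tune())

# Recipe without PCA, as the baseline for "is PCA necessary?".
rec_raw <- recipe(y ~ ., data = dat)

wf_pca <- workflow() %>% add_recipe(rec_pca) %>% add_model(rf_spec)
wf_raw <- workflow() %>% add_recipe(rec_raw) %>% add_model(rf_spec)

# Candidate numbers of components (arbitrary illustrative values).
pca_grid <- tibble(num_comp = c(2L, 4L, 8L, 12L))

res_pca <- tune_grid(wf_pca, resamples = folds, grid = pca_grid,
                     metrics = metric_set(rmse))
res_raw <- fit_resamples(wf_raw, resamples = folds,
                         metrics = metric_set(rmse))

collect_metrics(res_pca)  # one row per candidate num_comp
collect_metrics(res_raw)  # single row for the no-PCA baseline
```

If none of the PCA rows beats the baseline by more than the resampling noise, that would suggest the PCA step isn't earning its place. Tuning the variance threshold instead should, as far as I know, work the same way with `threshold = tune()` in place of `num_comp = tune()` and a grid of proportions.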