cct-datascience / setaria-predict

Using ED2 ensembles to train a spatially explicit model

step_pca() #10

Closed. Aariq closed this 10 months ago.

Aariq commented 1 year ago

The data prep currently calculates 19 bioclim variables from monthly temperature and precipitation, then runs PCA to reduce those 19 variables to the smallest number of principal components (PCs) that together explain 85% of the variance in the data.
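For reference, here is a minimal NumPy sketch of what the 85% rule does, assuming the columns are standardized first (bioclim variables are on different scales, e.g. °C vs. mm). It computes the component variances via SVD and keeps the smallest number of components whose cumulative share reaches the threshold. This is roughly what `step_normalize()` followed by `step_pca(threshold = 0.85)` would do in a recipe, but it is an illustration, not the project's code; the function name and the toy data are hypothetical.

```python
import numpy as np

def n_components_for_threshold(X, threshold=0.85):
    """Hypothetical helper: smallest number of PCs whose cumulative
    explained variance reaches `threshold`, plus the cumulative shares."""
    # Standardize each column so variables in different units are comparable
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values are proportional to the component variances
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    cum = np.cumsum(var) / var.sum()
    # First index where the cumulative share is >= threshold, as a 1-based count
    k = int(np.searchsorted(cum, threshold) + 1)
    return min(k, len(cum)), cum

# Toy stand-in for 19 bioclim columns: two latent "climate" signals plus small noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = np.hstack([
    latent[:, [0]] + 0.01 * rng.normal(size=(200, 10)),
    latent[:, [1]] + 0.01 * rng.normal(size=(200, 9)),
])
k, cum = n_components_for_threshold(X, threshold=0.85)
print(k, np.round(cum[:3], 3))
```

Because the toy columns come from two underlying signals, the rule keeps very few components; with real bioclim variables the count depends on how correlated they are.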

1. I'm not sure PCA is appropriate.
2. I'm not sure PCA is necessary (i.e., does it actually improve the random forest model?).
3. Rather than arbitrarily choosing a threshold of 85%, I could `tune()` this value or `tune()` the number of components.
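Points 2 and 3 can both be checked with resampling. Below is a minimal sketch that scores a grid of component counts by k-fold cross-validation, refitting the standardization and PCA on the training rows of each fold (as a recipe does when resampled), and also scores a no-PCA baseline. To stay self-contained it uses ordinary least squares as a stand-in model, NOT a random forest; the helper names, fold count, grid, and toy data are all hypothetical.

```python
import numpy as np

def pca_fit(X_train, k):
    """Learn centering, scaling, and the first k PC loadings from training rows only."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd((X_train - mu) / sd, full_matrices=False)
    return mu, sd, Vt[:k].T  # loadings, shape (p, k)

def pca_transform(X, mu, sd, V):
    return ((X - mu) / sd) @ V

def ols_fit_predict(S_train, y_train, S_test):
    """Stand-in model: least squares with an intercept (not a random forest)."""
    A = np.column_stack([np.ones(len(S_train)), S_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(S_test)), S_test]) @ beta

def cv_rmse(X, y, k, n_folds=5, seed=0):
    """K-fold CV RMSE using k components; k=None means raw predictors (no PCA)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    mses = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        if k is None:
            S_tr, S_te = X[train], X[test]
        else:
            mu, sd, V = pca_fit(X[train], k)  # PCA fit inside the fold
            S_tr = pca_transform(X[train], mu, sd, V)
            S_te = pca_transform(X[test], mu, sd, V)
        pred = ols_fit_predict(S_tr, y[train], S_te)
        mses.append(np.mean((y[test] - pred) ** 2))
    return float(np.sqrt(np.mean(mses)))

# Hypothetical toy data: 19 columns in blocks driven by 3 latent signals
rng = np.random.default_rng(1)
n = 300
latent = rng.normal(size=(n, 3))
X = np.hstack([latent[:, [j]] + 0.1 * rng.normal(size=(n, b))
               for j, b in enumerate([8, 6, 5])])
y = latent[:, 0] - 2.0 * latent[:, 2] + 0.1 * rng.normal(size=n)

grid = [1, 2, 3, 5, 10, 19]
scores = {k: cv_rmse(X, y, k) for k in grid}
rmse_no_pca = cv_rmse(X, y, None)
best_k = min(grid, key=scores.get)
print(best_k, round(scores[best_k], 3), round(rmse_no_pca, 3))
```

With a linear model, keeping all 19 components gives the same predictions as the raw predictors, since least squares is unaffected by an invertible linear transform of its inputs. A random forest splits on one variable at a time, so it is not rotation-invariant and PCA can help or hurt it; that is why point 2 needs an empirical comparison. In tidymodels this would presumably be `step_pca(all_numeric_predictors(), num_comp = tune())` (or `threshold = tune()`) in a workflow evaluated with `tune_grid()`.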