dill opened this issue 7 years ago
Thanks, seeing this sort of folklore is really valuable.
High-dimensional interactions are a mess (curse of dimensionality, interpretation/plotting trouble, etc.), and become harder to justify. I mean, if altitude affects the temp/precip interaction (creating a 3-way interaction), do we really need to modulate that interaction according to a fourth variable? If that 4-way interaction is important, our science is screwed anyway because there's no way we'll measure it accurately.
So I like the idea of adding up low-dimensional smooths, as opposed to the radial kernels commonly used in GPs, where everything is a P-way interaction. GPs can be fit the same way (adding covariances from low-dimensional kernels instead of multiplying them), but I don't think I've ever seen anyone do that outside of a tutorial.
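A minimal numpy sketch of the sum-versus-product distinction, assuming a squared-exponential (RBF) kernel on each covariate. The function names and the lengthscale/variance parameters are illustrative only, not taken from any particular GP library. Multiplying the 1D kernels reproduces the usual ARD radial kernel over all P covariates, while adding them gives the GP analogue of an additive model.

```python
import numpy as np

def rbf_1d(x, z, lengthscale):
    # squared-exponential kernel on a single covariate
    d = x[:, None] - z[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def product_kernel(X, Z, lengthscales):
    # multiplying 1D kernels gives the usual ARD radial kernel over all P
    # covariates: every covariate modulates every other (a P-way interaction)
    K = np.ones((X.shape[0], Z.shape[0]))
    for p, ell in enumerate(lengthscales):
        K *= rbf_1d(X[:, p], Z[:, p], ell)
    return K

def additive_kernel(X, Z, lengthscales, variances):
    # adding 1D kernels: the GP analogue of a sum of 1D smooths
    K = np.zeros((X.shape[0], Z.shape[0]))
    for p, (ell, s2) in enumerate(zip(lengthscales, variances)):
        K += s2 * rbf_1d(X[:, p], Z[:, p], ell)
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
ell = [0.5, 1.0, 2.0]
K_prod = product_kernel(X, X, ell)
K_add = additive_kernel(X, X, ell, [1.0, 1.0, 1.0])
```

Low-dimensional (say 2D) interaction terms could be included the same way, by adding a product kernel over a chosen pair of covariates to the sum.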
The combinatorics of low-but-greater-than-one-dimensional smooths are also ugly. I like the idea behind factorization machines, which is basically to use a rank-k factorization (roughly, the first k singular vectors) of the matrix you'd get from interacting everything with everything. I don't see anything stopping someone from putting nonlinear basis functions inside it.
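A rough numpy sketch of the factorization-machine pairwise term, written from the standard FM formulation rather than any specific package. The weight for the interaction between features i and j is the dot product of their k-dimensional factor vectors, so the full P x P interaction matrix is never estimated directly. `poly_basis` is a hypothetical helper showing one way nonlinear basis functions could go inside: expand each covariate first, then let the FM learn low-rank interactions between the expanded features.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    # explicit double loop: pairwise weight for (i, j) is V[i] @ V[j]
    P = len(x)
    y = w0 + w @ x
    for i in range(P):
        for j in range(i + 1, P):
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y

def fm_predict(x, w0, w, V):
    # same model in O(kP), using
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2]
    s = V.T @ x
    s2 = (V ** 2).T @ (x ** 2)
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

def poly_basis(x, degree=3):
    # hypothetical nonlinear expansion: each covariate becomes [x, x^2, ..., x^degree]
    return np.concatenate([x ** d for d in range(1, degree + 1)])

rng = np.random.default_rng(1)
x = rng.normal(size=4)           # 4 raw covariates
z = poly_basis(x)                # 12 expanded features
w = rng.normal(size=z.size)
V = rng.normal(size=(z.size, 2)) # rank k = 2 interaction factors
y_fast = fm_predict(z, 0.5, w, V)
y_slow = fm_predict_naive(z, 0.5, w, V)
```

The two functions compute the same quantity; the second is the form that makes FMs cheap when P is large.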
Here's one example of someone using additive 1D GPs for interpretation: https://tspace.library.utoronto.ca/bitstream/1807/75420/1/cjfas-2016-0008.pdf (plus a high-dimensional GP to mop up other interactions).
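A sketch of that kind of model, assuming one RBF kernel per covariate plus a single full-dimensional RBF kernel with small variance to absorb leftover interactions. The hyperparameters are fixed, made-up values rather than fitted ones, and the data are simulated for illustration. Because the kernel is a sum of independent components, the posterior mean splits into per-component pieces K_c(X_new, X) (K + noise I)^-1 y, and the 1D pieces are the partial-effect curves you would plot for interpretation.

```python
import numpy as np

def rbf(A, B, lengthscale):
    # squared-exponential kernel over whichever columns of A and B are passed in
    d2 = ((A[:, None, :] - B[None, :, :]) / lengthscale) ** 2
    return np.exp(-0.5 * d2.sum(axis=-1))

def fit_additive_gp(X, y, ell_1d=1.0, ell_full=2.0, var_full=0.1, noise=0.1):
    # kernel = one 1D kernel per covariate + one low-variance full-dimensional
    # kernel for remaining interactions; hyperparameters fixed, not estimated
    P = X.shape[1]
    comps = [lambda A, B, p=p: rbf(A[:, [p]], B[:, [p]], ell_1d) for p in range(P)]
    comps.append(lambda A, B: var_full * rbf(A, B, ell_full))
    K = sum(k(X, X) for k in comps)
    alpha = np.linalg.solve(K + noise * np.eye(len(y)), y)

    def predict_components(X_new):
        # posterior mean of each component: K_c(X_new, X) @ alpha
        return [k(X_new, X) @ alpha for k in comps]

    return predict_components, alpha

# simulated toy data, for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(60, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.3 * X[:, 0] * X[:, 2] + 0.1 * rng.normal(size=60)
predict, alpha = fit_additive_gp(X, y)

grid = np.zeros((50, 3))
grid[:, 0] = np.linspace(-2, 2, 50)
parts = predict(grid)  # parts[0]: estimated partial effect of covariate 0
```

Summing the returned components recovers the ordinary GP posterior mean, so nothing is lost by splitting it up for plotting.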
That's the only one I've seen. I suspect the approach is uncommon just because of a lack of familiarity with GPs and/or because most of the literature is in the ML world, where the focus is more on prediction than on interpretation/inference.
To go back to Dave's initial question:
Vague Q: What basis expansion (or kernel) is likely to be "best" for species distribution models? (according to whatever criteria you like)
Here are some (hopefully?) controversial opinions (mainly based on folklore/anecdata):
There's a crossover here about distance. My thought on this (heavily influenced by my thesis work) is that you should do a distance transformation beforehand to deal with the scale on which you measure your covariate. In my case we were looking at "distance within a complex region", but you can think about "energy distance" or other things. I think adapting the basis might not be the right thing to do; instead, transform the data into the right domain first? (I haven't thought this through more thoroughly.)
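One possible version of "transform first, then smooth", sketched under assumptions: the region is a boolean grid mask, within-region distance is the shortest-path length between 4-connected cells (computed by breadth-first search), and classical multidimensional scaling (MDS) embeds those distances in 2D so that an ordinary smooth or kernel can be used on the new coordinates. The U-shaped mask and the helper names `grid_geodesic` and `classical_mds` are hypothetical; this is an illustration, not the method from the thesis mentioned above.

```python
import numpy as np
from collections import deque

def grid_geodesic(mask):
    # all-pairs shortest-path distances between True cells of a 2D mask,
    # stepping between 4-neighbours with unit cost (one BFS per cell)
    cells = [tuple(int(v) for v in c) for c in np.argwhere(mask)]
    index = {c: i for i, c in enumerate(cells)}
    n = len(cells)
    D = np.full((n, n), np.inf)
    for s, start in enumerate(cells):
        D[s, s] = 0.0
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            d = D[s, index[(r, c)]]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nb = (r + dr, c + dc)
                j = index.get(nb)  # None if outside the region
                if j is not None and D[s, j] == np.inf:
                    D[s, j] = d + 1.0
                    queue.append(nb)
    return np.array(cells, dtype=float), D

def classical_mds(D, k=2):
    # double-centre squared distances, keep the top-k eigenpairs
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# hypothetical U-shaped region: a wall down column 3, open only in the bottom two rows
mask = np.ones((7, 7), dtype=bool)
mask[0:5, 3] = False
coords, D = grid_geodesic(mask)
emb = classical_mds(D, k=2)

i = int(np.where((coords == [0, 2]).all(axis=1))[0][0])
j = int(np.where((coords == [0, 4]).all(axis=1))[0][0])
print(np.linalg.norm(coords[i] - coords[j]), D[i, j])  # 2.0 12.0
```

Cells on opposite sides of the wall are 2 apart in Euclidean terms but 12 apart within the region, and the MDS coordinates keep them well separated. For a real, irregular region the embedding is only approximate, so it is worth checking how closely embedded distances match D.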