Basis functions - Githubissues

dill commented 7 years ago

To go back to Dave's initial question

Vague Q: What basis expansion (or kernel) is likely to be "best" for species distribution models? (according to whatever criteria you like)

Here are some (hopefully?) controversial opinions (mainly folklore anecdata based):

For spatial smooths: use thin plate regression splines (tprs) in 2D, unless you have some species/places where you only need 1D (e.g., shoreline N-S with little N-S gradient), but then I guess always start with 2D tprs. tprs assumes that covariates are isotropic, so a 1 unit move in one dimension is the same as a 1 unit move in another. You could use a Gaussian process in 2D; this would be somehow equivalent (or at least closely related, see Wahba's "Spline Models for Observational Data", section 2.5 and refs therein). If there's some fancy boundary effect, use soap film or something like that?
For other covariates, my feeling is that for 1D smooths everything is much of a muchness unless you require particular properties, e.g., if you want a cyclic effect (er, cyclic bases?), adaptive smoothing (I can never get this to work), or a prior on extrapolation (the B/P-spline thingo that Simon and I et al. are working on).
Interactions etc.: building interactions using tensor products makes this simple and lets you have anisotropic effects (by default).
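To make the isotropy point concrete, here is a minimal sketch in plain Python (not mgcv, and not how tprs is actually implemented). It assumes the standard 2D thin plate radial function eta(r) = r^2 log(r) and uses a toy polynomial marginal basis as a stand-in for a real spline basis; the names `tps_eta`, `poly_basis` and `tensor_row` are made up for illustration. An isotropic basis only sees Euclidean distance, so changing the units of one covariate changes which points count as "close", whereas a tensor product row is built from marginal bases that each live in their own units.

```python
import math

def tps_eta(r):
    """2D thin plate spline radial function eta(r) = r^2 log(r), with eta(0) = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)

def euclid(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def rescale(p, sx):
    """Change the units of the first coordinate only (e.g. km to some other unit)."""
    return (p[0] * sx, p[1])

def poly_basis(x, k):
    """Toy marginal basis [1, x, ..., x^(k-1)]; a stand-in for a spline basis."""
    return [x ** j for j in range(k)]

def tensor_row(x, z, kx, kz):
    """One model-matrix row of a tensor product basis:
    all pairwise products of the two marginal basis evaluations."""
    bx, bz = poly_basis(x, kx), poly_basis(z, kz)
    return [a * b for a in bx for b in bz]

knot = (0.0, 0.0)
p_east = (1.0, 0.0)   # 1 unit away along the first axis
p_north = (0.0, 2.0)  # 2 units away along the second axis

# Distances to the knot before and after rescaling the first axis by 10.
d_before = (euclid(p_east, knot), euclid(p_north, knot))
d_after = (euclid(rescale(p_east, 10.0), knot),
           euclid(rescale(p_north, 10.0), knot))

# The isotropic basis values change with the units of one axis.
eta_before = [tps_eta(d) for d in d_before]
eta_after = [tps_eta(d) for d in d_after]

# Tensor product row from a 3-function basis in x and a 2-function basis in z.
row = tensor_row(2.0, 3.0, 3, 2)
```

Before rescaling, `p_east` is the nearer point to the knot; after rescaling it is the farther one, so the isotropic basis treats the two configurations differently. The tensor product row has `kx * kz` columns, one per pair of marginal basis functions.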

There's a crossover here about distance. My thought on this (heavily influenced by my thesis work) is that you do a distance transformation beforehand to deal with the scale on which you measure your covariate. In my case we were looking at "distance within complex region" but you can think about "energy distance" or other stuff. I think adapting the basis might not be the right thing to do, instead transform the data to be in the right domain first? (Haven't thought more thoroughly about this.)

davharris commented 7 years ago

Thanks, seeing this sort of folklore is really valuable.

High-dimensional interactions are a mess (curse of dimensionality, interpretation/plotting trouble, etc.), and become harder to justify. I mean, if altitude affects the temp/precip interaction (creating a 3-way interaction), do we really need to modulate that interaction according to a fourth variable? If that 4-way interaction is important, our science is screwed anyway because there's no way we'll measure it accurately.

So I like the idea of adding up low-dimensional smooths, as opposed to the radial kernels commonly used in GPs, where everything is a P-way interaction. GPs can be fit the same way (adding covariances from low-dimensional kernels instead of multiplying them), but I don't think I've ever seen anyone do that outside of a tutorial.
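A small sketch of the adding-versus-multiplying distinction, in plain Python. It assumes squared-exponential kernels with a single shared length scale `ell`; the function names are illustrative, not from any GP library. Multiplying 1D squared-exponential kernels gives exactly the radial kernel on the full covariate vector, while adding them gives a covariance whose draws are sums of 1D functions.

```python
import math

def se_1d(a, b, ell=1.0):
    """Squared-exponential kernel on a single covariate."""
    return math.exp(-0.5 * ((a - b) / ell) ** 2)

def radial_kernel(x, y, ell=1.0):
    """Squared-exponential kernel on the full Euclidean distance between x and y."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / ell ** 2)

def product_kernel(x, y, ell=1.0):
    """Product of 1D kernels: one P-way interaction."""
    out = 1.0
    for a, b in zip(x, y):
        out *= se_1d(a, b, ell)
    return out

def additive_kernel(x, y, ell=1.0):
    """Sum of 1D kernels: the GP analogue of adding up 1D smooths."""
    return sum(se_1d(a, b, ell) for a, b in zip(x, y))

# Two points that agree on the second covariate but differ a lot on the third.
x = (0.0, 1.0, 2.0)
y = (0.5, 1.0, 4.0)

k_radial = radial_kernel(x, y)
k_product = product_kernel(x, y)
k_additive = additive_kernel(x, y)
```

Here `k_product` and `k_radial` agree up to floating point error, and both are small because the points are far apart in one dimension. `k_additive` stays above 1 because the points still match on the second covariate, which is the sense in which the additive version shares information one covariate at a time.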

The combinatorics of low-but-greater-than-one-dimensional smooths are also ugly. I like the idea behind factorization machines, which is basically to use the first k singular vectors of the matrix you'd get from interacting everything with everything. I don't see anything stopping someone from putting nonlinear basis functions inside of it.
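For reference, a rough sketch of the second-order term of a factorization machine in plain Python, assuming the usual formulation where each pairwise weight is the inner product of two rank-k factor vectors. The names `fm_pairwise_naive` and `fm_pairwise_fast` and the numbers in `V` are made up for illustration. The entries of `x` are raw covariates here, but nothing in the algebra stops them from being nonlinear basis function evaluations instead.

```python
def fm_pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j], written out directly."""
    total = 0.0
    p = len(x)
    for i in range(p):
        for j in range(i + 1, p):
            w_ij = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            total += w_ij * x[i] * x[j]
    return total

def fm_pairwise_fast(x, V):
    """Same quantity via the low-rank identity
    0.5 * sum_f [ (sum_i V[i][f] x[i])^2 - sum_i (V[i][f] x[i])^2 ],
    which is linear rather than quadratic in the number of features."""
    p, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(p))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(p))
        total += 0.5 * (s * s - s_sq)
    return total

# Three features, rank-2 factors (hypothetical values).
x = [1.0, 2.0, 3.0]
V = [[1.0, 0.0],
     [0.5, 1.0],
     [2.0, -1.0]]

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
```

Both functions return the same value; the point of the factorization is that the full p-by-p matrix of interaction weights is never formed, only the p-by-k matrix `V`.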

goldingn commented 7 years ago

Here's one example of someone using additive 1D GPs for interpretation: https://tspace.library.utoronto.ca/bitstream/1807/75420/1/cjfas-2016-0008.pdf (plus a high-dimensional GP to mop up other interactions).

I think that's the only one I've seen. I think the uncommonness of this approach is just due to the lack of familiarity with GPs and/or the majority of the literature being in the ML world, where there's more of a focus on prediction than interpretation/inference.

In looking for that, I remembered (from ISEC Seattle) this on non-stationarity, barriers, GPs and species distributions: https://arxiv.org/pdf/1608.03787.pdf

gavinsimpson / basis-sdm

Basis functions #2