gavinsimpson / basis-sdm


"distance" from environmental covariates #1

Open dill opened 7 years ago

dill commented 7 years ago

So here are some rough thoughts about distances etc based on Dave Harris's tweet:

(e.g., are 20℃ and 25℃ the same distance as -3℃ and +2℃? Are they still the same if humidity changes?)

One thing that's nice about using some kind of spline-based modelling method (let's call them "GAMs" for now) is that you don't have to make this assumption: the shape of the response is estimated from the data rather than fixed in advance. This is good, but the catch is that you can only estimate it where you actually have data, of course (the good news being that your uncertainty will also increase where you don't).
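
To make that concrete, here's a minimal sketch (simulated data and made-up variable names, just to illustrate): a binomial GAM where the response to sea surface temperature is a spline, so the "distance" between 20ºC and 25ºC in suitability terms is estimated rather than assumed.

```r
library(mgcv)

## simulated presence/absence data with a hypothetical SST covariate
set.seed(1)
n <- 500
sst <- runif(n, 5, 25)
p <- plogis(-2 + 4 * exp(-(sst - 15)^2 / 8))  # true suitability peaks near 15ºC
pres <- rbinom(n, 1, p)
dat <- data.frame(pres = pres, sst = sst)

## the spline f(sst) is learned from the data, so the shape of the response
## (and hence how "far apart" two temperatures are) isn't fixed in advance
m <- gam(pres ~ s(sst), family = binomial, data = dat, method = "REML")
plot(m, shade = TRUE)  # intervals widen where there are few data
```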

What's potentially interesting (to me at least) is thinking about whether effects of point estimates of covariates matter. So, concretely, say you see an animal where the sea surface temperature is 10ºC. Is the animal really interested in that sea surface temperature ("mmm, 10ºC is juuuuust riiiight"), or is it really the surrounding temperatures in space, or perhaps there were more interesting temperatures a few days ago and the animal stuck around (or was attracted by social cues from other individuals, etc.)?

For this kind of thing one requires a functional data approach (I think), but this is possible within a GAM framework anyway (see, e.g., Phil Reiss's papers)... I think these are interesting modelling possibilities.
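
As a rough illustration of the functional-covariate idea, mgcv's linear functional terms (a smooth given a matrix argument is summed over its columns; see ?mgcv::linear.functional.terms) let the response depend on, say, SST over the previous week rather than at a single point in time. This is only a sketch with simulated data and made-up names, not a claim about the approach in Reiss's papers:

```r
library(mgcv)

set.seed(2)
n <- 300; nlag <- 7
## hypothetical matrix of SST over the previous 7 days for each observation
sst_lag <- matrix(runif(n * nlag, 5, 25), n, nlag)

## simulate counts whose rate depends on thermal suitability over the whole week
g <- function(temp) exp(-(temp - 15)^2 / 8)
y <- rpois(n, exp(-1 + 3 * rowMeans(g(sst_lag))))

## a matrix argument to s() means f() is evaluated at every lag and summed,
## so the covariate is a short history of temperatures, not a point value
dat <- list(y = y, sst_lag = sst_lag)
m <- gam(y ~ s(sst_lag), family = poisson, data = dat, method = "REML")
summary(m)
```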

goldingn commented 7 years ago

@davharris, in your tweet were you referring to non-linearity or non-stationarity?

I.e., if the issue is non-linearity (a faster drop-off in suitability at cold temperatures than at warm ones), then the flexible nature of the spline/GP does cover that.

I assumed you meant non-stationarity (that the amount of flexibility needed is higher close to 0 than at higher temperatures). In a GP context, that would be equivalent to the lengthscale parameter also varying with temperature. If I understand splines correctly, GAMs do make the assumption that the complexity is the same all over, right @dill @gavinsimpson?

gavinsimpson commented 7 years ago

@goldingn IIUC, no, GAMs don't assume that in general. However, most bases do, as they have a single smoothness parameter. One basis in mgcv that doesn't make this assumption is the adaptive spline basis, bs = "ad", which allows the wiggliness of the spline to vary over the range of the covariate. I don't recall how flexible this varying wiggliness is, however.
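
For reference, a minimal sketch of what that looks like (simulated data where the true function is flat on the left and wiggly on the right; whether the adaptive basis actually wins will depend on the example):

```r
library(mgcv)

set.seed(3)
n <- 400
x <- runif(n)
f <- ifelse(x < 0.6, 0, sin(10 * pi * (x - 0.6)))  # smooth, then wiggly
y <- f + rnorm(n, sd = 0.25)

m_tp <- gam(y ~ s(x, k = 30), method = "REML")             # one smoothness parameter
m_ad <- gam(y ~ s(x, bs = "ad", k = 30), method = "REML")  # wiggliness varies with x
AIC(m_tp, m_ad)
par(mfrow = c(1, 2)); plot(m_tp); plot(m_ad)
```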

goldingn commented 7 years ago

right, that makes sense!

davharris commented 7 years ago

@gavinsimpson and @dill: This is great; thanks for setting it up and starting such a good conversation, respectively.

@dill: I think there are all sorts of discrepancies between the variables we measure and the ones the organisms care about. Distance in time is a big one, especially when the variables are sampled infrequently or averaged. Ditto for space. All we ever have are proxies, sadly.

@goldingn I actually wasn't thinking very precisely about how this would work in the kernel context (as opposed to the basis function context) and you did a great job of bringing clarity to these questions. I don't have anything to add. Nicely done.

dill commented 7 years ago

@goldingn: "GAMs do make an assumption that the complexity is the same all over, right?"

Well, that's a tricky question, in the sense that it depends on what you mean by "complexity"... We can think of complexity as:

  1. the maximum number of basis functions for a given term/function (the "k" value) -- maybe call this "maximum complexity"?
  2. the amount of wiggliness as dictated by the smoothing parameter -- maybe call this the "estimated degrees of freedom"?

That said, some quick thoughts:

  1. You can allocate knots such that you give more flexibility in a given part of the covariate range -- not ideal, since this is probably pretty subjective (but maybe you have a reasonable prior on some part of the covariate range, like the animal can't exist in water, or above some temperature it fails to thermoregulate and dies, etc.); see the sketch after this list. 1b. The thin plate regression spline (tprs) basis does an "optimal" version of this knot placement via an eigendecomposition (see Simon's tprs paper, or the pseudo-splines paper by Hastie, IIRC).
  2. The wiggliness is dictated by the smoothing parameter, and that is the same for the whole function, but that doesn't mean the whole function has to be wiggly; especially given 1 and 1b above, you can have fits that look linear to begin with and then get wiggly at the end, for example. (N.B. adaptive splines, though I can never get them to work anyway.)
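
Here's the sketch for point 1: with the cubic regression spline basis (bs = "cr") you can hand gam() the knot locations directly, so you can pile them up wherever you think more flexibility is warranted. Simulated data and arbitrary knot choices, purely illustrative:

```r
library(mgcv)

set.seed(4)
n <- 300
temp <- runif(n, 0, 30)
y <- sin(temp / 3) + rnorm(n, sd = 0.3)

## put most of the knots between 10 and 20 degrees, where we (a priori) expect
## more flexibility to be needed; for bs = "cr" the number of knots must equal k
kn <- list(temp = c(0, 5, 10, 12, 14, 16, 18, 20, 25, 30))
m <- gam(y ~ s(temp, bs = "cr", k = 10), knots = kn, method = "REML")
plot(m, shade = TRUE)
```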

(I'll think about the other things like stationarity 🍽️/🏃)

goldingn commented 7 years ago

Cool, that's helpful.

I was thinking more of definition 2 (EDF ≈ lengthscale). Though I still haven't quite got my head around the GP/splines link; from a GP perspective, the GAM's knots control both the wiggliness and the approximation quality at the same time. There's also GP literature I'm not familiar with on optimally placing inducing points (essentially knots) a priori, but learning the inducing point locations at the same time as fitting the model seems to work slightly better. James H has a nice paper where they move around and all line up on a classification boundary.

BTW, I'm (quite possibly incorrectly) defining stationarity here as the hyperparameters of the GP/GAM (marginal variance and lengthscale/EDF) being constant along the covariate, with lengthscale stationarity possibly being more interesting in this context than variance stationarity.

dill commented 7 years ago

@goldingn I haven't got my head around the modern GP<->GAM link, partly because I learnt kriging, and it seems like the terminology is a wee bit different in places, which is irritating (it's not 100% clear that when you say "stationarity" you mean the same thing I know it as, for example).

Anyhow, on the topic of knots... there are a few things going on here when we say "control both the wiggliness and the approximation quality":

  1. basis "size" -- the number of basis functions that you throw into the model
  2. knots -- the number of "control points" (the number of knots is not necessarily equal to the number of basis functions)
  3. smoothing parameter(s) -- control wiggliness by controlling the influence of the penalty during fitting (if you're doing that...)

Now, you can do smoothing in a GAM context without doing the penalised stuff (i.e., a fixed or no smoothing parameter) and just move knots around, optimizing some fit criterion (IIRC this is what SALSA does, though I think this gets computationally complicated in >1D).

In penalized-land (where I live), the idea is to make the number of basis functions relatively large (larger than you need), then penalize back to a parsimonious fit (e.g., either directly optimizing REML/ML, or doing some kind of outer iteration; exactly how this fits into the mgcv timeline is beyond my memory at the moment, but the former comes from the GLMM interpretation of the GAM, the latter from thinking about the model as a penalized GLM).

Okay, so what about knots in penalized-land? (Ideology warning) Knot selection problems are boring and time consuming (though they may make for fancy papers with cute algorithms); it's better (ideology*) to either space the knots equally (e.g., with P-/B-splines) or do the optimal placement based on the data once, when you set up your design matrix (the eigendecomposition trick from Wood, 2003). Either way, just make the number of knots too big and then use the penalty to get back to the fit you need.

Now, back to your initial assertion about knots: yes, knots kind of do control wiggliness and approximation quality, but in my mind they control the maximum wiggliness and approximation quality. It's the smoothing parameter that's really in control when it comes to the wiggliness.

The knots are (usually) fixed going into a penalized regression (GAM) fitting algorithm -- they are a part of the model encoded in the design matrix. During fitting the smoothing parameter and the basis function coefficients are estimated (and er, distribution-specific stuff I guess).
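
A quick sketch of that division of labour (simulated data, names purely illustrative): k fixes the ceiling on wiggliness via the design matrix, and the REML-estimated smoothing parameter pulls the effective degrees of freedom back down.

```r
library(mgcv)

set.seed(5)
n <- 400
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

## deliberately over-generous basis: up to ~29 effective degrees of freedom
m <- gam(y ~ s(x, k = 30), method = "REML")

summary(m)$edf  # estimated EDF of s(x): typically far below the maximum of 29
m$sp            # the estimated smoothing parameter doing the shrinking
```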

Hopefully this is clearer?

goldingn commented 7 years ago

Ah cool, that's really helpful for me to get my head around!

If you have time, it would be great to sit down somewhere in ☔️ Hobart/Melbourne 🌞 with a copy of Wahba 📖, several 🍻 and/or ☕️ and go through the nuts and bolts of GAMs == GPs?

(James' paper, FYI: http://proceedings.mlr.press/v38/hensman15.pdf)

dill commented 7 years ago

Yes let's definitely ☕️ then 🍻 and go through that. I've just sorted out tickets but not my schedule while I'm around. I'll drop you an e-mail on some scheduling once I have a better idea of what's going on.

In the meantime I'll take a look at James' paper and see if I can update my gp lingo...

goldingn commented 7 years ago

👍 👍 👍