Lectures should consider the question of "how many sightings are enough"

jjrob commented 8 years ago

This is a practical problem that comes for many people, especially with rare animals like marine mammals, where surveying is very expensive.

How many sightings are enough to fit a detection function? Buckland et al. (2001) 60-80 gets used a lot. Clarify the importance and meaning of that rule of thumb. Maybe mention your new paper.
What do you do when you don't have enough? E.g. pool multiple species, use fewer covariates or none.
How many sightings (and segments) are enough to fit a standard line transect density estimate (one with no regression model as the second stage)?
How about a GAM?
Should you manually limit the number of covariates in the GAM if you have a "small" number of sightings? Or can you throw them all in and rely on mgcv to sort it out?

This might be appropriate for the Advanced Topics section. Regardless, we definitely need to discuss it at some point.

dill commented 8 years ago

So, as you probably know, I don't really like these questions because they are hard. Some thoughts and I'll let @erex chime-in if he'd like.

Generally, these are hard questions because they are so heavily context dependent. There are no good general rules. For example, I think it's possible to fit a good detection function with 30 (maybe fewer) observations (probably similar for a GAM), but only if you have very well behaved (maybe only simulated) data; I've also seen datasets with hundreds of observations that are nearly impossible to get a good fit from.

Two (somewhat orthogonal) points that I think I'd rather talk about in the course are:

Folks should be doing pilot studies to work out more about their study area and see what's appropriate. Getting that information early and cheaply (in terms of effort if nothing else) is much better than asking the question of what to do with shitty data after it's collected.
How can we tell when things have gone wrong? Rather than thinking about "what is enough" (which will never be correct) thinking about analysing a pilot study and working out what needs to be adjusted in the "real" survey, based on those observations. Rules of thumb are effectively lazy proxies for really understanding what's going on in the data -- we'd rather folks know what's going on.

I think @erex is fond of saying "you get what you pay for", in the sense that if you do a "bad" survey (or analyse poor quality data) then you won't get a result that's defensible. It's important to stress that no amount of fancy stats will get you out of an issue with the data itself.

I'm interested to find out how many participants are collecting their own data versus analysing publicly available/previously conducted surveys (as you are).

erex commented 8 years ago

Very nicely put @dill

I would suggest that you nudge the participants gently away from the "pure" distance sampling/field methods questions. The material you gents are providing can't cover all of distance sampling plus all of "spatial modelling" in the span of 4 days.

The rule of thumb business will just lead you to bouts of heavy drinking.

@dill was intending to make reference to Distance sampling methods and applications to deal with questions of the type you suggest.

There is no law saying models cannot be fit to insufficient data; hopefully there will be sufficient diagnostics to show that inference from that type of exercise will be poor.

jjrob commented 8 years ago

I purposefully put my questions the way that novices will put them, including myself when I was first getting started (and occasionally even today). They will want to know whether there are any rules of thumb because they provide a tremendous shortcut. In my modest experience, I've found there are almost no rules of thumb, ever, with anything. Perhaps p < 0.05 is an accepted rule of thumb, but the debate about even that is interminable.

So: given that there are no rules of thumb, I suggest, time permitting, that we:

Think of the most common places where people end up searching for rules of thumb. I wrote some of them above.
Bring these up at the appropriate time in the lectures and discuss:
- Why there isn't a good rule of thumb even though they want one or have seen authors suggest one
- How they diagnose whether or not they "have a problem"
- How to proactively avoid these situations (e.g. by pilot studies)

I understand we only have four days. But if we skip over all difficulties that people typically face and teach a course in which everything is assumed to go perfectly, they may end up disappointed in the long run.

dill commented 8 years ago

Agreed. This is a good topic to cover and I wonder if we can split the "Advanced" section to cover 45 mins of MRDS/extra smoothing stuff/extrapolation techniques etc and 45 mins of "practical advice" which could include questions like this (or at least somewhere to consult). Thoughts?

Additionally, I have been putting together an extended bibliography which should also touch on examples of these issues.

dill commented 8 years ago

I've added some thoughts on this in e5e5927, under xx-practical-advice.Rpres.

@jjrob let me know what you think and if you want me to add other content.

jjrob commented 8 years ago

Some feedback on xx-practical-advice.pdf:

Slide 18: Consider referencing the following two papers. The first tested different lengths of segments and found out (possibly just for their situation!) that it didn't matter. The second one tested different spatial resolution of covariates (different cell sizes) and concluded much the same thing (but again, could be specific to their situation).

Redfern J, Barlow J, Ballance L, Gerrodette T, Becker E (2008) Absence of scale dependence in dolphin–habitat models for the eastern tropical Pacific Ocean. Marine Ecology Progress Series 363: 1–14. doi:10.3354/meps07495

Becker E, Forney K, Ferguson M, Foley D, Smith R, Barlow J, et al. (2010) Comparing California Current cetacean–habitat models developed using in situ and remotely sensed sea surface temperature data. Marine Ecology Progress Series 413: 163–183. doi:10.3354/meps08696

I'm submitting this comment now for your consideration but leaving this assigned to myself so I can review it more later and possibly offer additional feedback.

dill commented 8 years ago

I looked through these two papers previously (they are in the bibliography, I was going to mention them while I was talking) -- I think that it's likely that these issues are domain/species dependent (or at least we don't have enough evidence at this point to believe that they aren't). The papers don't really give me enough information about these scale issues to understand what's happening in terms of comparing the resolution of covariates or segments. I think both also constrain the basis complexity, so that may also complicate things...

I wonder how much marine vs. terrestrial makes a difference. Given the dynamic nature of, say SST, perhaps there is so much going on that resolution change actually makes little difference in the model when compared to everything else?

dill commented 8 years ago

Okay, this should now be folded into the practical advice lecture.

DistanceDevelopment / spatial-workshops

Lectures should consider the question of "how many sightings are enough" #21