Open Pakillo opened 10 years ago
These are great comments. Sorry it took me a while to get to them.
Hi guys, I can't believe it took me almost a month to get to this - I'm really sorry.
First, Paco - thanks so much for sharing your thoughts!
I mostly agree. In my area the absolute probability of presence does seem to be useful for a number of things, not least for determining the thresholds and decision criteria which public health policy makers often want. That said, I don't really know how useful it could be in broader SDM applications. I wonder what proportion of the 54% of MAXENT users who interpreted the output as probability of presence in that Yackulic study went on to use it in that respect or just reported it incorrectly? Possibly something we could quantify.
a) I agree that point processes are probably going to be the best likelihoods for many applications of POSDM. However, they don't (automatically) get around the observation bias issue, which seems to me to be the single biggest problem with current POSDM practice. I.e. a point process which treats all non-presences as equal will be just as bad in that respect as a naive logistic regression-type model with random placement of background points. Thinned point processes seem the way to go, though the thinning is as subjective as pseudo-absence placement - so it isn't a panacea.
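To make the observation-bias point concrete, here is a minimal Python sketch using expected counts only (no model fitting). The cell names, intensities and sampling-effort values are all made up for illustration, and `bias` stands in for whatever thinning function one chooses, which is itself a subjective choice.

```python
# Hypothetical example: two cells with the SAME true intensity but very
# different sampling effort. All numbers are made up for illustration.
true_intensity = {"near_road": 5.0, "remote": 5.0}  # expected individuals per cell
bias = {"near_road": 0.8, "remote": 0.1}            # assumed fraction of individuals recorded

# Thinned point process: the observed intensity is the true intensity
# multiplied by the observation bias.
observed = {cell: true_intensity[cell] * bias[cell] for cell in true_intensity}

# A model that treats all non-presences as equal takes the observed
# intensity at face value, so the remote cell looks like poorer habitat.
naive_estimate = dict(observed)

# A thinned model divides the assumed bias back out. This is only as good
# as the assumed bias values.
thinned_estimate = {cell: observed[cell] / bias[cell] for cell in observed}

print("naive:  ", naive_estimate)
print("thinned:", thinned_estimate)
```

The naive estimates differ between the two cells purely because of sampling effort, while the thinned estimates agree, but only because the assumed `bias` happens to match the values used to generate the observed intensities.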
b) I think the discussion over which type of data is best isn't particularly helpful; I've never been in a situation where there was any choice in the matter. I'm actually working on some disease mapping at the moment where I have data from actively-collected disease prevalence (i.e. planned surveys but only in areas where the disease is suspected, analogous to presence/absence) and the locations of passively-reported cases (i.e. subject to variable reporting rates but everywhere, analogous to presence-only data). Neither is particularly useful alone, but modelling them jointly means I can quantify and model those variable reporting rates at the same time as the disease rates - which is really useful in this situation. Obviously doing active surveys in wider areas would be optimal, but as I say, I have no say in that.
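A toy Python sketch of why the two data types help each other. This is not the model described above: it assumes a single prevalence `prev`, a single reporting rate `report`, binomial survey data and Poisson-distributed passive reports, and every number is invented.

```python
import math

def survey_loglik(prev, positives, tested):
    """Binomial log-likelihood (up to a constant) for an active survey."""
    return positives * math.log(prev) + (tested - positives) * math.log(1 - prev)

def reports_loglik(prev, report, cases, population):
    """Poisson log-likelihood (up to a constant) for passively reported cases."""
    expected = report * prev * population
    return cases * math.log(expected) - expected

# Invented data: 30 of 200 surveyed people positive,
# and 90 cases passively reported from a population of 10,000.
positives, tested = 30, 200
cases, population = 90, 10_000

# Reports alone cannot separate prevalence from reporting rate, because
# only their product enters the likelihood.
a = reports_loglik(0.10, 0.09, cases, population)
b = reports_loglik(0.30, 0.03, cases, population)

# Adding the survey pins down prevalence, and the reporting rate follows.
prev_hat = positives / tested                 # maximises the survey term
report_hat = cases / (prev_hat * population)  # maximises the report term given prev_hat

def joint_loglik(prev, report):
    return (survey_loglik(prev, positives, tested)
            + reports_loglik(prev, report, cases, population))

print(a, b)
print(prev_hat, report_hat, joint_loglik(prev_hat, report_hat))
```

In this toy setting `a` and `b` are equal up to floating-point error, so the passive reports by themselves say nothing about which pair of values is right; the joint likelihood has a single maximum.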
1 - ppois(1, lambda)). This would be a good estimate in a situation where you observe all (or almost all) of the occurrences (e.g. your training data are the locations of trees from a complete survey), but if you observe only a fraction of the individuals then you don't stand a chance of estimating the true, absolute probability of presence. You would need a very good model of detection probability to make up for it, and that would require additional data.
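The effect of incomplete detection on that calculation can be sketched numerically. The Python below mirrors R's `1 - ppois(k, lambda)`, i.e. the probability of more than `k` individuals, and is evaluated here at `k = 0`, taking presence to mean at least one individual (an assumption). The intensity and the detection fraction are made-up values, and `detect` is assumed to be constant across space.

```python
import math

def prob_more_than(k, lam):
    """P(N > k) for N ~ Poisson(lam); the analogue of R's 1 - ppois(k, lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))
    return 1.0 - cdf

true_lambda = 2.0  # made-up expected number of individuals in a cell
detect = 0.1       # made-up fraction of individuals that end up in the data

# With a complete survey, the fitted intensity approaches true_lambda.
complete = prob_more_than(0, true_lambda)

# With partial detection, the data only inform detect * true_lambda, so the
# same calculation understates the probability of presence.
partial = prob_more_than(0, detect * true_lambda)

print(complete, partial)
```

Recovering `complete` from data like these would need an independent estimate of `detect`, which is the additional data referred to above.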
Hi Nick and Dave,
I see the proto-ms has evolved a lot in the last week and is almost ready for submission ;-). It's also becoming quite technical and I can't contribute much. But, as promised, I'll leave some comments just in case they are useful.
Nothing else for now... Hope this helps somehow, and good luck with the ms!
Cheers,
Paco