danlwarren / RWTY

R We There Yet?

implement PSRF #97

Open roblanf opened 4 years ago

roblanf commented 4 years ago

For RWTY2

PSRF is the potential scale reduction factor, AKA the Gelman and Rubin statistic, from:

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457-472.

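Compares variance within and between chains. A value close to 1 means the chains are indistinguishable from each other, i.e. everything is hunky-dory; values well above 1 suggest the chains haven't converged on the same distribution.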
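For reference, here is a rough sketch of the calculation for a single continuous parameter. It's in Python purely for illustration, and it uses the commonly quoted simplified form of the statistic, which leaves out the sampling-variability correction described in the original paper. The function name `psrf` and the list-of-lists input are assumptions for this example, not anything that exists in RWTY.

```python
from statistics import mean, variance


def psrf(chains):
    """Simplified potential scale reduction factor.

    `chains` is a list of m equal-length lists of post-burn-in samples
    for one parameter (hypothetical input format for this sketch).
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2:
        raise ValueError("need at least 2 chains with at least 2 samples each")
    if any(len(c) != n for c in chains):
        raise ValueError("all chains must have the same length")

    chain_means = [mean(c) for c in chains]
    # W: average of the within-chain sample variances
    w = mean([variance(c) for c in chains])
    if w == 0:
        raise ValueError("within-chain variance is zero; PSRF is undefined")
    # B: between-chain variance, scaled by chain length
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

Note that with finite chains this simplified version can come out slightly below 1 when the chains agree closely, so "close to 1" is the thing to look for rather than exactly 1.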

I don't see any reason why we couldn't also do this for tree topologies, based on e.g. distance from the first post-burn-in tree or something similar (i.e. the same as we do for the trace of tree topologies).
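To make the topology idea concrete, here is a minimal sketch of turning each chain of trees into a numeric trace that a PSRF routine could then consume. It assumes, purely for illustration, that each tree is stored as a frozenset of clades (each clade a frozenset of taxon names) and that distance is Robinson-Foulds, i.e. the number of clades found in one tree but not the other. It also assumes a single shared reference tree (the first post-burn-in tree of the first chain). The function names are hypothetical and this is written in Python rather than R.

```python
def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance between two trees stored as sets of clades."""
    return len(tree_a ^ tree_b)  # size of the symmetric difference


def topology_traces(chains, burnin=0):
    """Convert chains of trees into chains of distances from one reference tree.

    `chains` is a list of lists of trees; `burnin` is the number of
    samples to drop from the start of every chain.
    """
    kept = [chain[burnin:] for chain in chains]
    if any(len(chain) == 0 for chain in kept):
        raise ValueError("burn-in removes every sample from at least one chain")
    # Shared reference: first post-burn-in tree of the first chain
    reference = kept[0][0]
    return [[rf_distance(reference, tree) for tree in chain] for chain in kept]
```

Using one shared reference for every chain is a design choice in this sketch: if each chain were measured against its own first tree, chains sitting in different parts of tree space could still produce similar-looking traces.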

danlwarren commented 4 years ago

Yeah we talked about this a bit via email and it seems like a very obvious and useful idea. One issue with it is that it assumes that MCMC chains were started from an overdispersed set of starting conditions, and typically that's not the case for phylogenetic MCMC. We had a bit of a back and forth about that on phylobabble a couple of years ago:

https://www.phylobabble.org/t/generating-a-set-of-trees-that-are-as-different-from-each-other-as-possible/951/2

To summarize:

(1) Some work suggests that random starting trees are overdispersed enough to work fine for this, but other people suggested that that might be a misleading result due to tree distance metrics saturating too quickly. One possibility is to generate a ton of starting trees at random and select a maximally distant subset of them. That's still subject to the same issues with tree space metrics but arguably better than random anyway.

(2) O'Meara came up with an algorithm to select trees that are super far from each other topologically.
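The "generate a ton and pick a maximally distant subset" idea in (1) could be sketched roughly as below. This is a simple greedy farthest-point heuristic, not O'Meara's algorithm, and it isn't guaranteed to find the best possible subset. The function name `pick_dispersed` and the `dist` callback are assumptions for this example; any tree distance function could be plugged in, with all the saturation caveats above still applying.

```python
from itertools import combinations


def pick_dispersed(candidates, k, dist):
    """Greedily choose k mutually distant candidates; returns their indices.

    `dist(a, b)` is any symmetric distance function supplied by the caller.
    """
    n = len(candidates)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= number of candidates")

    # Seed with the single most distant pair
    i, j = max(
        combinations(range(n), 2),
        key=lambda pair: dist(candidates[pair[0]], candidates[pair[1]]),
    )
    chosen = [i, j]

    # Repeatedly add the candidate whose nearest chosen neighbour is farthest away
    while len(chosen) < k:
        remaining = [c for c in range(n) if c not in chosen]
        best = max(
            remaining,
            key=lambda c: min(dist(candidates[c], candidates[s]) for s in chosen),
        )
        chosen.append(best)
    return chosen
```

The seeding step compares every pair of candidates, so this is only practical for a few thousand random trees at most unless distances are cached.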

Anyway, providing starting trees for people seems well outside of RWTY's scope, but we should at least chuck in a disclaimer saying that Gelman-Rubin assumes chains start quite far from each other, and any interpretation of that statistic is only as valid as that assumption.