GotelliLab / EcoSimR

Repository for EcoSimR, by Gotelli, N.J., Hart, E.M. and A.M. Ellison. 2014. EcoSimR 0.1.0
http://ecosimr.org

Sim 9 Fast #11

Closed emhart closed 10 years ago

emhart commented 10 years ago

Sim 9 Fast is a bit tricky because it doesn't fit in with the whole framework of the rest of the algorithms. I need to work out how it all fits in the new object for co-occurrence.

ngotelli commented 10 years ago

Yep, this was exactly the problem I ran into. The simplest thing would be to kick out an error message if NullModelEngine is used with Sim9 or Sim9 Fast and redirect the user to invoke some other function Sim9NullModelEngine. Note that both Sim 9 and Sim 9 Fast can be used with NullModelEngine (we have in the past called this version an "independent swap"), but each replicate then is constructed independently and uses thousands of randomizations, so it really crawls. Once the burn-in period is passed, Sim9 behaves well using consecutive swaps, which is much faster (although still pretty slow in R).
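The guard suggested above could look roughly like the following. This is only a sketch, written in Python rather than R for illustration; `check_independent_engine`, `normalize`, and `SEQUENTIAL_ALGORITHMS` are hypothetical names, and only `NullModelEngine` and `Sim9NullModelEngine` come from the discussion.

```python
# Hypothetical set of algorithm names that rely on consecutive swaps.
SEQUENTIAL_ALGORITHMS = {"sim9", "sim9fast"}


def normalize(name):
    """Collapse spelling variants such as 'Sim 9 Fast' and 'sim9.fast'."""
    return name.lower().replace(" ", "").replace(".", "").replace("_", "")


def check_independent_engine(algorithm):
    """Raise if a sequential-swap algorithm is requested from the
    engine that builds every null matrix independently."""
    if normalize(algorithm) in SEQUENTIAL_ALGORITHMS:
        raise ValueError(
            f"{algorithm!r} builds each null matrix from the previous one; "
            "use Sim9NullModelEngine instead of NullModelEngine."
        )
    return algorithm
```

Calling `check_independent_engine("Sim 9 Fast")` would stop with the redirect message, while any other algorithm name passes through unchanged.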

Let me know if you need any more details. I will be in France for about 2 weeks, but should have intermittent e-mail contact.

Nicholas J. Gotelli       Office Phone: 802-656-0450
Department of Biology     Lab Phone: 802-656-0451   
University of Vermont     Fax: 802-656-2914
Burlington, VT 05405      e-mail: ngotelli@uvm.edu
********************************************************
Home Page (with manuscript pdfs):

http://www.uvm.edu/~ngotelli/homepage.html

Musician's Corner (with mp3s):

http://www.uvm.edu/~ngotelli/musicpage/music.html

NEW: EcoSimR (free software for null model analysis):

http://www.uvm.edu/~ngotelli/EcoSim/EcoSim.html



emhart commented 10 years ago

I've got this mostly worked out. However, there are three functions labeled "sim9": sim9.single, sim9.fast, and sim9. Clearly the first two are related, but does sim9 do something different from sim9.fast?

ngotelli commented 10 years ago

Hi Ted:

I am writing to you from Paris, where Maryanne and I are taking a few days of vacation after a challenging statistics conference in Montpellier.

To answer your question, sim9 is the "original" algorithm for randomizing a binary matrix while maintaining row and column totals. It grabs two random rows and two random columns. If the 4 cells are arrayed in a checkerboard, it swaps those elements, which maintains the row and column sums of the entire matrix. If not, it tries again with two new random rows and columns. Unfortunately, the algorithm is relatively slow because it takes many burn-in swaps before it reaches a stationary distribution. And, because it throws out those passes that cannot be swapped, it is slightly biased in the set of matrices it does produce.
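As a rough illustration of the swap described above, here is a minimal sketch in Python (not the package's R code). The names `is_checkerboard`, `swap_step`, and `swap_randomize` are hypothetical, and the `max_tries` cap is an added assumption so the loop cannot run forever on a matrix with no checkerboards.

```python
import random


def is_checkerboard(a, b, c, d):
    """True for the 2x2 patterns [[1, 0], [0, 1]] and [[0, 1], [1, 0]]."""
    return a == d and b == c and a != b


def swap_step(matrix, rng):
    """Pick two random rows and two random columns; swap in place if the
    four cells form a checkerboard. Returns True when a swap happened."""
    r1, r2 = rng.sample(range(len(matrix)), 2)
    c1, c2 = rng.sample(range(len(matrix[0])), 2)
    a, b = matrix[r1][c1], matrix[r1][c2]
    c, d = matrix[r2][c1], matrix[r2][c2]
    if not is_checkerboard(a, b, c, d):
        return False  # rejected pass: nothing changes
    matrix[r1][c1], matrix[r1][c2] = b, a
    matrix[r2][c1], matrix[r2][c2] = d, c
    return True


def swap_randomize(matrix, n_swaps, seed=None, max_tries=100_000):
    """Return a copy of `matrix` after up to `n_swaps` successful swaps.
    Rejected passes are retried with fresh rows and columns."""
    rng = random.Random(seed)
    result = [row[:] for row in matrix]
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        if swap_step(result, rng):
            done += 1
    return result
```

Because each accepted swap only flips a checkerboard, every row and column total of the returned matrix matches the input.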

sim9.fast is a new and improved version from the physics literature that was sent to me by a computer scientist. It grabs two random rows and creates a submatrix from them. It then finds the columns in the submatrix that have a column total of 1. Those columns are then reshuffled within the submatrix, while the columns with a total of 0 or 2 stay fixed. After the submatrix is reshuffled, the two rows are replaced in the original matrix, which preserves the row and column sums. Another two random rows are selected and the process is repeated many times. This algorithm is much faster because each iteration can swap many cells rather than just 4. Also, none of the iterations are thrown out, even in the unusual situation in which the chosen submatrix has column sums that are all 0 or 2 (and hence no swaps are permissible). This feature eliminates the slight bias in sim9 that causes those matrices to be non-random.
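The two-row reshuffle described above might be sketched as follows. This is an illustrative Python version under my own naming (`two_row_step` and `two_row_randomize` are hypothetical), not the sim9.fast source itself.

```python
import random


def two_row_step(matrix, rng):
    """Reshuffle, in place, the columns of a random two-row submatrix
    whose column total is 1. Columns totalling 0 or 2 stay fixed."""
    r1, r2 = rng.sample(range(len(matrix)), 2)
    row1, row2 = matrix[r1], matrix[r2]
    # Columns where exactly one of the two rows holds a 1.
    free = [j for j in range(len(row1)) if row1[j] + row2[j] == 1]
    # Values held by the first row in those columns, in shuffled order.
    values = [row1[j] for j in free]
    rng.shuffle(values)
    for j, v in zip(free, values):
        row1[j] = v
        row2[j] = 1 - v  # keeps each free column total at 1


def two_row_randomize(matrix, n_iter, seed=None):
    """Return a copy of `matrix` after `n_iter` two-row reshuffles.
    Every iteration counts, even when no column is free to move."""
    rng = random.Random(seed)
    result = [row[:] for row in matrix]
    for _ in range(n_iter):
        two_row_step(result, rng)
    return result
```

Shuffling the first row's values across the free columns and giving the second row the complement keeps both row totals and every column total unchanged.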

So, I would like to have sim9 available for "historical" purposes, but sim9.fast is the one that should be the default. Neither sim9 nor sim9.fast can be used with the standard Null Model Engine function, because they are Markov chain Monte Carlo algorithms, in which each matrix depends on the one before it and every matrix generated after sufficient burn-in is retained. In contrast, the Null Model Engine function assumes that each null matrix is created independently of the previous null matrices.
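To make the contrast concrete, here is a small Python sketch of the two sampling schemes, assuming a step function like the two-row reshuffle described earlier. The names `run_chain`, `run_independent`, and `two_row_step` are hypothetical, and the burn-in and swap counts are arbitrary illustration values.

```python
import copy
import random


def two_row_step(matrix, rng):
    """One in-place two-row reshuffle (see the sim9.fast description)."""
    r1, r2 = rng.sample(range(len(matrix)), 2)
    row1, row2 = matrix[r1], matrix[r2]
    free = [j for j in range(len(row1)) if row1[j] + row2[j] == 1]
    values = [row1[j] for j in free]
    rng.shuffle(values)
    for j, v in zip(free, values):
        row1[j], row2[j] = v, 1 - v


def run_chain(matrix, step, burn_in, n_keep, seed=None):
    """Sequential scheme: discard `burn_in` steps, then retain the matrix
    after each further step. Each kept matrix depends on the previous one."""
    rng = random.Random(seed)
    current = copy.deepcopy(matrix)
    for _ in range(burn_in):
        step(current, rng)
    kept = []
    for _ in range(n_keep):
        step(current, rng)
        kept.append(copy.deepcopy(current))
    return kept


def run_independent(matrix, step, swaps_per_replicate, n_keep, seed=None):
    """Independent scheme: restart from the observed matrix for every
    replicate, so the full randomization cost is paid each time."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_keep):
        fresh = copy.deepcopy(matrix)
        for _ in range(swaps_per_replicate):
            step(fresh, rng)
        kept.append(fresh)
    return kept
```

In this sketch the chain takes `burn_in + n_keep` steps in total, whereas the independent version takes `n_keep * swaps_per_replicate`, which is why the latter is so much slower when each replicate needs thousands of randomizations.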

So I could not use sim9 or sim9.fast with Null Model Engine and had to set up some special calls. I will have to look, but I think that sim9.single was something I tried so I could use it with Null Model Engine.

If it is faster to talk, we can skype about this next week. Contrary to my e-mail vacation message, Maryanne and I will return on Thursday night, so let me know if you want to chat.

Thanks again!

Nick


emhart commented 10 years ago

Hi Nick.

From your code it seems that sim9.single is called within sim9.fast. Right now I've actually created another engine that is called when simFast (as I call it) is chosen. It adds some extras to the underlying object that can be used in plotting. I think I'm pretty close to finishing up a fully functional beta (or maybe alpha) version of the package. I'll write up a brief vignette when I'm done that you can try running, but you shouldn't trouble yourself while on vacation.

ngotelli commented 10 years ago

Hi Ted:

I hadn't peeked at the code, but sim9.fast must be setting up the MCMC and sim9.single is just one iteration of it. I will look forward to seeing the next configuration.

Thanks again for your help,

Nick


emhart commented 10 years ago

This is fully integrated, closing.