Co-occurrence null model - beginner issue

GotelliLab / EcoSimR

Repository for EcoSimR, by Gotelli, N.J., E.M. Hart and A.M. Ellison. 2014. EcoSimR 0.1.0

http://ecosimr.org


Co-occurrence null model - beginner issue #75

Open cegboy opened 6 years ago

cegboy commented 6 years ago

Hi, I am a complete beginner with R (as I've already mentioned to Nick Gotelli via email) and am struggling to run the co-occurrence null model. It seems I managed to run the model, but I would appreciate it if someone could check what I am doing.

I have used the following to upload file and run:

# Run the null model
test <- cooc_null_model(read.csv(choose.files()), algo = "sim9", nReps = 10000, burn_in = 500)

# Summary and plot info
summary(test)
plot(test, type = "burn_in")
plot(test, type = "hist")
plot(test, type = "cooc")

The file I used is attached as well as plots. I got this summary:

Time Stamp: Wed Aug 22 21:10:11 2018
Reproducible:
Number of Replications:
Elapsed Time: 0.79 secs
Metric: c_score
Algorithm: sim9
Observed Index: 2.0895
Mean Of Simulated Index: 1.9061
Variance Of Simulated Index: 0.0033681
Lower 95% (1-tail): 1.8263
Upper 95% (1-tail): 2.0105
Lower 95% (2-tail): 1.8158
Upper 95% (2-tail): 2.0474
Lower-tail P = 0.994
Upper-tail P = 0.0079
Observed metric > 9921 simulated metrics
Observed metric < 60 simulated metrics
Observed metric = 19 simulated metrics
Standardized Effect Size (SES): 3.1605

Does it seem alright? Also, is there a preference for csv over txt?

[Attachments: test iteration graph, test hist, test marcelo cooc, test Marcelo.txt]

Many thanks Marcelo
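As a quick sanity check on the summary above, the two tail probabilities can be recovered from the three counts it reports. This is only an arithmetic sketch in base R, not EcoSimR's own code; the variable names are made up, and the assumption that ties are counted in both tails is inferred from the fact that it reproduces the reported values.

```r
# Counts copied from the summary above (10,000 simulated matrices in total).
n_greater <- 9921  # simulations where the observed C-score was larger
n_less    <- 60    # simulations where the observed C-score was smaller
n_equal   <- 19    # simulations tied with the observed C-score
n_sims    <- n_greater + n_less + n_equal

# Assumption: ties are added to both tails, which makes each test
# slightly conservative.
lower_tail_p <- (n_greater + n_equal) / n_sims
upper_tail_p <- (n_less + n_equal) / n_sims

print(c(lower = lower_tail_p, upper = upper_tail_p))
```

Under that assumption the result agrees with the reported Lower-tail P = 0.994 and Upper-tail P = 0.0079: only 79 of the 10,000 simulated matrices had a C-score as large as or larger than the observed one.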

cegboy commented 6 years ago

Hi, further to the previous post, I ventured to run the full 247 samples x 452 spp data set and got these results:

Time Stamp: Wed Aug 22 21:25:09 2018
Reproducible:
Number of Replications:
Elapsed Time: 2.5 mins
Metric: c_score
Algorithm: sim9
Observed Index: 475.34
Mean Of Simulated Index: 403.97
Variance Of Simulated Index: 28.796
Lower 95% (1-tail): 401.41
Upper 95% (1-tail): 416.23
Lower 95% (2-tail): 401.35
Upper 95% (2-tail): 422.74
Lower-tail P > 0.9999
Upper-tail P < 1e-04
Observed metric > 10000 simulated metrics
Observed metric < 0 simulated metrics
Observed metric = 0 simulated metrics
Standardized Effect Size (SES): 13.299

[Attached images: full data iteration graph, full data hist, full data cooc graph]

Any thoughts? Cheers Marcelo

cegboy commented 6 years ago

Hi all, any thoughts on this? Especially the black squares? cheers Marcelo

emhart commented 6 years ago

Hi @cegboy

I took a look at your example and ran it, and it all looks good. One comment is that your file is tab separated, not comma (this was an easy fix, maybe it was uploading). Can you share the other file you're using? It's hard to dig into it without seeing the data, but my first (and possibly wrong) guess is that you're not using long enough burn in with the bigger data set. But I'm happy to check it out for you.
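For anyone hitting the same separator issue, here is a minimal base R sketch of the difference. The tiny table and its species and site names are invented for illustration; it only shows how `read.csv()` and `read.delim()` treat a tab-separated file, not how the attached file was actually laid out.

```r
# Write a tiny tab-separated presence-absence table to a temporary file.
# The species and site names are hypothetical.
tmp <- tempfile(fileext = ".txt")
writeLines(c("species\tsite1\tsite2\tsite3",
             "sp_a\t1\t0\t1",
             "sp_b\t0\t1\t1"), tmp)

# read.csv() splits on commas, so each row collapses into a single column.
wrong <- read.csv(tmp)
print(ncol(wrong))

# read.delim() splits on tabs and recovers the four intended columns.
right <- read.delim(tmp)
print(ncol(right))
str(right)
```

Checking `str()` on the imported data frame before passing it to the null model is a cheap way to confirm that the columns were parsed as intended.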

cegboy commented 6 years ago

Hi Hart, thanks. Yes, I have since changed to .csv. Here is the big test data I am using. I did try to increase the burn-in to 1000, I think. Thank you very much for your help. Marcelo (BTW, "cegboy" used to be a call sign I used in Atari..)

[Attachment: Test data EcoSim R.zip]

cegboy commented 6 years ago

Hi Edmund @emhart, I ran the same full 247 samples x 452 spp dataset, this time with no species or site names, just 1, 2, 3 etc. for species and site1, site2 etc. for sites. I used nReps = 10000 and burn_in = 100000. The black squares turned out to be a resolution problem, which is now fixed. I got the results below; plots are attached. Does this look better? Also, is there any literature I can use besides the R documentation to understand and interpret the results? Thanks Marcelo

[Attached image: full 247 samples x 452 spp no names hist]

Time Stamp: Fri Aug 24 20:27:13 2018
Reproducible:
Number of Replications:
Elapsed Time: 19 mins
Metric: c_score
Algorithm: sim9
Observed Index: 4623706512
Mean Of Simulated Index: 4623705736
Variance Of Simulated Index: 851660
Lower 95% (1-tail): 4623704208
Upper 95% (1-tail): 4623707234
Lower 95% (2-tail): 4623704020
Upper 95% (2-tail): 4623707560
Lower-tail P = 0.7944
Upper-tail P = 0.2056
Observed metric > 7944 simulated metrics
Observed metric < 2056 simulated metrics
Observed metric = 0 simulated metrics
Standardized Effect Size (SES): 0.84054

cegboy commented 6 years ago

Now same dataset and parameters but with names of species and sites:

Time Stamp: Fri Aug 24 21:10:17 2018
Reproducible:
Number of Replications:
Elapsed Time: 20 mins
Metric: c_score
Algorithm: sim9
Observed Index: 475.34
Mean Of Simulated Index: 401.76
Variance Of Simulated Index: 0.064255
Lower 95% (1-tail): 401.32
Upper 95% (1-tail): 402.17
Lower 95% (2-tail): 401.26
Upper 95% (2-tail): 402.29
Lower-tail P > 0.9999
Upper-tail P < 1e-04
Observed metric > 10000 simulated metrics
Observed metric < 0 simulated metrics
Observed metric = 0 simulated metrics
Standardized Effect Size (SES): 290.26

[Attached image: full 247 samples x 452 spp with names]

cegboy commented 6 years ago

Hi @emhart Edmund and @ngotelli Nick, I ran the model with variations of nReps and burn-in, using the full data set with site and species names and with underscores between names, e.g. Alouatta_belzebul. Some observations:

C-scores for all 4 simulations were the same, as you can see below.
Using the default burn-in reached 50%. The other three reached 100%.
Only the SES showed noticeable differences.
Histogram graphs were also less visually informative when using a higher burn-in, due to scale.
Trace graphs also varied visually due to scale.
Simulated and observed graphs look mostly the same, and the original issue of the black squares was solved by increasing resolution.

I am tending to use nReps = 1000 and burn-in = 1000 to report results, as all C-scores are the same and the graph scales look visually more informative. The major variation among simulations was the SES value. How important would that be for the analysis of results? Attached are the graphs as well. Thanks for all your help. Cheers Marcelo
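On why the SES moves so much while the observed C-score stays put: the SES is conventionally the observed index minus the mean of the simulated indices, divided by their standard deviation. The sketch below applies that formula to the numbers from the two named full-data summaries earlier in the thread. The helper name `ses` and the run labels are made up for illustration, and this is a check of the arithmetic rather than the package's internal code.

```r
# Hypothetical helper: standardized effect size from summary statistics.
ses <- function(observed, sim_mean, sim_var) {
  (observed - sim_mean) / sqrt(sim_var)
}

# Values copied from the two full-data summaries with species and site names.
runs <- data.frame(
  run      = c("first full-data run", "run with burn_in = 100000"),
  observed = c(475.34, 475.34),
  sim_mean = c(403.97, 401.76),
  sim_var  = c(28.796, 0.064255)
)

runs$ses <- ses(runs$observed, runs$sim_mean, runs$sim_var)
print(runs)
```

The observed index and the simulated mean barely change between the two runs. What changes is the variance of the simulated values, which shrinks from 28.796 to 0.064255, so the same gap of roughly 70 units is divided by a much smaller standard deviation. Up to rounding of the printed inputs, this reproduces the reported SES values of 13.299 and 290.26.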

cegboy commented 6 years ago

[Attached images: nreps 1000 burn-in 500 sim ori; nreps 1000 burn-in 500 hist; nreps 1000 burn-in 500 trace]

ngotelli commented 6 years ago

Dear Marcelo (and Ted):

Sorry to be out of touch, but we are at the end of the field season and the start of the academic semester, which is a very busy time of year. The analyses you have conducted look "correct", although, as Ted noted, you need to increase the number of burn-in replications until the curve begins to flatten out. This will not change your results because, for such a large matrix, the observed is always very distant from the null. The graphics will show up black when there are too many species or sites to display, but it looks like you have solved that problem.

The bigger issue is that you are exploring a paradigm with methods that are now over 20 years old, and the field of null model analysis has changed a lot during that time period. More emphasis now is put on identifying pairs of non-random species, and on combining other data on spatial location and habitat variables to tease apart the mechanisms for species non-random associations, which can include species interactions, habitat niches, and dispersal limitation. I appreciate that not all of these hypotheses may be easy to address with microbe data.
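To make the pairwise idea concrete, here is a minimal base R sketch of the checkerboard-unit calculation behind the C-score, using the definition usually attributed to Stone and Roberts (1990): for species i and j, (R_i - S_ij)(R_j - S_ij), where R is the number of sites each species occupies and S_ij is the number of sites they share. The toy matrix, its names, and the variable names are invented for illustration. This is not EcoSimR's implementation, and it says nothing about which pairs are non-random, since that still requires a null model for each pair.

```r
# Toy presence-absence matrix: rows are species, columns are sites.
# All names and values are hypothetical.
m <- matrix(c(1, 1, 0, 0,
              0, 0, 1, 1,
              1, 0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("sp_a", "sp_b", "sp_c"), paste0("site", 1:4)))

occ    <- rowSums(m)     # R_i: number of sites occupied by each species
shared <- m %*% t(m)     # S_ij: number of sites shared by species i and j

# unshared[i, j] = R_i - S_ij; its transpose holds R_j - S_ij.
unshared <- occ - shared
cu <- unshared * t(unshared)   # checkerboard units for every species pair

# The matrix-wide C-score is the mean over all unique species pairs.
overall <- mean(cu[upper.tri(cu)])

print(cu)
print(overall)
```

In this toy example sp_a and sp_b never co-occur, so their pair contributes the largest value, while the pairs involving sp_c contribute much less. Looking at the full pairwise matrix rather than only the mean is one way to start asking which species pairs drive a high overall score.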

I have attached some of my own papers in this literature, although there are now many other approaches to consider as well. I hope this helps guide you in your future analyses.

Best wishes,

Nick


cegboy commented 6 years ago

Dear Nick @ngotelli, thank you very much for your insights. In my case I was hoping to use the C-scores to compare different categories of protected areas, testing whether there is any significant difference between the species communities being conserved. You mention that with such big datasets the observed will always be very distant from the null model. So in this case, would you say that such a comparison using co-occurrence analysis would be useless? I do plan to add some habitat variables, possibly deforestation as well, but those are future plans. Any thoughts on the usefulness of co-occurrence analysis in this case, or any other analysis with presence-absence data you would suggest? Can you attach your papers again please? Many thanks Marcelo

ngotelli commented 6 years ago

Dear Marcelo @cegboy, It is really hard for me to say how the comparisons will change for different subgroups. Here are the papers again. They may give you ideas for further analyses.

Best wishes,

Nick


cegboy commented 6 years ago

Thanks Nick. Will do further tests. The papers didn't attach but no worries as I have now downloaded some from your site. Cheers Marcelo


-- Dr Marcelo Gonçalves de Lima

Research Fellow - Center for Large Landscape Conservation
Cambridge Conservation Forum - Connectivity Conservation Work Group Chair
IUCN - WCPA member/Connectivity Conservation Specialist Group - Brazil Lead
IUCN - CEM member
ARPA - Amazon Region Protected Areas Programme Scientific Advisor
Biologist, PhD in Ecology
https://uk.linkedin.com/in/marcelo-lima-35b3ba20