defleury / Schmidt_et_al_2016_community_similarity

Analysis code for a manuscript on community similarity computation

Analysis code: A Family of Interaction-Adjusted Indices of Community Similarity

This repository contains the R code for the analyses conducted in the study by Schmidt et al., "A Family of Interaction-Adjusted Indices of Community Similarity". The manuscript is available as a preprint on bioRxiv, through PubMed Central, and open access via the ISME Journal. The full reference is:

Schmidt TSB, Matias Rodrigues FM & von Mering C, A family of interaction-adjusted indices of community similarity, ISME J (2017) 11:791-807, doi:10.1038/ismej.2016.139

Study abstract:

"Interactions between taxa are essential drivers of ecological community structure and dynamics, but they are not taken into account by traditional indices of diversity. In this study, we propose a novel family of indices that quantify community similarity in the context of taxa interaction networks. Using publicly available datasets, we assess the performance of two specific indices which are Taxa INteraction-Adjusted (TINA, based on taxa co-occurrence networks), and Phylogenetic INteraction-Adjusted (PINA, based on phylogenetic similarities). TINA and PINA outperformed traditional indices when partitioning human-associated microbial communities according to habitat, even for extremely downsampled datasets, and when organising ocean micro-eukaryotic plankton diversity according to geographical and physicochemical gradients. We argue that interaction-adjusted indices capture novel aspects of diversity outside the scope of traditional approaches, highlighting the biological significance of ecological association networks in the interpretation of community similarity."

Data availability

The study was conducted on re-processed, publicly available data from the Human Microbiome Project and TARA Oceans. In addition, this repository contains an analysis script for data from the global Reef Life Survey. Re-processed data is available via the website meringlab.org and in the folder data in this repository. In particular, these datasets were used:

Analysis code

The code is organised to re-generate the analyses underlying the various figures in the publication. Before running an analysis, edit the corresponding script to insert the correct paths to your data and results folders. Every script also depends on functions.community_similarity.R, which is sourced automatically at the beginning of each script. The community similarity matrices on which most analyses rely are generated in the script prepare.community_similarity.R, so run that script first.
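
The setup steps above can be sketched as follows. This is a minimal illustration only: the variable names (folder.repo, folder.data, folder.results), the checkout location and the assumption that the scripts sit at the top level of the repository are hypothetical and not taken from the scripts themselves, which define their own path variables.

```r
# Hypothetical folder layout; replace with the paths on your system.
folder.repo    <- "~/Schmidt_et_al_2016_community_similarity"  # assumed checkout location
folder.data    <- file.path(folder.repo, "data")               # data folder shipped with the repository
folder.results <- file.path(folder.repo, "results")            # assumed output folder

# Stop early if the checkout is not where we expect it.
if (!dir.exists(folder.repo)) {
  stop("Repository folder not found: ", folder.repo)
}

# Create the results folder if it does not exist yet.
dir.create(folder.results, showWarnings = FALSE, recursive = TRUE)

# The analysis scripts source this file themselves; sourcing it by hand
# (assuming it sits at the top level of the repository) is only needed
# when working interactively.
source(file.path(folder.repo, "functions.community_similarity.R"))
```

With the paths set inside the scripts, prepare.community_similarity.R would be run first (for example with Rscript), followed by the individual figure scripts.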

The code was deposited as-is, so there is no guarantee that the scripts will run on every system. In particular, some computations (e.g., SparCC correlation networks) require significant computational resources; parallelization was tested on an Ubuntu system only and will probably fail on Windows machines. We are currently working to provide more efficient and versatile versions of the TINA/PINA code and will hopefully be able to do so in the near future.

Contact: Sebastian Schmidt (sebastian.schmidt [at] embl.de)