alek0991 / iSAFE

Pinpoints the mutation favored by selection
BSD 3-Clause "New" or "Revised" License

iSAFE: integrated Selection of Allele Favored by Evolution

Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. iSAFE enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations.

FAQ – frequently asked questions

Please read the FAQ for answers to the most common queries.

Conda Installation (recommended)

On 64-bit Linux and macOS, you can install iSAFE from the bioconda channel using the conda package manager. iSAFE v1.1.0 and later are compatible with both Python 2 and Python 3.

  1. Install Miniconda (you can skip this step if you already have Miniconda or Anaconda installed).
  2. Add the bioconda channel by running the following commands in your terminal (order matters):
    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge
  3. Run the following command to install iSAFE (and all dependencies):
    conda install isafe

Manual installation: software requirements

Alternatively, and for all other operating systems and architectures, you can download the GitHub repository and install iSAFE using the setup script.

  1. The following Python packages are required:

    • numpy version 1.9 or above
    • pandas version 0.18 or above
  2. Install bcftools version 1.2 or above (only for --format vcf, not required if you are using --format hap).

    • Follow the bcftools installation guideline.
    • iSAFE assumes the bcftools binary file is installed to a bin subdirectory that is added to your $PATH; otherwise, you have to change the following line in ./isafe/bcftools.py to the bcftools binary file path:
      bcf_tools = "bcftools"
  3. Clone the GitHub repository (or download the repo) by running:

    git clone https://github.com/alek0991/iSAFE.git
  4. Change to the iSAFE directory and run the pip installation

    cd iSAFE
    pip install .
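The bcftools requirement in step 2 can be checked before running iSAFE in vcf mode. The following is a minimal sketch, not part of iSAFE: `find_binary` is a hypothetical helper name, and it only reports whether an executable called `bcftools` is discoverable on `$PATH`.

```python
import shutil


def find_binary(name="bcftools"):
    """Return the absolute path of `name` if it is on $PATH, else None."""
    return shutil.which(name)


path = find_binary()
if path is None:
    # In this case the README asks you to edit bcf_tools in ./isafe/bcftools.py
    print("bcftools not found on $PATH")
else:
    print("bcftools found at", path)
```

If the binary is not found, set the `bcf_tools` variable mentioned in step 2 to the full path of your bcftools binary.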

Execution:

Use the following command to see all the available options in iSAFE.

isafe --help

These detailed instructions are also provided in ./help.txt.

Note: If you have a script written for an iSAFE version earlier than v1.1.0 and want to update to the latest version, change the commands from

python ./src/isafe.py [Options]

to

isafe [Options]

Input:

Consider a sample of phased haplotypes in a genomic region. We assume that all sites are biallelic and polymorphic in the sample. Thus, the input is a binary SNP matrix in which each column corresponds to a haplotype and each row to a mutation, and each entry gives the allelic state, with 0 denoting the ancestral allele and 1 denoting the derived allele. Not surprisingly, iSAFE performance deteriorates when the favored mutation is fixed or near fixation (favored allele frequency ν > 0.9; see Supplementary Fig. 3e). To handle this special case, we include individuals from non-target populations, using a specific protocol (see online Methods, section Adding outgroup samples). iSAFE can take input in hap or vcf formats.
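To make the matrix layout concrete, here is a small illustrative sketch. It is not iSAFE code and does not describe the on-disk hap file layout; the toy values and the helper names `derived_allele_frequency` and `keep_polymorphic` are assumptions made for this example only.

```python
# Toy binary SNP matrix: each row is a mutation (site), each column a haplotype.
# 0 = ancestral allele, 1 = derived allele. Values are made up for illustration.
snp_matrix = [
    [0, 1, 1, 0, 1, 1],  # polymorphic site
    [0, 0, 0, 0, 0, 0],  # monomorphic: every haplotype is ancestral
    [1, 1, 1, 1, 1, 1],  # monomorphic: derived allele is fixed in the sample
    [0, 0, 1, 0, 0, 0],  # polymorphic site
]


def derived_allele_frequency(matrix):
    """Fraction of haplotypes carrying the derived allele (1) at each site."""
    return [sum(row) / len(row) for row in matrix]


def keep_polymorphic(matrix):
    """Keep only sites where both alleles are present in the sample."""
    return [row for row in matrix if 0 < sum(row) < len(row)]


daf = derived_allele_frequency(snp_matrix)
filtered = keep_polymorphic(snp_matrix)
print(len(filtered), "of", len(snp_matrix), "sites are polymorphic")
```

Only the first and last rows satisfy the "polymorphic in the sample" assumption; the third row corresponds to the fixed case for which outgroup samples are suggested.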

Note: The vcf mode is more flexible and has more options. However, if you do not have outgroup samples (--vcf-cont is not set) and the hap file and vcf file contain exactly the same information (in vcf mode, iSAFE only uses position, haplotype phase, derived allele (1), and ancestral allele (0)), then the output of iSAFE must be identical in both modes.

Output:

The output is a non-negative iSAFE-score for each mutation, reflecting its likelihood of being the variant favored by the selective sweep. The result file (<output>.iSAFE.out) is TAB-separated, in the following format:

POS iSAFE DAF
291 0.02 0.05
626 0.01 0.55
... ... ...

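With the following headers:

    • POS: position of the mutation
    • iSAFE: non-negative iSAFE-score of the mutation
    • DAF: derived allele frequency of the mutation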
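A result file in this format can be ranked by score with a few lines of Python. This is a sketch under assumptions: it presumes the file starts with the `POS iSAFE DAF` header row shown above, and the helper names `read_isafe_output` and `top_candidates` are hypothetical, not part of iSAFE. The example parses the two rows shown above from an in-memory string instead of a real file.

```python
import csv
import io


def read_isafe_output(handle):
    """Parse a TAB-separated iSAFE result with a POS/iSAFE/DAF header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {"POS": int(r["POS"]), "iSAFE": float(r["iSAFE"]), "DAF": float(r["DAF"])}
        for r in reader
    ]


def top_candidates(rows, n=1):
    """Return the n mutations with the highest iSAFE-score."""
    return sorted(rows, key=lambda r: r["iSAFE"], reverse=True)[:n]


# The two example rows from the format description above.
example = io.StringIO("POS\tiSAFE\tDAF\n291\t0.02\t0.05\n626\t0.01\t0.55\n")
rows = read_isafe_output(example)
best = top_candidates(rows)[0]
print(best["POS"], best["iSAFE"])
```

For a real run, pass an open file handle for `<output>.iSAFE.out` instead of the `io.StringIO` object.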

Data availability for vcf format

Demo 1: input in hap format

With --format hap, iSAFE assumes that the derived allele is 1 and the ancestral allele is 0 in the input file, and that selection is ongoing (the favored mutation is not fixed).

isafe --input ./example/hap/demo.hap --output ./example/hap/demo --format hap

Demo 2: input in vcf format

Follow the instructions in the Data Requirements section to download the Homo sapiens ancestral allele files and the phased vcf files of chromosome 2 for the 1000GP populations (GRCh37/hg19). In the commands below, replace the text in each < > with the proper file path.

Scenario 1: All samples

Use all the samples of the --input vcf file as the case population:

isafe --input <chr2 vcf file> --output ./example/vcf/LCT --region 2:134108646-139108646 --AA <chr2 Ancestral Allele file>
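The --region value in this command spans exactly 5 Mbp (139108646 − 134108646 = 5000000), matching the ∼5 Mbp window size mentioned in the introduction. A window like this can be built around a position of interest as sketched below. `region_around` is a hypothetical helper, not an iSAFE function, and centering the window on a chosen position is an assumption of this example, not a documented requirement.

```python
def region_around(chrom, center, width=5_000_000):
    """Build a 'chrom:start-end' string for a window centered on `center`."""
    half = width // 2
    start = max(1, center - half)  # do not run past the start of the chromosome
    end = center + half
    return f"{chrom}:{start}-{end}"


# 136608646 is the midpoint of the demo region used in the command above.
print(region_around("2", 136608646))
```

With the midpoint of the demo region as the center, the helper reproduces the region string used in the command.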

Scenario 2: A subset of samples

Use a subset of samples (--sample-case) of the --input vcf file as the case population:

isafe --input <chr2 vcf file> --output ./example/vcf/LCT --region 2:134108646-139108646 --AA <chr2 Ancestral Allele file> --sample-case ./example/vcf/case.sample

Scenario 3: Adding outgroup samples

Use a subset of samples (--sample-case) of the --input vcf file as the case population, and a subset of samples (--sample-cont) of the --vcf-cont vcf file as the control population:

isafe --input <chr2 vcf file> --output ./example/vcf/LCT --region 2:134108646-139108646 --AA <chr2 Ancestral Allele file> --vcf-cont <chr2 vcf file> --sample-case ./example/vcf/case.sample --sample-cont ./example/vcf/cont.sample
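The three scenarios differ only in which optional flags are passed. When scripting several runs, the command line can be assembled programmatically, as in the sketch below. `build_isafe_command` is a hypothetical helper that uses only the flags shown in this README, and the file names `chr2.vcf.gz` and `chr2_AA.fa` are placeholders standing in for `<chr2 vcf file>` and `<chr2 Ancestral Allele file>`.

```python
import shlex


def build_isafe_command(input_path, output_prefix, fmt=None, region=None,
                        ancestral=None, vcf_cont=None,
                        sample_case=None, sample_cont=None):
    """Assemble an isafe argument list, skipping options that are not set."""
    cmd = ["isafe", "--input", input_path, "--output", output_prefix]
    optional = [
        ("--format", fmt),
        ("--region", region),
        ("--AA", ancestral),
        ("--vcf-cont", vcf_cont),
        ("--sample-case", sample_case),
        ("--sample-cont", sample_cont),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]
    return cmd


# Scenario 2 with placeholder file names (assumptions, not real paths).
cmd = build_isafe_command(
    "chr2.vcf.gz", "./example/vcf/LCT",
    region="2:134108646-139108646",
    ancestral="chr2_AA.fa",
    sample_case="./example/vcf/case.sample",
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once iSAFE is installed; adding `vcf_cont` and `sample_cont` gives the Scenario 3 command.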