Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. iSAFE enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations.
Please read the FAQ for answers to the most common queries.
On 64-bit Linux and macOS, you can install iSAFE from the bioconda channel using the conda package manager. iSAFE v1.1.0 and later is compatible with both Python 2 and Python 3.
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda install isafe
Manual installation
==========
Alternatively, and for all other operating systems and architectures, you can download the GitHub repository and install isafe using the setup script.

Software requirements
----------
The following Python packages are required:
- numpy version 1.9 or above
- pandas version 0.18 or above

Install bcftools version 1.2 or above (only for --format vcf, not required if you are using --format hap). The bcftools binary must be on your $PATH; otherwise, you have to change the following line in ./isafe/bcftools.py to the bcftools binary file path:
bcf_tools = "bcftools"
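Before running iSAFE in vcf mode, it can help to confirm that bcftools is actually visible on $PATH. The following is a minimal sketch using only the Python standard library; the helper name `find_binary` is hypothetical and is not part of iSAFE.

```python
import shutil


def find_binary(name="bcftools"):
    """Return the absolute path of `name` if it is on $PATH, else None."""
    # shutil.which searches the directories listed in $PATH, like the shell does.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_binary()
    if path is None:
        # Not found: the bcf_tools line in ./isafe/bcftools.py needs the full path.
        print("bcftools not found on $PATH")
    else:
        print("bcftools found at", path)
```

If the check reports that bcftools is missing, either add its directory to $PATH or edit the bcf_tools line as described above.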
Clone the github repository by running (or you can download the repo)
git clone https://github.com/alek0991/iSAFE.git
Change to the iSAFE directory and run the pip installation
cd iSAFE
pip install .
Use the following command to see all the available options in iSAFE.
isafe --help
These detailed instructions are also provided in ./help.txt.
Note: If you have a script for iSAFE<v1.1.0 and want to update to the latest version you should change the commands from
python ./src/isafe.py [Options]
to
isafe [Options]
Consider a sample of phased haplotypes in a genomic region. We assume that all sites are biallelic and polymorphic in the sample. Thus, our input is in the form of a binary SNP matrix with each column corresponding to a haplotype and each row to a mutation, and entries corresponding to the allelic state, with 0 denoting the ancestral allele, and 1 denoting the derived allele. Not surprisingly, iSAFE performance deteriorated when the favored mutation is fixed or near fixation (favored allele frequency (ν) > 0.9 in Supplementary Fig. 3e). To handle this special case, we included individuals from non-target populations, using a specific protocol (See online Methods, section Adding outgroup samples). iSAFE can take input in hap or vcf formats.
- hap format: --format hap or -f hap. With hap format, iSAFE assumes that the derived allele is 1 and the ancestral allele is 0 in the input file, and that selection is ongoing (the favored mutation is not fixed).
  - With --format hap, the user is required to add outgroup samples to the input hap file if needed, following the protocol mentioned above.
  - The options --vcf-cont, --sample-case, --sample-cont, and --AA are not used with hap input.
- vcf format: --format vcf or -f vcf.
  - The ancestral allele file (--AA) must be provided with --format vcf. From version 1.0.5, if the ancestral allele file (--AA) is not available, the program raises a warning and assumes that the reference allele (REF) is the ancestral allele.
  - A subset of samples in the input vcf file can be chosen as the case population with --sample-case. Otherwise, all the samples in the input vcf file are considered case samples. See sample ID file format.
  - --vcf-cont (a vcf file of control samples) is optional but recommended for capturing fixed sweeps. You can choose a subset of samples in this file with the --sample-cont option; otherwise, all the samples in this file are considered the control population. See sample ID file format.
  - You must use --sample-case and --sample-cont when --input and --vcf-cont are the same (all samples are provided in a single vcf file).

Note: The software in vcf mode is more flexible and has more options. But if you don't have outgroup samples (--vcf-cont is not set), and the hap file and vcf file contain exactly the same information (in vcf mode iSAFE only uses position, haplotype phase, derived allele (1), and ancestral allele (0)), then the output of iSAFE should be identical in both modes.
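To make the binary SNP matrix described in this section concrete, here is a small illustrative sketch in plain Python. The positions and genotypes are made-up toy data, and the helper names are hypothetical; this shows the data model only (rows are mutations, columns are haplotypes), not the on-disk hap file layout or iSAFE's own code.

```python
# Rows are mutations (keyed by position), columns are haplotypes.
# 0 = ancestral allele, 1 = derived allele. All values are toy data.
snp_matrix = {
    291: [0, 0, 1, 0, 0, 0],
    626: [1, 1, 0, 1, 0, 1],
    810: [0, 0, 0, 0, 0, 0],  # monomorphic: only the ancestral allele is present
    955: [1, 1, 1, 1, 1, 1],  # derived allele fixed in this sample
}


def derived_allele_frequency(row):
    """Fraction of haplotypes carrying the derived allele (1)."""
    return sum(row) / len(row)


def polymorphic_sites(matrix):
    """Keep only sites where both alleles are present in the sample."""
    return {pos: row for pos, row in matrix.items() if 0 < sum(row) < len(row)}


kept = polymorphic_sites(snp_matrix)
for pos, row in sorted(kept.items()):
    print(pos, round(derived_allele_frequency(row), 2))
```

The two filtered rows illustrate why fixed sweeps need outgroup samples: a site where every target haplotype carries the derived allele is no longer polymorphic within the target sample alone.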
The output is a non-negative iSAFE-score for each mutation, according to its
likelihood of being the favored variant of the selective sweep.
The result (<output>.iSAFE.out) is a tab-separated file in the following format.
POS | iSAFE | DAF |
---|---|---|
291 | 0.02 | 0.05 |
626 | 0.01 | 0.55 |
... | ... | ... |
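With the following headers:
- POS: position of the mutation
- iSAFE: non-negative iSAFE-score
- DAF: derived allele frequency

Data Requirements
==========
- Ancestral allele files, needed for --format vcf and consequently --AA.
- Phased vcf files for --input or --vcf-cont.
- Sample ID files for selecting samples from --input (case) or --vcf-cont (control).

With --format hap, iSAFE assumes that the derived allele is 1 and the ancestral allele is 0 in the input file, and that selection is ongoing (the favored mutation is not fixed).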
isafe --input ./example/hap/demo.hap --output ./example/hap/demo --format hap
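Once a run has finished, the mutations can be ranked by score. The sketch below assumes the output file starts with a header row containing the columns POS, iSAFE, and DAF, as the output table suggests; the function name `top_candidates` is hypothetical, and the inline data is a toy stand-in for a real output file.

```python
import csv
import io


def top_candidates(handle, n=5):
    """Return the n highest-scoring rows as (POS, iSAFE, DAF) tuples."""
    # Assumes a tab-separated file with a POS / iSAFE / DAF header row.
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [(int(r["POS"]), float(r["iSAFE"]), float(r["DAF"])) for r in reader]
    # Higher iSAFE score = more likely to be the favored mutation.
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


# Toy data standing in for a real <output>.iSAFE.out file.
example = io.StringIO("POS\tiSAFE\tDAF\n291\t0.02\t0.05\n626\t0.01\t0.55\n")
for pos, score, daf in top_candidates(example, n=1):
    print(pos, score, daf)
```

For the demo command above, the same function could be given an open handle on ./example/hap/demo.iSAFE.out instead of the in-memory example.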
Follow the instructions in the Data Requirements section to download the Homo sapiens ancestral allele files and the phased vcf files of chromosome 2 for the 1000GP populations (GRCh37/hg19), and replace the text in each < > with the proper file path.
All the samples of the --input vcf file as the case population:
isafe --input <chr2 vcf file> --output ./example/vcf/LCT --region 2:134108646-139108646 --AA <chr2 Ancestral Allele file>
A subset of samples (--sample-case) of the --input vcf file as the case population:
isafe --input <chr2 vcf file> --output ./example/vcf/LCT --region 2:134108646-139108646 --AA <chr2 Ancestral Allele file> --sample-case ./example/vcf/case.sample
A subset of samples (--sample-case) of the --input vcf file as the case population and a subset of samples (--sample-cont) of the --vcf-cont vcf file as the control population:
isafe --input <chr2 vcf file> --output ./example/vcf/LCT --region 2:134108646-139108646 --AA <chr2 Ancestral Allele file> --vcf-cont <chr2 vcf file> --sample-case ./example/vcf/case.sample --sample-cont ./example/vcf/cont.sample
--input and --vcf-cont can point to the same vcf file or to different ones. If they are the same, --sample-case and --sample-cont are mandatory.
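When these commands are generated from a script, the rule above can be checked before iSAFE is launched. This is a minimal sketch: `build_isafe_command` is a hypothetical helper, not part of iSAFE; it uses only the flags shown in the examples above, and the file names chr2.vcf.gz and chr2_AA.fa are placeholders for your own paths. Note that it compares the two paths as plain strings, so two different spellings of the same file would not be detected.

```python
def build_isafe_command(input_vcf, output, region, aa=None, vcf_cont=None,
                        sample_case=None, sample_cont=None):
    """Assemble an `isafe` argument list for vcf input (hypothetical helper)."""
    # Same file for case and control: both sample lists are mandatory.
    if vcf_cont is not None and vcf_cont == input_vcf:
        if sample_case is None or sample_cont is None:
            raise ValueError(
                "--sample-case and --sample-cont are mandatory when "
                "--input and --vcf-cont are the same file"
            )
    cmd = ["isafe", "--input", input_vcf, "--output", output, "--region", region]
    optional = [("--AA", aa), ("--vcf-cont", vcf_cont),
                ("--sample-case", sample_case), ("--sample-cont", sample_cont)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]
    return cmd


# Placeholder paths; replace with your own vcf and ancestral allele files.
cmd = build_isafe_command(
    "chr2.vcf.gz", "./example/vcf/LCT", "2:134108646-139108646",
    aa="chr2_AA.fa", vcf_cont="chr2.vcf.gz",
    sample_case="./example/vcf/case.sample",
    sample_cont="./example/vcf/cont.sample",
)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run without going through a shell.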