BalLeRMix: Balancing selection Likelihood Ratio Mixture models

This repository hosts the software package for BalLeRMix and scripts used in the study "Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection" (Cheng & DeGiorgio 2020).

Please cite the following manuscript if using this software:

Xiaoheng Cheng, Michael DeGiorgio (2020) Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection. Molecular Biology and Evolution, 37(11): 3267-3291.


In BalLeRMix v2, we introduce the -m <m> argument to customize the presumed number of alleles being balanced at the selected sites, in case you want to look for multi-allelic balancing selection. The default value is 2.

2020.6.22-Update: Updated the model for multi-allelic balancing selection in v2.2.

2020.2.5-Update: Fixed a minor bug in the initialization module.


Quick Guide

usage: BalLeRMix.py [-h] -i INFILE --spect SPECTFILE [-o OUTFILE] [-m M]
                      [--getSpect] [--getConfig] [--nofreq] [--nosub] [--MAF]
                      [--physPos] [--rec RRATE] [--fixSize] [-w R]
                      [--noCenter] [-s STEP] [--fixX X] [--rangeA SEQA]
                      [--listA LISTA]

Run python BalLeRMix.py -h to see a more detailed help page.

1. Input format

For B0 and B2 statistics, the user should first generate the tab-delimited site frequency spectrum file, without header, e.g.:

<k> <sample size n> <proportion in the genome>
1 50 0.03572
2 50 0.02024

...

or, for the B1 statistic, the configuration file with the polymorphism/substitution ratio, without header, e.g.:

<sample size n> <proportion of substitutions> <proportion of polymorphisms>
50 0.7346 0.2654
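
To make the two helper-file layouts concrete, here is a minimal sketch that tabulates them from a list of (x, n) pairs, where x is the derived allele count and n the sample size at a site. It is not the code BalLeRMix runs: the function names are hypothetical, and it assumes that a site with x = n counts as a substitution, that 0 < x < n counts as a polymorphism, and that spectrum proportions are normalized over all informative sites. The --getSpect and --getConfig options described in section 3 remain the authoritative way to produce these files.

```python
from collections import Counter


def frequency_spectrum(sites):
    """Return (k, n, proportion) rows from (x, n) pairs.

    Assumption: sites with x == 0 are uninformative and skipped, and
    proportions are taken over all remaining sites.
    """
    counts = Counter((x, n) for x, n in sites if 0 < x <= n)
    total = sum(counts.values())
    ordered = sorted(counts, key=lambda kn: (kn[1], kn[0]))
    return [(k, n, counts[(k, n)] / total) for k, n in ordered]


def sub_poly_config(sites):
    """Return (n, proportion of substitutions, proportion of polymorphisms)."""
    subs, polys = Counter(), Counter()
    for x, n in sites:
        if x == n:
            subs[n] += 1      # fixed difference (assumed substitution)
        elif 0 < x < n:
            polys[n] += 1     # segregating site (polymorphism)
    rows = []
    for n in sorted(set(subs) | set(polys)):
        total = subs[n] + polys[n]
        rows.append((n, subs[n] / total, polys[n] / total))
    return rows


# Toy data: one substitution and three polymorphic sites, all with n = 50.
sites = [(50, 50), (12, 50), (12, 50), (3, 50)]
spectrum = frequency_spectrum(sites)
config = sub_poly_config(sites)

for row in spectrum:
    print("\t".join(str(value) for value in row))
for row in config:
    print("\t".join(str(value) for value in row))
```

Each printed row is tab-delimited and has no header, matching the layouts shown above.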

Each input file should have four columns: physical position, genetic position, number of derived (or minor) alleles observed, and total number of alleles observed (i.e., sample size). The file should be tab-delimited and should have a header, e.g.:

physPos genPos x n
16 0.000016 50 50
35 0.000035 12 50

...
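
Before launching a scan, it can help to check that an input file really has this layout. The sketch below reads the four columns with Python's csv module and rejects rows where the allele count exceeds the sample size. The function name read_sites is hypothetical, and treating physPos as an integer is an assumption based on the example rows.

```python
import csv
import io


def read_sites(handle):
    """Parse a tab-delimited input file with header physPos, genPos, x, n."""
    reader = csv.DictReader(handle, delimiter="\t")
    sites = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        x, n = int(row["x"]), int(row["n"])
        if not 0 <= x <= n:
            raise ValueError(f"line {line_no}: x={x} is outside 0..n={n}")
        sites.append((int(row["physPos"]), float(row["genPos"]), x, n))
    return sites


# The two example rows from above, written with explicit tabs.
example = (
    "physPos\tgenPos\tx\tn\n"
    "16\t0.000016\t50\t50\n"
    "35\t0.000035\t12\t50\n"
)
sites = read_sites(io.StringIO(example))
print(sites)
```

To check a file on disk, pass an open file handle instead of the io.StringIO object.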

2. Running the B statistics

To perform B2 scans on your input data, use

python BalLeRMix.py -i <input> --spect <derived allele frequency spectrum> -o <output>

To perform B2,MAF scans on your input data, use

python BalLeRMix.py -i <input> --spect <minor allele frequency spectrum> -o <output> --MAF

To perform B1 scans on your input data, use

python BalLeRMix.py -i <input> --spect <sub/poly configuration file> -o <output> --nofreq

To perform B0 scans on your input data, use

python BalLeRMix.py -i <input> --spect <derived allele frequency spectrum> -o <output> --nosub

To perform B0,MAF scans on your input data, use

python BalLeRMix.py -i <input> --spect <minor allele frequency spectrum> -o <output> --nosub --MAF
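
The five variants differ only in their trailing flags, so a small wrapper can keep batch runs consistent. The sketch below assembles the argument list for a chosen statistic without executing anything. The names STAT_FLAGS and build_command are hypothetical. By default the helper file is passed through --spect, the only file option that appears in the usage line of the Quick Guide; change helper_option if your version of the script expects a different option.

```python
import shlex

# Extra flags per statistic, as listed in this section.
STAT_FLAGS = {
    "B2": [],
    "B2,MAF": ["--MAF"],
    "B1": ["--nofreq"],
    "B0": ["--nosub"],
    "B0,MAF": ["--nosub", "--MAF"],
}


def build_command(stat, infile, helper_file, outfile,
                  script="BalLeRMix.py", helper_option="--spect"):
    """Return the argument list for one scan; nothing is run here."""
    if stat not in STAT_FLAGS:
        raise ValueError(f"unknown statistic: {stat!r}")
    return ["python", script, "-i", infile, helper_option, helper_file,
            "-o", outfile, *STAT_FLAGS[stat]]


cmd = build_command("B0,MAF", "chr1.txt", "spect.txt", "chr1_B0maf.txt")
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) to launch the scan. The file names in the example are placeholders.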

3. Generate helper files

To generate the spectrum file for B2:

python BalLeRMix.py -i <concatenated input> --getSpect --spect <spectrum file name>

To generate the spectrum file for B2,MAF:

python BalLeRMix.py -i <concatenated input> --getSpect --MAF --spect <spectrum file name>

To generate the configuration file for B1:

python BalLeRMix.py -i <concatenated input> --getConfig --spect <config file name>

To generate the spectrum file for B0:

python BalLeRMix.py -i <concatenated input> --getSpect --nosub --spect <spectrum file name>

To generate the spectrum file for B0,MAF:

python BalLeRMix.py -i <concatenated input> --getSpect --nosub --MAF --spect <spectrum file name>
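
The commands above take a concatenated input. Assuming this means several input files (for example, one per chromosome) stacked under a single header line, one way to build it is sketched below. The function name concatenate_inputs and the file names are hypothetical, and the sketch assumes every file starts with the same four-column header.

```python
import os
import tempfile


def concatenate_inputs(paths, out_path):
    """Stack input files under one header; return the number of data rows."""
    rows = 0
    header_written = False
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                header = handle.readline().rstrip("\n")
                if not header_written:
                    out.write(header + "\n")  # keep only the first header
                    header_written = True
                for line in handle:
                    if line.strip():          # skip blank lines
                        out.write(line.rstrip("\n") + "\n")
                        rows += 1
    return rows


# Demonstration with two tiny temporary files.
with tempfile.TemporaryDirectory() as tmp:
    chr1 = os.path.join(tmp, "chr1.txt")
    chr2 = os.path.join(tmp, "chr2.txt")
    merged = os.path.join(tmp, "all.txt")
    with open(chr1, "w") as f:
        f.write("physPos\tgenPos\tx\tn\n16\t0.000016\t50\t50\n")
    with open(chr2, "w") as f:
        f.write("physPos\tgenPos\tx\tn\n35\t0.000035\t12\t50\n")
    n_rows = concatenate_inputs([chr1, chr2], merged)
    with open(merged) as f:
        combined_lines = f.read().splitlines()

print(n_rows, combined_lines)
```

Positions from different files may overlap in the merged file, so this layout is only meant for generating helper files, not for running scans.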

4. Customizing the scan

All arguments besides the aforementioned ones are for customizing the scan.