Utilities for analyzing next generation sequencing data.
MIT License

ANGSD-wrapper

Active Development

As of October 2020, ANGSD-wrapper will be undergoing active development by Samuel Hamann to improve the project. Some areas of improvement include:

ANGSD-wrapper is a utility developed to aid in the analysis of next-generation sequencing data. Users can do the following with this suite:

ANGSD uses likelihood-based approaches to calculate summary statistics from next-generation sequencing data. The wrapper scripts and documentation are designed to make ANGSD user-friendly.

Installing ANGSD-wrapper

To install ANGSD-wrapper, download it from GitHub

git clone https://github.com/ANGSD-wrapper/angsd-wrapper.git

Go into the ANGSD-wrapper directory

cd angsd-wrapper/

Run the setup command

./angsd-wrapper setup dependencies

Download the example dataset (optional)

./angsd-wrapper setup data

Finish the installation

source ~/.bash_profile
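
To confirm that the installation worked, a check along the following lines may help. This is a minimal sketch that assumes the setup step makes the angsd-wrapper command available on your PATH once ~/.bash_profile has been sourced; the helper name require_cmd is hypothetical and is not part of ANGSD-wrapper.

```shell
#!/usr/bin/env bash
# require_cmd is a hypothetical helper, not part of ANGSD-wrapper.
# It succeeds if the named command can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example calls (commented out so that the block has no side effects):
# require_cmd angsd-wrapper
# require_cmd samtools
```

If require_cmd angsd-wrapper reports the command as missing, open a new terminal or source ~/.bash_profile again before retrying.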

A note about BAM files

ANGSD requires BAM files as input, and ANGSD-wrapper passes a list of BAM files to ANGSD. These BAM files have a few requirements:

To see whether or not the BAM files have an '@HD' header line, run the following on your list of samples:

# Read one BAM path per line; quoting keeps paths with spaces intact
while read -r sample
do
    echo "$sample"
    samtools view -H "$sample" | head -n 1
done < ~/path/to/sample_list.txt

If the header of any sample starts with '@SQ' instead of '@HD', ANGSD and ANGSD-wrapper will fail. This Gist will add an @HD header line to your BAM files.
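
For long sample lists, it may be easier to print only the samples that fail the check. The following is a minimal sketch; the helper name has_hd_header is hypothetical and is not part of ANGSD-wrapper or SAMtools. It reads header text on standard input, so it can be placed after samtools view -H in a pipeline.

```shell
#!/usr/bin/env bash
# has_hd_header is a hypothetical helper, not part of ANGSD-wrapper or SAMtools.
# It reads SAM header text on standard input and succeeds only if the
# first line begins with the @HD tag.
has_hd_header() {
    local first_line
    # read fails on input without a trailing newline but still fills the variable
    IFS= read -r first_line || true
    case "$first_line" in
        '@HD'*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example loop (commented out; it needs samtools and a real sample list):
# while read -r sample; do
#     samtools view -H "$sample" | has_hd_header || echo "no @HD line: $sample"
# done < ~/path/to/sample_list.txt
```

Only samples whose header does not begin with @HD are printed, so empty output means every sample passed.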

The index files must be generated after the BAM files. To index the BAM files using SAMtools, run the following on your sample list:

# Read one BAM path per line; quoting keeps paths with spaces intact
while read -r sample
do
    samtools index "$sample"
done < ~/path/to/sample_list.txt

If you have GNU Parallel installed on your system, this process can be sped up:

cat ~/path/to/sample_list.txt | parallel samtools index {}
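
Because the index files must be generated after the BAM files, it can be useful to check modification times before starting a run. The sketch below assumes that each index sits next to its BAM file under the name <file>.bai, which is where samtools index writes it by default; the helper name index_is_fresh is hypothetical.

```shell
#!/usr/bin/env bash
# index_is_fresh is a hypothetical helper, not part of ANGSD-wrapper or SAMtools.
# Assumption: the index sits next to the BAM file as <file>.bai.
# It succeeds only if the index exists and the BAM file is not newer than it.
index_is_fresh() {
    local bam="$1"
    local index="${bam}.bai"
    [ -f "$index" ] || return 1
    if [ "$bam" -nt "$index" ]; then
        return 1
    fi
    return 0
}

# Example loop (commented out; it needs a real sample list):
# while read -r sample; do
#     index_is_fresh "$sample" || echo "re-index needed: $sample"
# done < ~/path/to/sample_list.txt
```

Any sample printed by the loop either has no index or has a BAM file that was modified after its index was written, and should be indexed again.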

Basic usage

To run ANGSD-wrapper, run

angsd-wrapper <wrapper> <config>

Here, wrapper is one of the methods that ANGSD-wrapper can run, and config is the relative path to the corresponding configuration file.

To see a list of available wrappers, run

angsd-wrapper

Configuration files

There is a configuration (config) file for each method available through angsd-wrapper. The configuration files hold variables used by the wrappers. Before running angsd-wrapper with a specified method, modify and save these variables (e.g., the file paths of indexed BAM/CRAM files, FASTA files, and sample lists) to suit your samples.

The default config files can be found in the Configuration_Files directory. You will need to modify them to suit your samples. Please refer to the config files or the wiki to see what each variable is used for and how they should be specified. If you run angsd-wrapper without any arguments, it will return a usage message.

Example config files can be found in Example_Data/Configuration_Files upon running angsd-wrapper setup data.
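
Before launching a wrapper, it may help to confirm that an edited config file still defines the variables you rely on. The following is a minimal sketch, assuming the config file consists of plain shell variable assignments that can be loaded with the shell's dot (source) command. The helper check_config and the variable names SAMPLE_LIST and REF_SEQ are used for illustration only; consult the config files themselves or the wiki for the variables each wrapper expects.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: check_config, SAMPLE_LIST and REF_SEQ are illustrative
# names only. See the real config files for what each wrapper expects.
check_config() {
    local config="$1"
    (
        # Load the assignments in a subshell so they do not leak
        # into the calling shell.
        . "$config" || exit 1
        for name in SAMPLE_LIST REF_SEQ; do
            # Look up the variable whose name is stored in $name
            eval "value=\${$name:-}"
            if [ -z "$value" ]; then
                echo "unset in $config: $name" >&2
                exit 1
            fi
        done
    )
}

# Example call (commented out; give a path that contains a slash so the
# file is not searched for on PATH):
# check_config ./my_config
```

The function returns a non-zero status and names the first missing variable, so it can be placed in front of an angsd-wrapper call with &&.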

Further Information

For more information about ANGSD-wrapper, the methods available through ANGSD-wrapper, and a comprehensive tutorial, please see the wiki.

Dependencies

This package requires the following dependencies:

These are downloaded and installed automatically when angsd-wrapper is installed.

There are a few other dependencies that are not automatically downloaded during the installation:

Supported methods

Citing ANGSD-wrapper

ANGSD-wrapper was published in Molecular Ecology Resources; if you use this in your work please cite the paper. For BibTeX users, the citation is as follows:

@article {MEN:MEN12578,
author = {Durvasula, Arun and Hoffman, Paul J. and Kent, Tyler V. and Liu, Chaochih and Kono, Thomas J. Y. and Morrell, Peter L. and Ross-Ibarra, Jeffrey},
title = {angsd-wrapper: utilities for analysing next-generation sequencing data},
journal = {Molecular Ecology Resources},
issn = {1755-0998},
url = {http://dx.doi.org/10.1111/1755-0998.12578},
doi = {10.1111/1755-0998.12578},
pages = {n/a--n/a},
keywords = {domestication, population genetics, software, visualization, Zea},
year = {2016},
}