Rosemeis / fastmixture

Software for ancestry estimation in unrelated individuals
GNU General Public License v3.0

fastmixture (v0.93.4)

fastmixture is software for estimating ancestry proportions in unrelated individuals. It is significantly faster than previous model-based software while providing accurate and robust ancestry estimates.

Installation

To run the fastmixture software, you have a few options depending on your environment and preference:

  1. Installing fastmixture via PyPI or Source Code

    # Option 1: Build and install via PyPI
    pip install fastmixture
    
    # Option 2: Download source and install via pip
    git clone https://github.com/Rosemeis/fastmixture.git
    cd fastmixture
    pip install .
    
    # Option 3: Download source and install in a new Conda environment
    git clone https://github.com/Rosemeis/fastmixture.git
    conda env create -f fastmixture/environment.yml
    conda activate fastmixture

    You can now run the program with the fastmixture command. For more details on running it, see the Usage section.

  2. Using the fastmixture docker image with Docker or Apptainer

    If you prefer or need to use a containerized setup (especially useful in HPC environments), a pre-built fastmixture container image is available on Docker Hub. The latest version corresponds to v0.93.4.

    A. Using Docker

    1. Pull the image from Docker Hub
    # Docker command
    docker pull albarema/fastmixture
    2. Run the fastmixture container
    # Mount the directory containing the PLINK files with the --volume (-v) flag (e.g. `pwd`/project-data/)
    # Set the number of CPUs available to the container with --cpus
    # e.g. the data prefix is 'toy.data' and the results prefix is 'toy.fast'
    docker run --cpus=8 -v `pwd`/project-data/:/data/ albarema/fastmixture fastmixture --bfile data/toy.data --K 3 --out data/toy.fast --threads 8

    B. Using Apptainer (formerly Singularity)

    Apptainer/Singularity users should consult their HPC system's documentation for guidance. By default, Apptainer creates the .sif image in your current working directory (pwd); specify a different directory and filename if needed. You will later use this image to run the software. Remember to bind the directories where the data is stored (--bind).

    1. Pull the fastmixture container image into a .sif file that Apptainer can use
    # Singularity/Apptainer
    apptainer pull <fastmixture.sif> docker://albarema/fastmixture
    2. Run the fastmixture container
    # Singularity/Apptainer
    apptainer run <fastmixture.sif> fastmixture --bfile data/toy.data --K 3 --out data/toy.fast --threads 8

Citation

Please cite our preprint on bioRxiv.

Usage

fastmixture requires input data in binary PLINK format.
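
A binary PLINK fileset is three files that share one prefix (`<prefix>.bed`, `<prefix>.bim` and `<prefix>.fam`), and that prefix is what `--bfile` receives. As a minimal sketch, the shell function below checks that all three files exist before a run is launched. The name `check_plink_prefix` is hypothetical and not part of fastmixture.

```shell
# Hypothetical pre-flight helper (not part of fastmixture).
# Succeeds only if <prefix>.bed, <prefix>.bim and <prefix>.fam all exist.
check_plink_prefix() {
    prefix="$1"
    missing=0
    for ext in bed bim fam; do
        if [ ! -f "${prefix}.${ext}" ]; then
            # Report every missing file on stderr, then keep checking.
            echo "missing: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use: only start the run when the fileset is complete.
# check_plink_prefix data && fastmixture --bfile data --K 3 --threads 32 --seed 1 --out test
```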

# Using binary PLINK files for K=3
fastmixture --bfile data --K 3 --threads 32 --seed 1 --out test

# Outputs Q and P files (test.K3.s1.Q and test.K3.s1.P)
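
A quick sanity check on the Q output can catch truncated or mismatched files. The sketch below assumes, without confirmation from this README, that the Q file is whitespace-delimited text with one row per individual and K columns of ancestry proportions, so each row should sum to roughly 1. The helper name `check_q` and the tolerance of 0.001 are illustrative choices.

```shell
# Hypothetical helper: check_q FILE K
# Assumes a whitespace-delimited Q file with K columns per row (an assumption,
# not documented here). Prints each offending line and returns non-zero on failure.
check_q() {
    awk -v k="$2" '
        NF != k {
            printf "line %d: expected %d columns, got %d\n", NR, k, NF
            bad = 1
            next
        }
        {
            s = 0
            for (i = 1; i <= NF; i++) s += $i
            # Allow a small tolerance for rounding in the written values.
            if (s < 0.999 || s > 1.001) {
                printf "line %d: row sums to %.4f\n", NR, s
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}

# Possible use after the run above:
# check_q test.K3.s1.Q 3
```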

Supervised

A supervised mode is available in fastmixture using --supervised. Provide the population assignments of the individuals as integers in a single-column file. Unknown or admixed individuals must be given the value '0'.

# Using binary PLINK files and population labels for K=3 (supervised)
fastmixture --bfile data --K 3 --threads 32 --seed 1 --out super.test --supervised data.labels

# Outputs Q and P files (super.test.K3.s1.Q and super.test.K3.s1.P)
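
One way to build the single-column label file is to derive it from the `.fam` file. The sketch below assumes that labels must appear in the same order as the individuals in the `.fam` file, which this README does not state explicitly. It also assumes a hypothetical two-column input (`pops.txt`: individual ID, integer population code) listing only the reference individuals; anyone not listed receives '0'. The helper name `make_labels` is illustrative.

```shell
# Hypothetical helper: make_labels FAM POPS
# FAM:  PLINK .fam file (column 2 is the individual ID).
# POPS: assumed two-column file "IID code" for reference individuals only.
# Prints one integer per .fam row; unlisted individuals get 0.
make_labels() {
    awk -v pops="$2" '
        BEGIN {
            # Load the ID -> population code mapping.
            while ((getline line < pops) > 0) {
                split(line, f, " ")
                pop[f[1]] = f[2]
            }
            close(pops)
        }
        { print (($2 in pop) ? pop[$2] : 0) }
    ' "$1"
}

# Possible use before a supervised run:
# make_labels data.fam pops.txt > data.labels
```

Keeping the lookup keyed on the individual ID means the order of `pops.txt` does not matter; only the `.fam` order determines the output.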

Extra options

License

This project is licensed under the GNU General Public License v3.0; see the LICENSE file for details.

Authors and Acknowledgements