algo-cancer / ImmunoTyper-SR

Genotyping Immunoglobulin Heavy Chain Variable Genes using Short Read Data

🎉 ImmunoTyper-SR 🧬

ImmunoTyper-SR genotypes immunoglobulin variable genes and analyzes their copy number variation (CNV) from whole genome sequencing (WGS) short reads, using integer linear programming (ILP) optimization. See our paper for more details.

📢 New Feature: Now supporting IGLV and TRAV genotyping! 🎉

🚀 Installation

Gurobi

ImmunoTyper-SR leverages the Gurobi solver for optimization. You need a valid license to use Gurobi. Licenses are free for academic purposes.

Docker / Singularity

For the easiest installation, we recommend using the Docker image available on DockerHub at cdslsahinalp/immunotyper-sr.

To run the image with Singularity (commonly used on HPC clusters), pull the image and then run it with the Gurobi license, BAM directory, and output directory bound into the container:

singularity pull docker://cdslsahinalp/immunotyper-sr
singularity run -B <GUROBI_LICENSE_PATH>:/opt/gurobi/gurobi.lic -B <BAM_DIRECTORY>:<BAM_DIRECTORY> -B <OUTPUT_PATH>:/output immunotyper-sr_latest.sif <BAM_DIRECTORY>/<BAM_FILE>
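The bind mounts above can be easier to manage with a small helper that assembles the command from three paths. This is a minimal sketch, not part of ImmunoTyper-SR: the function name `build_singularity_cmd` and all example paths are hypothetical, and it assumes paths without spaces. It only prints the command so you can inspect it before running.

```shell
#!/usr/bin/env bash
# Sketch: assemble the `singularity run` command shown above.
# All paths used here are hypothetical placeholders; replace them with your own.

build_singularity_cmd() {
    local license="$1"   # path to gurobi.lic on the host
    local bam="$2"       # absolute path to the input BAM
    local outdir="$3"    # host directory that receives the results
    local bam_dir
    bam_dir="$(dirname "$bam")"

    # Bind the license, the BAM directory (same path inside the container),
    # and the output directory, then pass the BAM path as the only argument.
    printf 'singularity run -B %s:/opt/gurobi/gurobi.lic -B %s:%s -B %s:/output immunotyper-sr_latest.sif %s\n' \
        "$license" "$bam_dir" "$bam_dir" "$outdir" "$bam"
}

# Print the command for inspection; paste it into your shell to run it.
build_singularity_cmd "$HOME/gurobi.lic" /data/bams/sample1.bam /data/results
```

Binding the BAM directory to the same path inside the container means the BAM path passed as the argument resolves identically on the host and in the container.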

Conda + Pip

If you already have BWA installed and prefer not to create a new environment, you can download the latest release binary (a .whl file, listed under Releases in the repository sidebar) and install it with pip:

pip install <binary.whl>

For the best experience, we recommend setting up a clean environment first:

conda create -n immunotyper-SR -c bioconda python=3.8 bwa samtools
conda activate immunotyper-SR
pip install <binary.whl>

Environment and Dependencies

Installing ImmunoTyper-SR with pip automatically installs its Python dependencies.

In addition, you will need:

  1. The BWA-MEM mapper. The conda environment created in the "Conda + Pip" section already includes BWA and samtools, so we recommend installing ImmunoTyper-SR into that environment.
  2. The Gurobi solver, configured with a valid license.

To check that Gurobi is correctly configured, run gurobi_cl from a shell.
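Before launching a long run, it can help to confirm that the shell can actually find Gurobi. The following is a minimal sketch, not part of ImmunoTyper-SR: the function name `check_gurobi` is hypothetical, and the check only verifies that `gurobi_cl` is on `PATH` and that `GRB_LICENSE_FILE` (the environment variable Gurobi reads for a non-default license location), if set, points to an existing file. It does not validate the license itself.

```shell
#!/usr/bin/env bash
# Sketch: report whether the Gurobi command-line tool and license file can be located.

check_gurobi() {
    # gurobi_cl must be discoverable on PATH.
    if ! command -v gurobi_cl >/dev/null 2>&1; then
        echo "gurobi_cl not found on PATH"
        return 1
    fi
    # If GRB_LICENSE_FILE is set, it must point to an existing file.
    if [ -n "${GRB_LICENSE_FILE:-}" ] && [ ! -f "$GRB_LICENSE_FILE" ]; then
        echo "GRB_LICENSE_FILE points to a missing file: $GRB_LICENSE_FILE"
        return 1
    fi
    echo "gurobi_cl found at $(command -v gurobi_cl)"
}

check_gurobi || echo "Gurobi does not appear to be configured yet"
```

If the check passes, running gurobi_cl itself remains the real test of whether the license is accepted.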

Installing from source

If the binary fails to install, you can build the tool from source:

conda create -n immunotyper-SR -c bioconda python=3.8 bwa samtools
conda activate immunotyper-SR
git clone git@github.com:algo-cancer/ImmunoTyper-SR.git ./ImmunoTyper-SR
cd ImmunoTyper-SR
python -m pip install --upgrade build
python -m build
pip install dist/<.tar.gz or .whl build>

πŸ› οΈ Running ImmunoTyper-SR:

After installing with pip, use the command immunotyper-SR. The only required input is a BAM file. Outputs are written to the current working directory unless --output_dir is given, and output filenames are prefixed with the input BAM filename without its extension.

IMPORTANT: If your BAM was mapped to GRCh37, use the --hg37 flag.

$ immunotyper-SR --help
usage: immunotyper-SR [-h] [--gene_type {ighv,iglv,trav,igkv}] [--output_dir OUTPUT_DIR] [--ref REF] [--hg37] [--bwa BWA] [--max_copy MAX_COPY] [--landmarks_per_group LANDMARKS_PER_GROUP] [--landmark_groups LANDMARK_GROUPS] [--stdev_coeff STDEV_COEFF] [--seq_error_rate SEQ_ERROR_RATE] [--solver_time_limit SOLVER_TIME_LIMIT] [--debug_log_path DEBUG_LOG_PATH]
                      [--write_cache_path WRITE_CACHE_PATH] [--threads THREADS] [--no_coverage_estimation]
                      bam_path

ImmunoTyper-SR: Ig Genotyping using Short Read WGS

positional arguments:
  bam_path              Input BAM file

optional arguments:
  -h, --help            show this help message and exit
  --gene_type {ighv,iglv,trav,igkv}
                        Specify which genes to target
  --output_dir OUTPUT_DIR
                        Path to output directory. Outputs txt file of allele calls with prefix matching input BAM file name.
  --ref REF             Path to the reference FASTA to decode CRAM files. Option is not used if bam_path is not a CRAM.
  --hg37                Flag if BAM mapped to GRCh37 not GRCh38
  --bwa BWA             path to bwa executible if not in $PATH
  --max_copy MAX_COPY   Maximum number of allele copies to call
  --landmarks_per_group LANDMARKS_PER_GROUP
                        Number of landmarks per group to use (default = 6)
  --landmark_groups LANDMARK_GROUPS
                        Number of landmark groups to use (default = 6)
  --stdev_coeff STDEV_COEFF
                        Standard deviation scaling coefficient (default = 1.5)
  --seq_error_rate SEQ_ERROR_RATE
                        Expected sequence error rate (default = 0.02)
  --solver_time_limit SOLVER_TIME_LIMIT
                        Time limit for ILP solver in hours
  --debug_log_path DEBUG_LOG_PATH
                        Path to write log
  --write_cache_path WRITE_CACHE_PATH
                        Specific location and name of allele db sam mapping cache
  --threads THREADS     Max number of threads to use
  --no_coverage_estimation
                        Disables empirical coverage
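
As an illustration of how these options combine, the sketch below prints one command per supported gene type for a single BAM, each with its own output directory. It is a dry run under stated assumptions: `sample1.bam`, the `results` directory, the thread count, and the helper name `immunotyper_cmd` are all hypothetical, and only flags listed in the help text above are used. Whether the tool creates a missing output directory is not stated here, so creating it beforehand with `mkdir -p` is the safer choice.

```shell
#!/usr/bin/env bash
# Sketch: genotype one BAM for several gene types, one output directory per type.
# "sample1.bam" and the "results" directory are hypothetical placeholders.

immunotyper_cmd() {
    local bam="$1" gene_type="$2" outdir="$3" threads="$4"
    # Dry run: print the command. Drop the leading `echo` to execute it.
    # Add --hg37 here if the BAM was mapped to GRCh37.
    echo immunotyper-SR --gene_type "$gene_type" --output_dir "$outdir/$gene_type" --threads "$threads" "$bam"
}

bam="sample1.bam"
prefix="$(basename "${bam%.*}")"   # input BAM filename without the extension

for gene_type in ighv iglv trav igkv; do
    immunotyper_cmd "$bam" "$gene_type" "results/$prefix" 4
done
```

Keeping a separate directory per gene type avoids any clash between output files that share the same BAM-derived prefix.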