LohseLab / gIMble

A genome-wide IM blockwise likelihood estimation toolkit
GNU General Public License v3.0
gimble

Installation

conda install -c conda-forge gimble

Workflow

gIMble workflow. gimbleprep (0) ensures that input data conform to requirements; parse (1) reads data into a gIMble store, the central data structure that holds all subsequent analyses. The modules blocks (2) and windows (3) partition the data, which is then summarised as a tally (4) of blockwise mutation configurations (bSFSs), either across all pair-blocks (blocks tally) or for pair-blocks in windows (windows tally). Tallies may be used either in a bounded search of parameter space via the module optimize (5) or to evaluate likelihoods over a parameter grid (precomputed using makegrid (6)) via the gridsearch (7) module. The simulate (8) module allows coalescent simulation of tallies (simulate tally) based on inferred parameter estimates (either global estimates or gridsearch results of window-wise data). Simulated data can be analysed to quantify the uncertainty (and potential bias) of parameter estimates. The results held within a gIMble store can be described, written to column-based output files, or removed using the modules info (9), query (10), and delete (11), respectively.

Usage

usage: gimble <module> [<args>...] [-V -h]

  [Input]
    preprocess            Install gimbleprep instead
    parse                 Parse files into GimbleStore
    blocks                Generate blocks from parsed data in GimbleStore (requires 'parse')
    windows               Generate windows from blocks in GimbleStore (requires 'blocks')
    tally                 Tally variation for inference (requires 'blocks' or 'windows')

  [Simulation]
    simulate              Simulate data based on specific parameters or gridsearch results  

  [Inference]
    optimize              Perform global parameter optimisation on tally/simulation
    makegrid              Precalculate grid of parameters
    gridsearch            Evaluate tally/simulation against a precomputed grid (requires 'makegrid')

  [Info]
    info                  Print metrics about data in GimbleStore
    list                  List information saved in GimbleStore
    query                 Extract information from GimbleStore
    delete                Delete information in GimbleStore

  [Experimental]
    partitioncds          Partition CDS sites in BED file by degeneracy in sample GTs

  [Options]
    -h, --help            Show this screen
    -V, --version         Show version
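
The "requires" notes in the usage text imply an ordering between modules. As a minimal sketch (not part of gimble itself), the Python snippet below records those stated prerequisites in a dictionary and checks whether a module can run given the modules that have already completed. The names REQUIRES and missing_prerequisite are hypothetical, and only the prerequisites spelled out in the usage text are modelled.

```python
# Prerequisites as stated in the usage text. Each value lists alternatives:
# a module can run once ANY ONE of them has completed
# ('tally' requires 'blocks' or 'windows').
REQUIRES = {
    "parse": [],
    "blocks": ["parse"],
    "windows": ["blocks"],
    "tally": ["blocks", "windows"],
    "makegrid": [],
    "gridsearch": ["makegrid"],
}


def missing_prerequisite(module, completed):
    """Return None if `module` can run, else the alternatives still needed."""
    alternatives = REQUIRES.get(module, [])
    if not alternatives or any(step in completed for step in alternatives):
        return None
    return alternatives


# 'windows' cannot run straight after 'parse': 'blocks' must come first.
print(missing_prerequisite("windows", {"parse"}))
# 'tally' can run once 'blocks' exists, even without 'windows'.
print(missing_prerequisite("tally", {"parse", "blocks"}))
```

Modules not listed in the dictionary (for example info or query) are treated as having no stated prerequisite, which is an assumption of this sketch rather than documented behaviour.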

Gimble modules

preprocess

Note: The preprocess module has been replaced by gimbleprep. Everything else is identical. The preprocess module ensures that input files are adequately filtered and processed so that the gimble workflow can be completed successfully. While this processing of input files could be done more efficiently by other means, it has the advantage of generating a VCF file that complies with gimble data requirements but can also be used in alternative downstream analyses.

conda install -c bioconda gimbleprep
gimbleprep -f FASTA -b BAM_DIR/ -v RAW.vcf.gz -k

Based on the supplied input files:

the module produces the following output files:

After running, the output files require manual user input (see Manually modify preprocessed files).

VCF processing details

BAM processing details

Manually modify preprocessed files

parse

blocks

windows

info

tally

optimize

makegrid

gridsearch

simulate

based on gridsearch result

gimble simulate -z analysis.z \
    --seed 19 --replicates 100 --windows 11217 --blocks 500 \
    --block_length 64 -a 10 -b 10 \
    --gridsearch_key gridsearch/windows_kmax2/IM_BA_grid \
    -k 2,2,2,2 -s IM_BA_grid -p 55 -u 2.9e-9