BCM-Lupskilab / HMZDelFinder

CNV calling algorithm for detection of homozygous and hemizygous deletions from whole exome sequencing data
GNU General Public License v2.0

HMZDelFinder

CNV calling algorithm for detection of rare, homozygous and hemizygous deletions from whole exome sequencing data

Prerequisites

The following R libraries are required to run HMZDelFinder:

To install missing packages, run the code in the appropriate sections ('install missing packages from ...') of example/example_run.R.

Running HMZDelFinder

Format of input files

BED file

Tab-delimited file with four columns and no header:

RPKM files

Tab-delimited file with a header and two columns:

IMPORTANT: The number of rows and the order of capture targets have to correspond to the number of rows and the order defined in the BED file.
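The row-count requirement above can be checked before running the analysis. The following is a minimal Python sketch, not part of HMZDelFinder: the function name `check_bed_and_rpkm`, the example column values, and the RPKM header names are all hypothetical. It verifies only what is stated here (four columns and no header in the BED file, a header and two columns in the RPKM file, equal numbers of rows). It does not verify the order of the targets, since that depends on the column definitions.

```python
import csv
import io


def read_rows(handle):
    """Read non-empty tab-delimited rows from an open text handle."""
    return [row for row in csv.reader(handle, delimiter="\t") if row]


def check_bed_and_rpkm(bed_handle, rpkm_handle):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []

    # BED file: no header, four columns per row.
    bed_rows = read_rows(bed_handle)
    for line_no, row in enumerate(bed_rows, start=1):
        if len(row) != 4:
            problems.append(
                f"BED line {line_no}: expected 4 columns, found {len(row)}"
            )

    # RPKM file: first row is the header, then two columns per row.
    rpkm_rows = read_rows(rpkm_handle)
    rpkm_data = rpkm_rows[1:]
    for line_no, row in enumerate(rpkm_data, start=2):
        if len(row) != 2:
            problems.append(
                f"RPKM line {line_no}: expected 2 columns, found {len(row)}"
            )

    # Every capture target in the BED file needs exactly one RPKM row.
    if len(rpkm_data) != len(bed_rows):
        problems.append(
            f"row count mismatch: BED has {len(bed_rows)} targets, "
            f"RPKM has {len(rpkm_data)} rows"
        )
    return problems


# Hypothetical example content; real column values and header names may differ.
EXAMPLE_BED = "chr1\t100\t200\tGENE1\nchr1\t300\t400\tGENE2\n"
EXAMPLE_RPKM = "count\tRPKM\n12\t3.5\n0\t0.0\n"

if __name__ == "__main__":
    issues = check_bed_and_rpkm(io.StringIO(EXAMPLE_BED), io.StringIO(EXAMPLE_RPKM))
    print(issues if issues else "BED and RPKM files are consistent")
```

For real files, pass handles opened with `open(path, newline="")` in place of the `io.StringIO` objects.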

To generate RPKM files from BAM files, see the comments in example/example_run.R.

VCF files

VCF files are required for AOH analysis and for further filtering of identified deletion calls. We assume that all files are single-sample VCFs compressed with bz2. In general, each VCF should follow the standard VCF format; however, the following columns are the most important:

NOTE: To calculate B-allele frequency (needed for AOH analysis), the last column of the VCF must report both the total number of reads and the number of variant reads for every variant. Moreover, all multiallelic sites should be filtered out. Such VCFs can be generated, e.g., by the Atlas2 variant caller.
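To illustrate the requirement above, here is a minimal Python sketch of how B-allele frequency could be derived from a bz2-compressed, single-sample VCF. It is not HMZDelFinder's own implementation. It assumes B-allele frequency is computed as variant reads divided by total reads, and that the two counts are stored under FORMAT keys supplied by the caller. The keys `VR` and `DP` used in the usage note are hypothetical, so check the header of your own VCFs. Multiallelic sites are recognized by a comma in the ALT column, as in the standard VCF format.

```python
import bz2


def is_multiallelic(alt_field):
    """A site is multiallelic when ALT lists several comma-separated alleles."""
    return "," in alt_field


def b_allele_frequency(format_field, sample_field, variant_key, total_key):
    """Return variant reads / total reads, or None if it cannot be computed.

    variant_key and total_key are the FORMAT keys holding the two read
    counts; their names depend on the variant caller (assumption).
    """
    sample = dict(zip(format_field.split(":"), sample_field.split(":")))
    try:
        variant_reads = int(sample[variant_key])
        total_reads = int(sample[total_key])
    except (KeyError, ValueError):
        return None  # count missing or not an integer (e.g. ".")
    if total_reads == 0:
        return None
    return variant_reads / total_reads


def iter_baf(path, variant_key, total_key):
    """Yield (chrom, pos, baf) for each biallelic site in a bz2 VCF."""
    with bz2.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 10:
                continue  # not a complete single-sample record
            # Standard VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
            if is_multiallelic(fields[4]):
                continue
            baf = b_allele_frequency(fields[8], fields[-1], variant_key, total_key)
            yield fields[0], int(fields[1]), baf
```

A call such as `list(iter_baf("sample.vcf.bz2", "VR", "DP"))` would then return one tuple per biallelic site. The file name and both keys are placeholders.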

Format of output files

The object returned by runHMZDelFinder(...) contains the following items:

Format of filteredCalls/allCalls

Both objects are data.frames with the following columns: