hap-ibd

The hap-ibd program detects identity-by-descent (IBD) segments and homozygosity-by-descent (HBD) segments in phased genotype data. The hap-ibd program can analyze data sets with hundreds of thousands of samples.

If you use hap-ibd in a published analysis, please report the program version printed in the first line of the output log file, and cite the article that describes the hap-ibd method:

Y Zhou, S R Browning, B L Browning. A fast and simple method for detecting identity by descent segments in large-scale data. The American Journal of Human Genetics 106(4):426-437. doi: https://doi.org/10.1016/j.ajhg.2020.02.010

Brian Browning

Last updated: June 14, 2023

Installation

You can download the latest executable file, hap-ibd.jar, with the command:

wget https://faculty.washington.edu/browning/hap-ibd.jar

or you can download the source files and create the executable file with the commands:

git clone https://github.com/browning-lab/hap-ibd.git
javac -cp hap-ibd/src/ hap-ibd/src/hapibd/HapIbdMain.java
jar cfe hap-ibd.jar hapibd/HapIbdMain -C hap-ibd/src/ ./
jar -i hap-ibd.jar

Running hap-ibd

The hap-ibd program requires Java version 1.8 (or a later version). Running hap-ibd with an earlier Java version will produce an UnsupportedClassVersionError.
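Because an older Java runtime fails at startup, it can help to check the installed version first. The following is a minimal sketch, not part of hap-ibd. The function name `java_major` is hypothetical, and the script assumes that the first line of `java -version` output contains the version in double quotes (for example `"1.8.0_371"` for Java 8 or `"17.0.2"` for Java 17).

```shell
#!/bin/sh
# Hypothetical helper (not part of hap-ibd): print the Java major version
# from a version string such as "1.8.0_371" (Java 8) or "17.0.2" (Java 17).
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;   # legacy scheme: 1.<major>.<minor>
    *)   echo "$1" | cut -d. -f1 ;;   # Java 9+ scheme: <major>.<minor>.<patch>
  esac
}

# `java -version` writes to stderr; assume the first line looks like
#   openjdk version "17.0.2" 2022-01-18
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | head -n 1 | sed 's/.*"\(.*\)".*/\1/')
  major=$(java_major "$ver")
  # A non-numeric result (unexpected output) falls through to the warning
  if [ "$major" -ge 8 ] 2>/dev/null; then
    echo "Java $ver found: meets the hap-ibd requirement (1.8 or later)"
  else
    echo "Java $ver found: hap-ibd requires Java 1.8 or later" >&2
  fi
fi
```

If the first line of `java -version` output has a different layout on your system (for example, a "Picked up JAVA_TOOL_OPTIONS" notice), the script prints the warning rather than a false "OK".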

The command:

java -jar hap-ibd.jar

prints a summary of the command line arguments.

To run hap-ibd, enter the following command:

java -Xmx[GB]g -jar hap-ibd.jar [arguments]

where [GB] is the maximum number of gigabytes of memory to use, and [arguments] is a space-separated list of parameter values, each expressed as parameter=value.
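A minimal sketch of a wrapper that assembles this command line, assuming a POSIX shell. The function `hapibd_cmd` is hypothetical and not part of hap-ibd: it checks only that the memory value is a whole number of gigabytes and that each remaining argument has the parameter=value form, and then prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of hap-ibd): build the hap-ibd command with
# a memory limit of $1 gigabytes, passing the remaining arguments through.
hapibd_cmd() {
  gb=$1
  shift
  # The -Xmx value here must be a whole number of gigabytes
  case "$gb" in
    ''|*[!0-9]*) echo "memory must be a whole number of GB: '$gb'" >&2; return 1 ;;
  esac
  # Each argument must be parameter=value with a non-empty parameter name
  for arg in "$@"; do
    case "$arg" in
      ?*=*) ;;
      *) echo "not in parameter=value form: '$arg'" >&2; return 1 ;;
    esac
  done
  # Print the command; remove "echo" to run it instead
  echo java "-Xmx${gb}g" -jar hap-ibd.jar "$@"
}
```

For example, `hapibd_cmd 8 param1=value1 param2=value2` prints `java -Xmx8g -jar hap-ibd.jar param1=value1 param2=value2`, where `param1` and `param2` are placeholders for actual hap-ibd parameters.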

The shell script run.hap-ibd.test will run a test hap-ibd analysis.

Required Parameters

The hap-ibd program has three required parameters.

Optional Parameters

Output files

The hap-ibd program produces three output files: a log file, an ibd file, and an hbd file.

The log file (.log) contains a summary of the analysis, which includes the analysis parameters, the number of markers, the number of samples, the number of output HBD and IBD segments, and the mean number of HBD and IBD segments per sample.

The gzip-compressed ibd file (.ibd.gz) contains IBD segments shared between individuals. The gzip-compressed hbd file (.hbd.gz) contains HBD segments within individuals. Each line of the ibd and hbd output files represents one IBD or HBD segment and contains 8 tab-delimited fields:

  1. First sample identifier
  2. First sample haplotype index (1 or 2)
  3. Second sample identifier
  4. Second sample haplotype index (1 or 2)
  5. Chromosome
  6. Base coordinate of first marker in segment
  7. Base coordinate of last marker in segment
  8. cM length of IBD or HBD segment
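Because each output line is a tab-delimited record, standard tools can summarize the segments. The following sketch, which is not part of hap-ibd, reports the number of segments and their total cM length for each pair of samples, using fields 1, 3, and 8 and treating the two sample orders as the same pair. The function name `pair_totals` is hypothetical, and the script assumes `gzip`, `awk`, and `sort` are available.

```shell
#!/bin/sh
# Sketch (not part of hap-ibd): per-pair summary of a gzip-compressed
# hap-ibd output file. Fields used: $1 = first sample ID,
# $3 = second sample ID, $8 = segment length in cM.
pair_totals() {
  gzip -dc < "$1" | awk -F'\t' '
    {
      # Put the two IDs in a fixed order so (A,B) and (B,A) are one pair
      a = $1; b = $3
      if (a > b) { t = a; a = b; b = t }
      key = a "\t" b
      total[key] += $8
      count[key]++
    }
    END {
      # Output columns: ID1, ID2, number of segments, total cM
      for (k in total) printf "%s\t%d\t%.3f\n", k, count[k], total[k]
    }' | LC_ALL=C sort
}
```

For example, `pair_totals out.ibd.gz` (where `out.ibd.gz` stands for your ibd output file) prints one line per sample pair. Since HBD segments lie within individuals, applying the same function to an hbd file should give per-individual HBD totals, with the same identifier in both ID columns.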

License

The hap-ibd program is licensed under the Apache License, Version 2.0 (the License). You may obtain a copy of the License from http://www.apache.org/licenses/LICENSE-2.0