browning-lab / hap-ibd

The hap-ibd program detects identity-by-descent segments in phased genotype data.

What are recommended optional settings for WGS VCFs #3

Closed by geneanalyst 2 years ago

geneanalyst commented 2 years ago

Running hap-ibd with the default optional settings on the Simons WGS dataset merged with other WGS samples returns no IBD segments.

The WGS dataset originally contained variant-only positions (0/1 and 1/1 genotypes); I added the 0/0 sites to the VCFs manually. The resulting dataset of 50M positions was conformed with conform-gt using the 1000G phased reference panel and the 1000G GRCh37 map.

The ref and map flags were used in the conform step as well as in phasing. The map flag was used in the hap-ibd step.

browning-lab commented 2 years ago

IBD segments will not be detected if the genotype error rate or the phasing error rate (measured in errors per Mb) is too high.

One strategy for dealing with genotype errors in sequence data is to adjust the hap-ibd parameter settings (see the hap-ibd paper for an example). Another strategy is to thin the markers using a stringent MAF threshold (see the ibd-ends paper for an example).
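
As a rough illustration of the second strategy, MAF-based thinning of a VCF could look something like the sketch below. This is not taken from the hap-ibd or ibd-ends papers: the function names `minor_allele_frequency` and `thin_by_maf` are hypothetical, the 0.1 threshold is an arbitrary placeholder rather than a recommended value, and the sketch assumes biallelic records with GT as the first FORMAT subfield.

```python
def minor_allele_frequency(gt_fields):
    """Return the MAF of a biallelic site from sample fields such as '0|1' or '1/1:35'."""
    ref = alt = 0
    for field in gt_fields:
        gt = field.split(":", 1)[0]            # assumes GT is the first FORMAT subfield
        for allele in gt.replace("|", "/").split("/"):
            if allele == "0":
                ref += 1
            elif allele == "1":
                alt += 1                       # missing alleles ('.') are not counted
    total = ref + alt
    return min(ref, alt) / total if total else 0.0


def thin_by_maf(vcf_lines, min_maf=0.1):       # 0.1 is a placeholder, not a recommendation
    """Yield header lines plus biallelic records whose MAF is at least min_maf."""
    for line in vcf_lines:
        if line.startswith("#"):               # pass meta and header lines through unchanged
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if "," in cols[4]:                     # ALT column lists several alleles: skip
            continue
        if minor_allele_frequency(cols[9:]) >= min_maf:
            yield line


# Tiny in-memory example with four samples (hypothetical data).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|1\t1|1\t0|0\n",   # common variant
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|0\t0|1\t0|0\n",   # rare variant
    "1\t300\t.\tG\tA,T\t.\tPASS\t.\tGT\t0|1\t0|2\t1|2\t0|0\n", # multiallelic
]
kept = list(thin_by_maf(example, min_maf=0.2))
```

For a real dataset the same generator could be fed from a `gzip.open(path, "rt")` handle and written out line by line, so the full VCF never has to be held in memory.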

geneanalyst commented 2 years ago

For marker thinning, do you have a preferred ascertained set, such as the HapMap 4.2M markers or 1000G_omni2.5.b37? I ask because thinning causes big issues with ascertainment bias.

When using WGS data, what has worked better for you: increasing max-gap from its default, or changing some other parameter?