Illumina / ExpansionHunter

A tool for estimating repeat sizes

Region extension length #109

Closed: depreeuwj closed this issue 3 years ago

depreeuwj commented 3 years ago

Hi, Your tool is very useful, thank you very much! I'm trying to use it for exome sequencing data and it seems to work. However, I see some differences in the number of samples with a "pass" when I change the region extension length. What exactly is the meaning of this parameter? From reading the documentation, my best guess is that it will only look for reads within -X and +X of this region. If the parameter is set to 1000, does this mean looking at reads at -1000 and +1000 of the ROI, or rather -500 and +500 of the ROI? Thanks in advance, Jeroen

egor-dolzhenko commented 3 years ago

Hi Jeroen,

Apologies for the late reply.

That's right -- region extension length defines the region from which the reads are collected. If this parameter is set to 1000, then EH will collect reads aligned up to 1 kb away from the repeat on either side (i.e. -1000 and +1000 of the ROI, not -500 and +500).
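To make the interval arithmetic concrete, here is a minimal sketch of how such a collection window could be computed. It is an illustration only, not taken from the ExpansionHunter source: the function name `collection_window`, the coordinates, and the clamping at position 0 are assumptions made for the example.

```python
def collection_window(repeat_start, repeat_end, extension_length):
    """Return the (start, end) interval from which reads would be collected.

    Assumption: the extension is applied symmetrically to both sides of the
    repeat and clamped at position 0. Illustrative only, not EH's own code.
    """
    window_start = max(0, repeat_start - extension_length)
    window_end = repeat_end + extension_length
    return window_start, window_end


# Hypothetical repeat at positions 5000-5060 with an extension of 1000 bp.
print(collection_window(5000, 5060, 1000))  # (4000, 6060)
```

With an extension of 1000, the window grows by 1000 bp on each side, so it is 2000 bp wider than the repeat itself.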

The differences you are seeing make sense: if the read coverage suddenly drops to zero near the repeat region, then a longer region extension length may result in a very low overall coverage estimate (and hence non-PASSing calls), while a shorter one will produce PASSing calls. However, for best results, this parameter should be set to a value larger than the mean fragment length. Otherwise, the size of the repeat may be underestimated.
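The dilution effect can be illustrated with a toy calculation. This is a simplified model under stated assumptions, not ExpansionHunter's actual depth estimator or PASS filter: it assumes uniform depth over a covered stretch next to the repeat and zero depth beyond it, and the function name and all numbers are hypothetical.

```python
def mean_flank_depth(on_target_depth, covered_flank_bp, extension_length):
    """Average depth over one flank of length `extension_length`.

    Toy model (assumption): depth equals `on_target_depth` for the first
    `covered_flank_bp` bases next to the repeat and 0 beyond that.
    """
    if extension_length <= 0:
        raise ValueError("extension_length must be positive")
    covered = min(covered_flank_bp, extension_length)
    return on_target_depth * covered / extension_length


# Hypothetical exome target: 40x depth, covering 200 bp beside the repeat.
for ext in (200, 500, 1000):
    print(ext, mean_flank_depth(40, 200, ext))
# 200 40.0
# 500 16.0
# 1000 8.0
```

In this toy setting the same data yields a five-fold lower depth estimate at an extension of 1000 than at 200, which is the kind of drop that could turn a PASSing call into a non-PASSing one.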

If you'd like, we can take a closer look at your data. Please feel free to send me an email anytime: edolzhenko@illumina.com

Best wishes, Egor

depreeuwj commented 3 years ago

Great, thank you very much for the quick reply!