brentp / duphold

don't get DUP'ed or DEL'ed by your putative SVs.
MIT License
101 stars 9 forks source link

what size flanking region is used to calculate DHFFC? #17

Closed JoWhi closed 5 years ago

JoWhi commented 5 years ago

I'm looking at a 214 bp deletion that shows a clear drop in coverage on IGV for several individuals, yet DHFFC for this variant has been scored at >1 in all my heterozygotes, sometimes up to 15. This deletion is in the middle of a 3 kb region surrounded by gaps in the alignments. If the flanking region used to calculate DHFFC is only about 300bp, then I would expect DHFFC to be <1, so does this mean coverage within the variant is being compared to much larger flanking regions? And is this parameter modifiable? Thank you!

brentp commented 5 years ago

hi, I am working on some changes now that will allow this to be set with an environment variable. The default is 5000 bases on either side (for a total of 10K bases). It then uses the median. So, as long as a total of > 5000 bases are of a representative diploid depth, it should give you a reasonable estimate. Can you share a screenshot of the region from one sample with the deletion?

Also, can you make sure you are using v0.1.1 of duphold as there were changes in that release that might affect your scenario here.

JoWhi commented 5 years ago

214bp del 12kb context Here are screenshots centred on the deletion, showing coverage and the 12kb region with loads of gaps. If it's using a 10kb region for the calculations, that would make sense. Looks like we have v0.0.9 running, so time for an upgrade!

brentp commented 5 years ago

perfect. I think updating to v0.1.1 will fix this problem. if not, would you be willing to share just that chromosome so I can figure out how to handle this better?

JoWhi commented 5 years ago

Does the new version use a smaller flanking region? I will try it, and let you know. Yes, happy to share!

JoWhi commented 5 years ago

The individual in the screenshots still gets DHFFC annotated as 15.33 at this variant, using v0.1.1.

brentp commented 5 years ago

ok. would you share the data for just that (full) chromosome along with the coordinates of the event?

JoWhi commented 5 years ago

Yes. How do I do that?

JoWhi commented 5 years ago

NW_017858824.1 135118 72454 N <DEL> 5875.46 . SVTYPE=DEL;SVLEN=-214;END=135332;STRANDS=+-:300;CIPOS=0,0;CIEND=0,0;CIPOS95=0,0;CIEND95=0,0;SU=300;PE=2;SR=298;AC=30;AN=92;GCF=0.306977 GT:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB:DHFC:DHFFC:DHBFC:DHSP:DHET:DHHU 0/1:200:431.2:-52,-9,-65:118:91:27:90:26:60:5:6:30:14:0.22:1.91667:15.3333:1.84:14:0,0:1,1,0

JoWhi commented 5 years ago

Is it the raw reads for that contig extracted from the BAM files that you need?

brentp commented 5 years ago

Is it the raw reads for that contig extracted from the BAM files that you need?

yes, please. as you likely know, you can get those with samtools view -o subset.bam $bam NW_017858824.1

brentp commented 5 years ago

I have made a new release that should address this issue: https://github.com/brentp/duphold/releases/tag/v0.1.2

please let me know any other problems and thanks for reporting and providing a test-case.