Closed JoWhi closed 5 years ago
hi, I am working on some changes now that will allow this to be set with an environment variable. The default is 5000 bases on either side (for a total of 10K bases). It then uses the median. So, as long as a total of > 5000 bases are of a representative diploid depth, it should give you a reasonable estimate. Can you share a screenshot of the region from one sample with the deletion?
Also, can you make sure you are using v0.1.1 of duphold as there were changes in that release that might affect your scenario here.
Here are screenshots centred on the deletion, showing coverage and the 12kb region with loads of gaps. If it's using a 10kb region for the calculations, that would make sense. Looks like we have v0.0.9 running, so time for an upgrade!
perfect. I think updating to v0.1.1 will fix this problem. if not, would you be willing to share just that chromosome so I can figure out how to handle this better?
Does the new version use a smaller flanking region? I will try it, and let you know. Yes, happy to share!
The individual in the screenshots still gets DHFFC annotated as 15.33 at this variant, using v0.1.1.
ok. would you share the data for just that (full) chromosome along with the coordinates of the event?
Yes. How do I do that?
NW_017858824.1 135118 72454 N <DEL> 5875.46 . SVTYPE=DEL;SVLEN=-214;END=135332;STRANDS=+-:300;CIPOS=0,0;CIEND=0,0;CIPOS95=0,0;CIEND95=0,0;SU=300;PE=2;SR=298;AC=30;AN=92;GCF=0.306977 GT:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB:DHFC:DHFFC:DHBFC:DHSP:DHET:DHHU 0/1:200:431.2:-52,-9,-65:118:91:27:90:26:60:5:6:30:14:0.22:1.91667:15.3333:1.84:14:0,0:1,1,0
Is it the raw reads for that contig extracted from the BAM files that you need?
Is it the raw reads for that contig extracted from the BAM files that you need?
yes, please.
as you likely know, you can get those with samtools view -o subset.bam $bam NW_017858824.1
I have made a new release that should address this issue: https://github.com/brentp/duphold/releases/tag/v0.1.2
please let me know any other problems and thanks for reporting and providing a test-case.
I'm looking at a 214 bp deletion that shows a clear drop in coverage on IGV for several individuals, yet DHFFC for this variant has been scored at >1 in all my heterozygotes, sometimes up to 15. This deletion is in the middle of a 3 kb region surrounded by gaps in the alignments. If the flanking region used to calculate DHFFC is only about 300bp, then I would expect DHFFC to be <1, so does this mean coverage within the variant is being compared to much larger flanking regions? And is this parameter modifiable? Thank you!