Closed jsquaredosquared closed 5 months ago
@edg1983 do you have any idea what could be going on? Any help would be appreciated 🥲
Hello!
I'm sorry for taking so long to fix this. I've investigated using your test case and found a bug causing SV intervals not to be generated properly. Essentially, the intervals were evaluated as just POS-POS+1 instead of POS-SVEND. It is now fixed in v1.3 (commit 0d2ebfc).
I've tested with some variants, including your example, and I can now get all the expected overlapping events. Please update to the new release (v1.3) and confirm it's running correctly.
This is the comparison for your specific event INV chr14:73085269-73141832
#Expected Observed
145767_pro 145767_pro
145768_pro 145768_pro
145769_pro 145769_pro
145770_pro 145770_pro
145771_pro 145771_pro
145772_pro 145772_pro
145773_pro 145773_pro
145774_pro 145774_pro
145775_pro 145775_pro
145776_pro 145776_pro
145777_pro 145777_pro
145778_pro 145778_pro
145779_pro 145779_pro
145780_pro 145780_pro
145781_pro 145781_pro
145782_pro 145782_pro
145783_pro 145783_pro
1553503_enh 1553503_enh
1553504_enh 1553504_enh
1553505_enh 1553505_enh
1553506_enh 1553506_enh
1553507_enh 1553507_enh
1553508_enh 1553508_enh
1553509_enh 1553509_enh
1800077_enh 1800077_enh
417710_enh 417710_enh
417711_enh 417711_enh
417712_enh 417712_enh
417713_enh 417713_enh
417714_enh 417714_enh
417715_enh 417715_enh
417716_enh 417716_enh
417717_enh 417717_enh
417718_enh 417718_enh
417719_enh 417719_enh
417720_enh 417720_enh
417721_enh 417721_enh
417722_enh 417722_enh
417723_enh 417723_enh
417724_enh 417724_enh
417725_enh 417725_enh
417726_enh 417726_enh
417727_enh 417727_enh
417728_enh 417728_enh
417729_enh 417729_enh
417730_enh 417730_enh
417731_enh 417731_enh
417732_enh 417732_enh
417733_enh 417733_enh
417734_enh 417734_enh
417735_enh 417735_enh
417736_enh 417736_enh
417737_enh 417737_enh
417738_enh 417738_enh
417739_enh 417739_enh
417740_enh 417740_enh
417741_enh 417741_enh
417742_enh 417742_enh
417743_enh 417743_enh
566516_pro 566516_pro
566518_pro 566518_pro
580535_pro 580535_pro
Thank you so much for looking into it!
Just one thing: you mentioned the interval is now evaluated from POS and [SV]END. But from my experience, sometimes the END is not guaranteed to be present. SV callers such as Manta and TARDIS do not always report it. The extent of the SV may need to be inferred from POS and SVLEN, which may be present when END is not present. (I think this is how vcfanno does it.)
Perhaps this is something that could be taken into account, so that even if END is not present the interval could still be calculated?
P.S., I couldn't quickly find an example that overlaps a regulatory region, but here are some examples where END is not reported (usually when the SV is PRECISE, and the exact sequence is given in ALT instead of just a symbolic allele):
Example from Manta:
chr1 4129813 MantaDEL:332:0:0:0:0:0 TCCTCTGCCCCACGTGGCGTTGGGGTCCCAGCAGGAGATGTGGGCCGGGAGCCTGTTGATC T 267 PASS SVTYPE=DEL;SVLEN=-60;CIGAR=1M60D;CIPOS=0,45;HOMLEN=45;HOMSEQ=CCTCTGCCCCACGTGGCGTTGGGGTCCCAGCAGGAGATGTGGGCC GT:FT:GQ:PL:PR:SR
Example from TARDIS:
chr1 14841386 vh_del_21 acagagcaagactccgtctcaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaGAAAT a 255 PASS SVLEN=-61;RPSUP=6;SRSUP=12;CIEND=0,0;CIPOS=0,0;PRECISE;SVALG=TARDIS_v1.0.9;SVTYPE=DEL GT:CNVL:RP:SR
Hi,
The logic built into the tool to create SV intervals is more complex and works as follows. I've now made it more robust, incorporating your suggestions, and created a new release (1.3.1). Please use this one.
For DEL, DUP, INV:
Since INS or BND are mostly represented by single positions in VCF, i.e. SVLEN=1 and END=POS+1, you can configure a manual padding value using --padding
. This padding value is added/subtracted after all previous steps when SVTYPE equals INS or BND.
Some examples
CHROM | POS | SVTYPE | SVLEN | END | CIPOS | PADDING | INTERVAL |
---|---|---|---|---|---|---|---|
chr1 | 1000 | DEL | -100 | 1100 | -5,10 | 5 | chr1:995-1110 |
chr1 | 1000 | DEL | -200 | NA | -5,10 | 5 | chr1:995-1210 |
chr1 | 1000 | DEL | -100 | 1100 | NA | 5 | chr1:1000-1100 |
chr1 | 1000 | DEL | -200 | NA | NA | 5 | chr1:1000-1200 |
chr1 | 1000 | INS | 1 | 1001 | -5,10 | 5 | chr1:990-1016 |
chr1 | 1000 | INS | 1 | 1001 | NA | 20 | chr1:980-1021 |
chr1 | 1000 | INS | 1 | 1001 | NA | NA | chr1:1000-1001 |
Awesome! Thanks again :smile: :+1:
Solved in version 1.3.1
TL;DR
Why is the SV chr14_73,085,269_73,141,832_INV not annotated with the promoter 580535_pro (chr14_73,135,539_73,138,139) despite the promoter being fully within the SV region?
Thanks for this helpful tool!
Problem
I am having trouble understanding why a particular SV is not annotated with a particular region despite the SV region containing the regulatory region.
The SV I am trying to annotate is an inversion (chr14_73,085,269_73,141,832_INV).
I checked GREEN-DB, and the promoter 580535_pro (chr14_73,135,539_73,138,139) is fully contained within the inverted region.
However, the SV is not annotated with this promoter. There are some other regulatory regions contained in the inverted region (e.g., 145783_pro at chr14:73,137,653-73,137,980), but these are also not added to the annotations.
The SV is only annotated with 417710_enh (chr14:73,084,745-73,092,455).
Command