Open northwestwitch opened 2 years ago
Example provided by the cancer team:
14 106531322 SV_4005_1 N ]14:107010501]N . PASS SVTYPE=BND;REGIONA=106531322,106531405;REGIONB=107010346,107010501;LFA=17,0;LFB=17,0;LTE=6,0;CTG=. GT:cn:COV:DR:SR:LQ:RR:RD 0/1:2:30,27.162624934793953,41:6:0:0.0,0.0:0,26:5,33
14 106531322 SV_4006_1 N ]14:106712012]N . PASS SVTYPE=BND;REGIONA=106531322,106531378;REGIONB=106712012,106712150;LFA=15,0;LFB=15,0;LTE=8,0;CTG=CCACATAATCTAAGTGGGACCTCAGCATTGAGCATTCATGGACATAAATGTGCGAATGATAGACACTGTGGACTGCTGGAGAGTGGAGGGAGGGGGTGATGGAATCTGGATTCCAAACCTCAGCATCACTCAATAATCCCATGTGACAAGTCCACACATATGCCCTCTGTATCTGAATGAAAACTTGAAATTAAATAAAAATCCTTATGTGAGAGCTGACTGGAAGCACCAAAGAGGACACTTGTTGTGGAGATTGACCTGCTCCTCATCCTAACTTAGGTGCTGGAGACAAATGTGTGCACATATGTC GT:cn:COV:DR:SR:LQ:RR:RD 0/1:2:27,24.460027662517287,46:8:0:0.0,0.0:0,28:5,50
14 106712012 SV_4006_2 N N[14:106531322[ . PASS SVTYPE=BND;REGIONA=106531322,106531378;REGIONB=106712012,106712150;LFA=15,0;LFB=15,0;LTE=8,0;CTG=CCACATAATCTAAGTGGGACCTCAGCATTGAGCATTCATGGACATAAATGTGCGAATGATAGACACTGTGGACTGCTGGAGAGTGGAGGGAGGGGGTGATGGAATCTGGATTCCAAACCTCAGCATCACTCAATAATCCCATGTGACAAGTCCACACATATGCCCTCTGTATCTGAATGAAAACTTGAAATTAAATAAAAATCCTTATGTGAGAGCTGACTGGAAGCACCAAAGAGGACACTTGTTGTGGAGATTGACCTGCTCCTCATCCTAACTTAGGTGCTGGAGACAAATGTGTGCACATATGTC GT:cn:COV:DR:SR:LQ:RR:RD 0/1:2:27,24.460027662517287,46:8:0:0.0,0.0:0,28:5,50
14 107010501 SV_4005_2 N N[14:106531322[ . PASS SVTYPE=BND;REGIONA=106531322,106531405;REGIONB=107010346,107010501;LFA=17,0;LFB=17,0;LTE=6,0;CTG=. GT:cn:COV:DR:SR:LQ:RR:RD 0/1:2:30,27.162624934793953,41:6:0:0.0,0.0:0,26:5,33
Examples: https://scout-stage.scilifelab.se/cust059/G1A1471p10_Balsamic and https://scout-stage.scilifelab.se/cust083/KMP-00064T-20191996305 . Check also my comment when testing loading these vars: https://github.com/Clinical-Genomics/scout/pull/3491#issuecomment-1173480360
Right, this situation will become untenable at some point. One suggestion would be to start adding a unique object index to the structural variants, and possibly add an extra step of detailed checking when an insert fails the uniqueness check on the current _id, where we also look at the other fields.
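A rough sketch of what that extra checking step could look like, using a toy in-memory store instead of the real database. The class name `VariantStore`, the field names and the position-only `_id` are all assumptions made for illustration, not Scout's actual schema or loading code.

```python
class VariantStore:
    """Toy in-memory stand-in for a collection with a unique index on _id."""

    def __init__(self, check_fields=("chrom", "pos", "ref", "alt")):
        self._by_id = {}
        self._check_fields = check_fields  # hypothetical fields to compare

    def insert(self, variant):
        """Insert a variant; on an _id clash, compare the other fields."""
        existing = self._by_id.get(variant["_id"])
        if existing is None:
            self._by_id[variant["_id"]] = variant
            return "inserted"
        # Uniqueness on _id failed: decide whether this is the same variant
        # seen twice, or a different variant that happens to share the id.
        same = all(existing.get(f) == variant.get(f) for f in self._check_fields)
        return "duplicate" if same else "conflict"


store = VariantStore()
# Hypothetical position-only _id shared by SV_4005_1 and SV_4006_1
first = {"_id": "14_106531322", "chrom": "14", "pos": 106531322,
         "ref": "N", "alt": "]14:107010501]N"}
second = dict(first, alt="]14:106712012]N")

print(store.insert(first))        # inserted
print(store.insert(dict(first)))  # duplicate
print(store.insert(second))       # conflict
```

A "conflict" result is the case worth handling explicitly, for example by loading the second variant under a different key rather than dropping it.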
So we remember it: for the SNV-SV overlap, @northwestwitch had the idea to add additional callers to the first variant, if the variants check out to be similar enough. This would be excellent there, but would of course not solve the multiple-different-endpoint-BNDs issue.
I currently do not understand why we wouldn't get unique ids from the example above, although I know it can happen with some callers. The (ALT, REF) field pairs all look unique?
Unless we just use the chrom and start positions in the parsing.
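To make the question concrete, here is a minimal sketch that builds two candidate ids for the four BND records from the example: one from CHROM and POS only, and one that also includes REF and ALT. Both id functions are hypothetical and only illustrate the two options; they are not taken from Scout's parsing code.

```python
from collections import Counter

# (CHROM, POS, ID, REF, ALT) copied from the four example records
RECORDS = [
    ("14", 106531322, "SV_4005_1", "N", "]14:107010501]N"),
    ("14", 106531322, "SV_4006_1", "N", "]14:106712012]N"),
    ("14", 106712012, "SV_4006_2", "N", "N[14:106531322["),
    ("14", 107010501, "SV_4005_2", "N", "N[14:106531322["),
]


def position_only_id(chrom, pos):
    """Hypothetical id that ignores REF and ALT."""
    return f"{chrom}_{pos}"


def simple_id(chrom, pos, ref, alt):
    """Hypothetical id that includes REF and ALT."""
    return "_".join([chrom, str(pos), ref, alt])


def duplicated(ids):
    """Return the ids that occur more than once, sorted."""
    return sorted(key for key, count in Counter(ids).items() if count > 1)


pos_ids = [position_only_id(chrom, pos) for chrom, pos, _, _, _ in RECORDS]
full_ids = [simple_id(chrom, pos, ref, alt) for chrom, pos, _, ref, alt in RECORDS]

print(duplicated(pos_ids))   # ['14_106531322']
print(duplicated(full_ids))  # []
```

So SV_4005_1 and SV_4006_1 only collide if ALT is left out of the id; with position, REF and ALT combined, all four records get distinct ids.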
It should be this, also for SVs?! https://github.com/Clinical-Genomics/scout/blob/6086937b4cb0da38fe52234a813fd14996d79a7e/scout/parse/variant/variant.py#L70
Let us check the cyvcf2 parsing when time allows. We could either check an existing case with some decent callers and see how the ALTs look in the db, or simply print debug output while parsing a small test.
Without testing the variants at hand, parsing ALTs via cyvcf2 into the db seems OK, at least in general; see the alternative for this BND:
For instance, if you have a translocation from chrA:posA to both chrB:posB and chrC:posC, only the translocation from chrA:posA to chrB:posB is loaded as a variant.
Of course this happens more often in cancer cases than in RD cases.
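As an illustration of the chrA:posA case, the sketch below pulls the mate locus out of each breakend ALT and groups the mates by start position. The regex and the helper name `mate_locus` are assumptions written for this example, not part of Scout or cyvcf2; the two ALT strings are taken from the example records that share the start 14:106531322.

```python
import re
from collections import defaultdict

# Matches the mate locus inside a VCF breakend ALT, e.g. "]14:107010501]N"
BND_MATE = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def mate_locus(alt):
    """Return (chrom, pos) of the mate breakend, or None if ALT is not a BND."""
    match = BND_MATE.search(alt)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# Two breakends that start at the same position but have different mates
SAME_START = [
    ("14", 106531322, "]14:107010501]N"),
    ("14", 106531322, "]14:106712012]N"),
]

mates = defaultdict(list)
for chrom, pos, alt in SAME_START:
    mates[(chrom, pos)].append(mate_locus(alt))

print(dict(mates))
# {('14', 106531322): [('14', 107010501), ('14', 106712012)]}
```

Any key that distinguishes these two records has to take the mate locus (or the full ALT) into account, since the start position alone maps both to the same entry.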