melissacline closed this 4 years ago
I see what's happening: we didn't previously use the gDNA field, so we let it combine values into a list. Now that we use it, the combined list fails the HGVS parser (for obvious reasons). Working on a fix. It should be as easy as picking any one value from the comma-delimited list, but it would be even better to ensure that all three values return the same result when parsed.
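A minimal sketch of that fix, assuming the combined gDNA field is a plain comma-delimited string. The function name `pick_gdna` is hypothetical, and exact string equality after trimming stands in for the real equivalence check, which would go through the HGVS parser.

```python
def pick_gdna(combined):
    """Return the single gDNA value held in a comma-delimited field.

    Raises ValueError when the field is empty or its values disagree.
    Assumption: string equality is a stand-in for comparing parsed variants.
    """
    values = [v.strip() for v in combined.split(",") if v.strip()]
    if not values:
        raise ValueError("empty gDNA field")
    distinct = sorted(set(values))
    if len(distinct) > 1:
        raise ValueError("conflicting gDNA values: " + ", ".join(distinct))
    return distinct[0]
```

Raising on disagreement, rather than silently taking the first value, is what surfaces grouped records whose coordinates do not belong together.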
After adjusting the script, I'm finding obviously disparate gDNA coordinates being grouped together... will require some more time to see what's going on and resolve.
Ok... here are some strange data coming to us in the download:
"chr17" "g.41258546A>G" "Unknown" "" "BRCA1" "BRCA1_001740" "NM_007294.3:c.139T>C" "pathogenic" "r.(?)" "p.(Cys47Arg)" "+/." "" "1" "Pascale Hilbert (Charleroi,BE)" "2014-12-21 10:06:24" "2019-02-08 16:32:34"
"chr17" "g.41258546A>G" "Unknown" "" "BRCA1" "BRCA1_001740" "NM_007294.3:c.139T>C" "VUS" "r.(?)" "p.(Cys47Arg)" "?/." "" "1" "Cindy Badoer (Brussels,BE)" "2014-12-21 10:06:24" "2019-02-08 16:32:34"
"chr17" "g.41258546A>G" "Unknown" "" "BRCA1" "BRCA1_001740" "NM_007294.3:c.139T>C" "VUS" "r.(?)" "p.(Cys47Arg)" "?/." "" "1" "Rien Blok (Maastricht,NL)" "2016-12-02 14:43:16" "2016-12-02 14:43:16"
"chr17" "g.41258546A>G" "Unknown" "" "BRCA1" "BRCA1_001740" "NM_007294.3:c.181T>G" "pathogenic" "r.(?)" "p.(Cys61Gly)" "+/+" "" "1" "Genevieve Michils (Leuven,BE)" "2014-12-21 10:06:24" "2016-08-05 14:13:49"
"chr17" "g.41258546A>G" "Unknown" "" "BRCA1" "BRCA1_001740" "NM_007294.3:c.181T>G" "pathogenic" "r.(?)" "p.(Cys61Gly)" "+/+" "" "1" "Genevieve Michils (Leuven,BE)" "2014-12-21 10:06:24" "2016-08-05 14:13:49"
"chr17" "g.32910403T>C" "Germline" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Rien Blok (Maastricht,NL)" "2017-12-24 09:53:15" "2019-02-08 16:32:34"
"chr17" "g.32910403T>C" "Germline" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Rien Blok (Maastricht,NL)" "2017-12-24 09:53:15" "2019-02-08 16:32:34"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/?" "" "1" "Genevieve Michils (Leuven,BE)" "2014-02-03 23:26:51" "2017-01-14 03:46:47"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/?" "" "1" "Arjen Mensenkamp (Nijmegen,NL)" "2014-02-03 23:26:51" "2017-01-14 01:06:31"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/?" "" "1" "Arjen Mensenkamp (Nijmegen,NL)" "2014-02-03 23:26:51" "2017-01-14 02:24:14"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/?" "" "1" "Genevieve Michils (Leuven,BE)" "2014-02-03 23:26:51" "2017-01-14 02:15:53"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/?" "" "1" "Marjolijn JL Ligtenberg (Nijmegen,NL)" "2014-02-03 23:26:51" "2017-01-14 03:25:23"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/?" "" "1" "Rien Blok (Maastricht,NL)" "2014-02-04 22:33:55" "2017-01-14 01:58:30"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.=" "?/." "" "1" "Pascale Hilbert (Charleroi,BE)" "2014-12-21 10:06:24" "2019-02-08 16:32:34"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(=)" "p.(=)" "?/?" "" "1" "Rien Blok (Maastricht,NL)" "2014-12-21 12:15:21" "2016-08-05 14:13:49"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "benign" "r.(=)" "p.(Thr637=)" "-/." "" "1" "Quest Diagnostics (Madison,US)" "2014-12-17 19:07:22" "2016-08-05 14:13:49"
"chr17" "g.41245637A>G" "Unknown" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "benign" "r.(?)" "p.(=)" "-/." "" "1" "Rien Blok (Maastricht,NL)" "2015-09-04 19:57:50" "2019-02-08 16:32:34"
"chr17" "g.41245637A>G" "Germline" "1/1900 cases" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(Thr637=)" "?/." "" "1" "Angela Solano & F Cardoso (Buenos Aires,AR)" "2017-07-21 18:07:10" "2017-09-09 16:52:57"
"chr17" "g.41245637A>G" "Germline" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Rien Blok (Maastricht,NL)" "2017-12-01 15:59:19" "2019-02-08 16:32:34"
"chr17" "g.41245637A>G" "Germline" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Rien Blok (Maastricht,NL)" "2017-12-01 15:59:19" "2019-02-08 16:32:34"
"chr17" "g.41245637A>G" "CLASSIFICATION record" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "likely benign" "r.(?)" "p.(=)" "-?/." "VKGL data sharing initiative Nederland" "1" "VKGL-NL_Nijmegen (Nijmegen,NL)" "2018-01-15 20:58:59" "2020-03-23 16:13:27"
"chr17" "g.41245637A>G" "CLASSIFICATION record" "3 cases" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS (*)" "r.(?)" "p.(=)" "?/." "classified as class 3, 4 or 5 in 3/12850 full screen tests: PHE release 2 BRCA germline variants (June 19, 2019)" "1" "UK Variant Sharing Initiative - Clare Turnbull (England & Wales,GB)" "2019-12-20 14:40:01" "2019-12-20 14:40:01"
"chr13" "g.32910403T>C" "Unknown" "" "BRCA2" "BRCA2_001935" "NM_000059.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/?" "" "1" "Ans M.W. van den Ouweland (Rotterdam,NL)" "2014-02-04 22:33:55" "2018-03-30 16:37:02"
"chr13" "g.32910403T>C" "Germline" "" "BRCA2" "BRCA2_001935" "NM_000059.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Ans M.W. van den Ouweland (Rotterdam,NL)" "2017-12-01 17:36:46" "2019-02-08 15:17:09"
"chr13" "g.32910403T>C" "Germline" "" "BRCA2" "BRCA2_001935" "NM_000059.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Hans Gille (Amsterdam,NL)" "2017-12-01 14:11:44" "2019-02-08 15:17:09"
"chr13" "g.32910403T>C" "Germline" "" "BRCA2" "BRCA2_001935" "NM_000059.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Hans Gille (Amsterdam,NL)" "2017-12-01 14:11:44" "2019-02-08 15:17:09"
"chr13" "g.32910403T>C" "Germline" "" "BRCA2" "BRCA2_001935" "NM_000059.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Hans Gille (Amsterdam,NL)" "2017-12-01 14:11:44" "2019-02-08 15:17:09"
"chr13" "g.32910403T>C" "CLASSIFICATION record" "" "BRCA2" "BRCA2_001935" "NM_000059.3:c.1911T>C" "likely benign" "r.(?)" "p.(=)" "-?/." "VKGL data sharing initiative Nederland" "1" "VKGL-NL_VUmc (Amsterdam,NL)" "2018-01-15 20:58:59" "2019-12-04 15:24:38"
"chr13" "g.32910403T>C" "CLASSIFICATION record" "" "BRCA2" "BRCA2_001935" "NM_000059.3:c.1911T>C" "likely benign" "r.(?)" "p.(=)" "-?/." "VKGL data sharing initiative Nederland" "1" "VKGL-NL_Rotterdam (VKGL-NL_Rotterdam,NL)" "2018-01-15 20:58:59" "2019-12-04 15:24:38"
Both g.41197659G>C and g.41197658G>C are provided as gDNA values for NM_007294.3:c.*36C>G, but HGVS does not find them to be equivalent.
For NM_000059.3:c.4563A>G, both g.32913055G>A and g.32913055A>G are provided; one must be incorrect.
For NM_000059.3:c.4621A>C, both g.32912753A>C and g.32913113A>C gDNA values are provided, and HGVS does not consider them equivalent.
For NM_000059.3:c.5351dup, both g.32913843dup and g.32913844dup gDNA values are provided, and HGVS does not consider them equivalent.
For NM_000059.3:c.7617+2T>G, both g.32930748T>G and g.32356611T>G values are provided, and HGVS does not consider them equivalent.
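Conflicts like the ones listed above can be found by grouping the download rows by cDNA value. The sketch below assumes the column order seen in the quoted rows (gDNA in the second field, cDNA in the seventh) and a quoted, single-character-delimited layout; the function name `conflicting_gdna` and the default space delimiter are assumptions, and the real file may well be tab-delimited.

```python
import csv
import io
from collections import defaultdict

def conflicting_gdna(text, delimiter=" "):
    """Map each cDNA to its sorted gDNA values when more than one occurs.

    Assumption: field 2 is the gDNA value and field 7 the cDNA value,
    as in the rows quoted in this thread.
    """
    seen = defaultdict(set)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    for row in reader:
        if len(row) < 7:
            continue  # skip blank or truncated lines
        seen[row[6]].add(row[1])
    return {cdna: sorted(gdna) for cdna, gdna in seen.items() if len(gdna) > 1}
```

This compares raw strings only, so it also reports pairs that differ in syntax but not in meaning; those would still need a pass through the HGVS parser.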
@zfisch Would you like any help with this?
@melissacline sure! I just forwarded a link to this thread to Ivo as well. I'm not sure, but it looks like some of it may be inconsistent data reported to LOVD.
Here's the file where I'm pulling all the data from: BRCA.txt
Unfortunately, my validation script is a bit delayed because I had other development work to do first. Otherwise, most of these errors would already have been fixed automatically.
- For gDNA g.41258546A>G we get different cDNA values:
(...)
"chr17" "g.41258546A>G" "Unknown" "" "BRCA1" "BRCA1_001740" "NM_007294.3:c.181T>G" "pathogenic" "r.(?)" "p.(Cys61Gly)" "+/+" "" "1" "Genevieve Michils (Leuven,BE)" "2014-12-21 10:06:24" "2016-08-05 14:13:49"
"chr17" "g.41258546A>G" "Unknown" "" "BRCA1" "BRCA1_001740" "NM_007294.3:c.181T>G" "pathogenic" "r.(?)" "p.(Cys61Gly)" "+/+" "" "1" "Genevieve Michils (Leuven,BE)" "2014-12-21 10:06:24" "2016-08-05 14:13:49"
My validation script has flagged the bottom two as a mismatch between genomic and transcript DNA. When my validation script is finished and has been run over the production LOVD, the curators can be notified to look at these entries.
- For c.1911T>C it seems some variants have gDNA and gene values mixed up (i.e. g.32910403T>C is provided as being on both chr17 and chr13):
"chr17" "g.32910403T>C" "Germline" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Rien Blok (Maastricht,NL)" "2017-12-24 09:53:15" "2019-02-08 16:32:34"
"chr17" "g.32910403T>C" "Germline" "" "BRCA1" "BRCA1_001023" "NM_007294.3:c.1911T>C" "VUS" "r.(?)" "p.(=)" "?/." "" "1" "Rien Blok (Maastricht,NL)" "2017-12-24 09:53:15" "2019-02-08 16:32:34"
(...)
Here, the top two are incorrect, but I can't tell whether they picked the wrong gene/chromosome or the wrong position. I'll have to check with them. I have no reason to believe the others are wrong as well. Or do you think they are?
- Both g.41197659G>C and g.41197658G>C are provided as gDNA values for NM_007294.3:c.*36C>G but HGVS does not find them to be equivalent
As g.41197658G>C is not possible, my validation script autocorrected this to g.41197659G>C. I have now also corrected this in LOVD.
- For NM_000059.3:c.4563A>G, both g.32913055G>A and g.32913055A>G are provided; one must be incorrect.
Also got autocorrected; A>G is correct. I have fixed this in LOVD as well.
- For NM_000059.3:c.4621A>C both g.32912753A>C and g.32913113A>C gDNA values are provided and HGVS does not consider them equivalent.
Also got autocorrected; g.32913113A>C is correct. I have fixed this in LOVD as well.
- For NM_000059.3:c.5351dup, both g.32913843dup and g.32913844dup gDNA values are provided, and HGVS does not consider them equivalent.
My script flagged this one as well; a curator needs to look at it.
- For NM_000059.3:c.7617+2T>G, both g.32930748T>G and g.32356611T>G values are provided and HGVS does not consider them equivalent.
My script autocorrected this to g.32930748T>G. I have corrected this in LOVD as well.
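Some of the corrections above, such as the swapped G>A versus A>G, can be caught without coordinates at all. This is a rough sketch, not the LOVD validation script: it relies on BRCA1 being transcribed from the minus strand and BRCA2 from the plus strand, so a BRCA1 cDNA substitution should appear complemented in gDNA. The function names are hypothetical, and only the bases are checked, not the position.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SUBSTITUTION = re.compile(r"([ACGT])>([ACGT])$")

def expected_genomic_change(cdna, minus_strand):
    """Return the 'ref>alt' a matching gDNA substitution should end with."""
    match = SUBSTITUTION.search(cdna)
    if match is None:
        return None  # not a simple substitution (dup, del, ...)
    ref, alt = match.groups()
    if minus_strand:
        # Minus-strand genes are described on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"

def substitution_matches(cdna, gdna, minus_strand):
    """True when the gDNA bases agree with the cDNA substitution."""
    expected = expected_genomic_change(cdna, minus_strand)
    return expected is not None and gdna.endswith(expected)
```

For example, `substitution_matches("NM_000059.3:c.4563A>G", "g.32913055G>A", False)` is false, while the A>G form passes. A position error such as g.41197658G>C would still pass this check, since the bases themselves agree.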
Thanks @ifokkema!
We're regenerating a dataset for an upcoming release at the moment using the very latest LOVD data.
We're also excited to see how things smooth out once the new script is in place -- let us know if there's any downstream feedback we could provide to you once it's in place.
Ivo suggested that this problem variant, which comes from LOVD but gets spit out by the pipeline, is an artifact of our variant merging process. Let's double-check that. Looking at it, it appears to be a merge of NC_000017.10:g.41244533del and NC_000017.10:g.41244533delT, which are exactly the same thing with very slightly different syntax.
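One way to keep the merge step from treating these two spellings as different variants is to normalize deletions before comparing. A minimal sketch, assuming the only difference is the optional explicit base list after `del`; the function name `normalize_deletion` is hypothetical and this is not the pipeline's actual merge code.

```python
import re

# Matches a trailing 'del' followed by explicit deleted bases, e.g. 'delT'.
# 'delins...' is left alone because 'i' is not in the base character class.
_DEL_WITH_BASES = re.compile(r"del[ACGTN]+$")

def normalize_deletion(hgvs_g):
    """Drop the explicit deleted bases so 'delT' and 'del' compare equal."""
    return _DEL_WITH_BASES.sub("del", hgvs_g)
```

With this, `normalize_deletion("NC_000017.10:g.41244533delT")` and `normalize_deletion("NC_000017.10:g.41244533del")` produce the same string, so the two records collapse to a single value instead of a combined list.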