Open olingerc opened 2 years ago
Thanks Christophe! I brought this up with the team during this morning's stand-up meeting. We'll investigate how this is represented in the XML file so that we can provide useful haplotype information.
Hi @olingerc, can you point me to a ClinVar record (RCV) that says 'Haplotype' or 'Genotype' in Measure?
It would be really helpful if you could describe, in more detail, a set of RCVs that are connected via this mechanism, which fields indicate the inter-relationship, and how. In short, I am asking for a description of your use case with real examples so that we can better understand the feature you are requesting.
Thanks.
Hi @rajatshuvro,
An example variant would be: 1-171076966-G-A
Nirvana gives me the following ClinVar list (v3.18.1)
There are a total of 3 different (alleleSpecific) VCVs:
However, when opening the ClinVar pages of the two pathogenic variants (here and here), it is obvious that they are only pathogenic when coupled with another variant (Haplotype).
It would be very helpful if we had the "Haplotype" info. It is stored in the MeasureSet element.
<MeasureSet Type="Haplotype" ID="217371" Acc="VCV000217371" Version="1">
</MeasureSet>
(extracted from the full XML). If I read your code correctly, you almost read this info already here
Here are all possible values:
<xs:simpleType name="Measuresettypelist">
<xs:restriction base="xs:string">
<xs:enumeration value="Gene"/>
<xs:enumeration value="Variant"/>
<xs:enumeration value="Haplotype"/>
<xs:enumeration value="Phase unknown"/>
<xs:enumeration value="Distinct chromosomes"/>
</xs:restriction>
</xs:simpleType>
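As a rough illustration of how the Type attribute could be read, here is a minimal Python sketch using the standard library. The nesting of the sample snippet is simplified, and treating "Phase unknown" and "Distinct chromosomes" as multi-variant records alongside "Haplotype" is my assumption, not something ClinVar or Nirvana states. The function name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Simplified sample built around the MeasureSet element quoted above.
# The surrounding nesting is an assumption made for illustration.
SNIPPET = """
<ClinVarSet ID="101183654">
  <ReferenceClinVarAssertion ID="477812">
    <MeasureSet Type="Haplotype" ID="217371" Acc="VCV000217371" Version="1">
    </MeasureSet>
  </ReferenceClinVarAssertion>
</ClinVarSet>
"""

# Assumption: these schema values describe records spanning several variants.
MULTI_VARIANT_TYPES = {"Haplotype", "Phase unknown", "Distinct chromosomes"}


def measure_set_types(xml_text):
    """Return (accession, type) for every MeasureSet element in the XML."""
    root = ET.fromstring(xml_text.strip())
    return [(ms.get("Acc"), ms.get("Type")) for ms in root.iter("MeasureSet")]


for acc, set_type in measure_set_types(SNIPPET):
    # Flag records that only make sense together with other variants.
    print(acc, set_type, set_type in MULTI_VARIANT_TYPES)
```

For the full ClinVar release, a streaming parser such as `ET.iterparse` would likely be needed instead of loading the whole file at once.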
A bonus would be knowing which other variant is in the haplotype. A quick fix would be to extract it from the title:
<ClinVarResult-Set>
<ClinVarSet ID="101183654">
<RecordStatus>current</RecordStatus>
<Title>
NM_006894.4(FMO3):c.[472G>A;560T>C] AND Trimethylaminuria
</Title>
<ReferenceClinVarAssertion ID="477812" DateLastUpdated="2022-06-24" DateCreated="2015-10-30">
...
Within the brackets, we see the identification of the second variant. Having the full list of variants would of course be nice as well, but I guess this would mean more changes to your code.
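To make the title-based quick fix concrete, here is a small Python sketch that splits the bracketed allele block into its member variants. It assumes the title follows the `prefix:c.[a;b]` pattern seen in the example above. The regular expression and function name are my own illustration and are not taken from the Nirvana code.

```python
import re

# Matches "<prefix>:<kind>.[<member>;<member>...]" at the start of a title.
# Assumption: haplotype titles use this HGVS-style bracket notation.
ALLELE_BLOCK = re.compile(
    r"^(?P<prefix>.+?):(?P<kind>[cgnmpr])\.\[(?P<members>[^\]]+)\]"
)


def haplotype_members(title):
    """Return one HGVS-like string per variant in the bracketed block.

    Returns an empty list when the title has no bracketed allele block.
    """
    match = ALLELE_BLOCK.search(title.strip())
    if match is None:
        return []
    prefix, kind = match.group("prefix"), match.group("kind")
    members = match.group("members").split(";")
    return [f"{prefix}:{kind}.{member.strip()}" for member in members]


title = "NM_006894.4(FMO3):c.[472G>A;560T>C] AND Trimethylaminuria"
print(haplotype_members(title))
```

Parsing the title is fragile compared with reading the individual Measure elements, so this would only be a stopgap.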
Thanks for considering the request!
Here is the corresponding line from a vcf file:
chr1 171076966 . G A 128.49 PASS AC=2;AF=0.333;AN=6;DP=116;FS=4.083;MQ=250;MQRankSum=6.805;QD=1.4;ReadPosRankSum=3.267;SOR=0.346 GT:AD:AF:DP:GQ:FT:F1R2:F2R1:PL:GP:PP:DN 0/0:28,0:0:24:63:PASS:.:.:0,63,945:.:0,74,260:. 0/1:13,16:0.552:29:48:PASS:7,8:6,8:85,0,49:50,6.9375e-05,52.227:128,0,54:. 0/1:33,30:0.476:63:48:PASS:14,12:19,18:84,0,50:49.643,6.8857e-05,53:84,0,124:Inherited
Thanks @olingerc. We are actively considering this as an upcoming feature.
Dear Nirvana team,
I'm sorry to misuse the issue tracker for a feature request. I was not sure how best to approach you.
Thanks for the detailed information on how you compile the ClinVar entries (HERE). Quite often we have many ClinVar entries at one position. Even when reducing to isAlleleSpecific, it is sometimes difficult to understand which entries are relevant to our variant, specifically for ClinVar entries that relate to variants at multiple sites (meaning they only make sense when multiple variants are present = Haplotype). This information is stored in the Measure and GenotypeSet fields. Would it be possible to at least include Measure? The example below from your documentation displays "single nucleotide variant", but we would be interested in identifying cases for which this value would be "Haplotype" or "Genotype". That way we could remove VCVs that only make sense when all variants are present.
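To sketch the filtering we have in mind: the snippet below assumes a hypothetical `measureSetType` field on each ClinVar entry. That field does not exist in the current Nirvana output and is exactly what this request is asking for. The second entry uses a placeholder ID and is not a real record.

```python
# Hypothetical entries: "measureSetType" is NOT a current Nirvana field,
# and "VCV-placeholder" is an invented ID used only for illustration.
entries = [
    {"id": "VCV000217371", "isAlleleSpecific": True, "measureSetType": "Haplotype"},
    {"id": "VCV-placeholder", "isAlleleSpecific": True, "measureSetType": "Variant"},
]


def standalone_entries(clinvar_entries):
    """Keep allele-specific entries that describe a single variant."""
    return [
        entry
        for entry in clinvar_entries
        if entry.get("isAlleleSpecific")
        and entry.get("measureSetType") == "Variant"
    ]


for entry in standalone_entries(entries):
    print(entry["id"])
```

Entries without the field are dropped here. Whether that is the right default would depend on how the value ends up being reported.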