EBIvariation / CMAT

ClinVar Mapping and Annotation Toolkit
Apache License 2.0

Report all variants #122

Closed: tskir closed this issue 4 years ago

tskir commented 4 years ago

Reported by @AsierGonzalez via email on 2020-07-07. EVA-2097


As we have discussed a number of times, currently only variants with certain clinical significance types (e.g. pathogenic and likely pathogenic) are included in the evidence strings. We would now like to request that all of them are included, and we will then score them based on their clinical significance. We believe that expanding the range of clinical significance levels we cover may be useful for users. The reference ticket for this item is https://github.com/opentargets/platform/issues/1139. Additionally, I would appreciate it if you could point me to the ClinVar file(s) that contain the clinical significance information, so that I can get a sense of what the values are and how they are distributed before I start working on the new scoring.


tskir commented 4 years ago

I couldn't agree more about including all variants. This can definitely be done.

Regarding the ClinVar files: we obtain most of the information, including clinical significance levels, from a single huge XML dump. However, I suggest that you don't start looking into it just yet. Right now I'm working on issue #120, which is about describing all possible ways in which variants (including haplotypes, genotypes, and other complex cases) can be represented in the ClinVar schema. This issue was made necessary by our decision to replace RCV identifiers with VCV ones: when I started going through our code, I noticed it tries to handle some strange cases which I couldn't understand without revisiting the entire ClinVar data model.

What I'm saying is, since I already have the parser and the plotting set up, I can easily extract the distribution of any value from the ClinVar data that you might need. So I'll do that and report the results to you.
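For illustration, tallying the distribution of clinical significance values from the XML dump could look roughly like the sketch below. This is not CMAT's own parser. It assumes the ClinVar full release layout in which each `ClinVarSet` holds a `ReferenceClinVarAssertion` with a `ClinicalSignificance/Description` element, and the function name `count_clinical_significance` is hypothetical. It streams the file with `iterparse` and clears each processed record so the whole dump never has to sit in memory at once.

```python
import gzip
import io
import sys
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed path (relative to each ClinVarSet) to the RCV-level significance text.
SIGNIFICANCE_PATH = "ReferenceClinVarAssertion/ClinicalSignificance/Description"


def count_clinical_significance(xml_source):
    """Count clinical significance descriptions, one per ClinVarSet record.

    xml_source is a binary file-like object (or path) containing the XML dump.
    Records without a description are counted under "<missing>".
    """
    counts = Counter()
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "ClinVarSet":
            continue
        desc = elem.find(SIGNIFICANCE_PATH)
        if desc is not None and desc.text:
            counts[desc.text.strip()] += 1
        else:
            counts["<missing>"] += 1
        # Drop the record's children to keep memory usage roughly constant.
        elem.clear()
    return counts


if __name__ == "__main__":
    # Usage (assumed file name): python count_clinsig.py ClinVarFullRelease_00-latest.xml.gz
    with gzip.open(sys.argv[1], "rb") as handle:
        for value, n in count_clinical_significance(handle).most_common():
            print(f"{n}\t{value}")
```

Counting only the RCV-level `ReferenceClinVarAssertion` avoids double counting the individual submitter (SCV) assertions nested in the same record; switching the path would give per-submission counts instead.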

AsierGonzalez commented 4 years ago

Sure, I am happy to wait. I'm not too keen to start wrestling with a huge XML file when you know it better.