Closed colleenXu closed 3 years ago
@colleenXu what's the justification to switch to using ClinVar variant_id instead of RCV id? Is that the primary ID for Clinvar in Biolink model? Just need a description to document why we made this change.
I also suggest to remove dbnsfp.clinvar.clinvar_id
, clinvar
field should be the canonical source for ClinVar.
@newgene There are many identifiers in Clinvar, and it's not clear to me which ID the biolink-model is referring to with the CLINVAR prefix. See the id_prefixes here (wait...there's CLINVAR and ClinVarVariant? What are these?).
Based on my reading on Clinvar identifiers and this FAQ on RCV, it seemed that RCV were more like an association's ID (phenotype/disease + variant).
So I thought maybe a different ID from clinvar refers to the SequenceVariant itself and should be used...
According to the linked Clinvar doc on identifiers, the closest to a "unique variant ID" is maybe a "VariationID" or "AlleleID"?
OK, if this does not link to a known issue or expected use case, I would hold off this one.
We are making changes to ClinVar source at MyVariant.info now: https://github.com/biothings/myvariant.info/issues/127. Will reevaluate this after this myvariant issue is closed.
Based on the conversation in NCATSTranslator Slack, the CLINVAR ID prefix for SequenceVariant refers to the "Variation ID" in Clinvar.
My recommendation is to go forward with this PR, once I add a commit to remove the dbnsfp field mapping and resolve merge conflicts...
I've confirmed that the clinvar.variant_id field in MyVariant corresponds to the Clinvar Variation ID. For example, see how the clinvar.variant_id:596318 corresponds to the record here that states that the Clinvar Variation ID is also 596318
Pull Request Test Coverage Report for Build 1140665613
💛 - Coveralls