reece closed 8 months ago
Original comment by Jerry Liu (Bitbucket: jerryliu2005, GitHub: Unknown):
This doesn't seem to handle different genome builds. I recently saw a global variant ID implementation in this BMC Bioinformatics paper. Per the article, it is unique for SNVs, deletions, and insertions/MNVs of up to 2958 inserted nucleotides. I don't see that it is offered as open source, though. I'm interested in implementing a similar ID for our own in-house variant store and would like your input on this. Thx, Jerry
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)
Finally close to being able to generate a computed (rather than assigned) variant id for uniqueness.
The idea is to use a hash of a serialized variant as a unique identifier. For example,
sha512("<sequence_sha512>:<start>:<end>:<alt_seq>")
would generate a hash that uniquely identifies a variant (absent hash collisions). Such hashes would be useful for:
When combined with normalization and other notions of "equivalent" or "representative" variants, this hash would provide a way to declare such relationships without the need for a central authority.
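The hashing scheme described above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the function names (sha512_hex, variant_id), the use of hex digests, ASCII encoding, and the toy reference sequence are all assumptions made for the example, and the coordinate convention for start and end is left to the caller.

```python
import hashlib


def sha512_hex(text: str) -> str:
    """Return the SHA-512 digest of an ASCII string as lowercase hex."""
    return hashlib.sha512(text.encode("ascii")).hexdigest()


def variant_id(ref_sequence: str, start: int, end: int, alt_seq: str) -> str:
    """Compute a hash-based variant id (hypothetical helper).

    Serializes the variant as "<sequence_sha512>:<start>:<end>:<alt_seq>",
    following the format proposed in this issue, and hashes the result.
    The hex encoding of both digests is an assumption of this sketch.
    """
    seq_digest = sha512_hex(ref_sequence)          # identifies the sequence itself
    serialized = f"{seq_digest}:{start}:{end}:{alt_seq}"
    return sha512_hex(serialized)                  # identifies the variant


# Toy data, invented for illustration only.
ref_a = "ACGTACGTAC"
ref_b = "ACGTTCGTAC"   # stands in for the same region in another build

id_a = variant_id(ref_a, 3, 4, "G")
id_b = variant_id(ref_b, 3, 4, "G")

print(id_a[:16], id_b[:16])
```

Because the serialized form embeds a digest of the reference sequence rather than a name or accession, the same start, end, and alt on two sequences that differ anywhere (as ref_a and ref_b do here) yield different ids, while anyone holding the same sequence and variant recomputes the same id independently.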