biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0

generate computed variant id #339

Closed reece closed 8 months ago

reece commented 8 years ago

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


We're finally close to being able to generate a computed (rather than assigned) variant id for uniqueness.

The idea is to use a hash of a serialized variant as a unique identifier. For example, sha512("<sequence_sha512>:<start>:<end>:<alt_seq>") would generate a hash that uniquely identifies a variant (absent hash collisions).
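A minimal sketch of the serialization described above, using only Python's standard `hashlib`. The function names, the hex-digest encoding, and the coordinate convention are assumptions made for illustration; the issue does not specify them, and this is not an existing hgvs API.

```python
import hashlib


def sequence_digest(sequence: str) -> str:
    """SHA-512 hex digest of a reference sequence (hex encoding is an assumption)."""
    return hashlib.sha512(sequence.encode("ascii")).hexdigest()


def computed_variant_id(sequence: str, start: int, end: int, alt_seq: str) -> str:
    """Hash the string "<sequence_sha512>:<start>:<end>:<alt_seq>".

    Hypothetical helper: the coordinate convention for start/end is not
    stated in the issue, so the values are serialized exactly as given.
    """
    serialized = f"{sequence_digest(sequence)}:{start}:{end}:{alt_seq}"
    return hashlib.sha512(serialized.encode("ascii")).hexdigest()


# Toy reference sequence, for illustration only.
ref = "ACGTACGTAC"
vid = computed_variant_id(ref, 3, 4, "A")
```

Because the sequence digest is part of the serialized string, the same edit applied to two different reference sequences produces two different ids. Two parties who hold the same sequence and describe the same edit in the same way derive the same id independently.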

Such hashes would be useful for:

When combined with normalization and other notions of "equivalent" or "representative" variants, this hash would provide a way to declare such relationships without the need for a central authority.


reece commented 8 years ago

Original comment by Jerry Liu (Bitbucket: jerryliu2005, GitHub: Unknown):


This doesn't seem to handle different genome builds. I recently saw a global variant id implementation in this BMC bioinfo. paper. Per the article, it is unique for SNVs, deletions, and insertions/MNVs of up to 2958 inserted nucleotides. I don't see that it is offered as open source, though. I'm interested in implementing a similar ID for our own in-house variant store and would like your input on this. Thx, Jerry

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 8 months ago

This issue was closed because it has been stalled for 7 days with no activity.