Open tedgoldstein opened 8 years ago
We should store the number of variants per gene and also store the individual variants. For protein coding variants, they are commonly stored as
I'm not sure I understand the scope of the question, but everything Robert says sounds good to me. The variant nomenclature scheme at http://varnomen.hgvs.org/ is clunky but pretty good.
@tedgoldstein are the things in the first column of that table transcript labels? It doesn't seem to me that this has a compelling use case in our current roadmap.
Here is another bioinformatics issue for data sets: handling multiple co-resident variants. For example, there are (at least) six variants of the APOBEC1 gene (which often uses the symbol A1CF). Most people have all of these variants.
| NCBI label | Hugo label |
| --- | --- |
| NM_138933 | A1CF |
| NM_014576 | A1CF |
| NM_138932 | A1CF |
| NM_001198820 | A1CF |
| NM_001198818 | A1CF |
| NM_001198819 | A1CF |
Sometimes multiple variants need to be treated as separate genes; sometimes they should be averaged and treated as one gene. There are probably other strategies.
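To make the two strategies concrete, here is a rough Python sketch. It assumes each data set gives one numeric value per NCBI label; the function names (`variants_per_gene`, `keep_separate`, `collapse_by_mean`) and the example values are hypothetical, and only the label-to-symbol mapping comes from the list above.

```python
from collections import defaultdict
from statistics import mean

# NCBI label -> Hugo label, copied from the A1CF list in this issue.
TRANSCRIPT_TO_GENE = {
    "NM_138933": "A1CF",
    "NM_014576": "A1CF",
    "NM_138932": "A1CF",
    "NM_001198820": "A1CF",
    "NM_001198818": "A1CF",
    "NM_001198819": "A1CF",
}


def variants_per_gene(mapping):
    """Group NCBI labels by Hugo label so both the count and the members are kept."""
    grouped = defaultdict(list)
    for transcript, gene in mapping.items():
        grouped[gene].append(transcript)
    return dict(grouped)


def keep_separate(values, mapping):
    """Strategy 1: treat every variant as its own 'gene', keyed as SYMBOL:NCBI_LABEL."""
    return {f"{mapping[t]}:{t}": v for t, v in values.items()}


def collapse_by_mean(values, mapping):
    """Strategy 2: average all variants that share a Hugo label into one value."""
    per_gene = defaultdict(list)
    for transcript, value in values.items():
        per_gene[mapping[transcript]].append(value)
    return {gene: mean(vals) for gene, vals in per_gene.items()}


if __name__ == "__main__":
    # Hypothetical per-variant values, for illustration only.
    example_values = {"NM_138933": 2.0, "NM_014576": 4.0}
    print(len(variants_per_gene(TRANSCRIPT_TO_GENE)["A1CF"]))
    print(keep_separate(example_values, TRANSCRIPT_TO_GENE))
    print(collapse_by_mean(example_values, TRANSCRIPT_TO_GENE))
```

Other strategies (max, median, picking a canonical variant) would slot in the same way as `collapse_by_mean`; the mean is used here only because averaging is the one mentioned above.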
Rob and Holly should comment.