The-Sequence-Ontology / SO-Ontologies

Collection of SO Ontologies
Creative Commons Attribution 4.0 International

Describing genetic variations #586

Open ddooley opened 2 years ago

ddooley commented 2 years ago

We wanted to find out whether SO is the right place for a “genetic variation type” term, or whether it should go in EDAM (under “genetic variation”). There can be many “types” of genetic variation, e.g. whole-gene introductions/deletions, indels, cassettes (and other mobile elements), and SNVs. Our project (Genepio.org) needs an ontology class to encompass such types. We would start with two: gene present and SNV(s) present.

If genetic variation types should go in SO, can SO introduce the EDAM “genetic variation” term, with “genetic variation type” as a subclass of it in SO?

We also need genetic variation datums to describe the positions of the nucleotide or amino acid changes detected (this is usually done relative to a known genome, the “reference genome”). The position and nature of the change are usually denoted using a notation system, e.g. HGVS. Should the “nucleotide mutation symbol” (e.g. NC_000023.10:g.33038255C>A) and the “amino acid mutation symbol” (e.g. LRG_199p1:p.Trp24Cys) also go in SO? These are shorthand notations for where the mutations occur in a sequence, and different notation systems could be used.
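
To make the structure of these symbols concrete, here is a minimal Python sketch that splits the two HGVS examples above into a reference identifier, a coordinate-type prefix, and a change description. The names `SimpleHgvs` and `split_hgvs` are hypothetical, and the regular expression is a deliberate simplification that does not validate the full HGVS nomenclature; it only assumes the `reference:prefix.change` shape seen in the examples.

```python
import re
from typing import NamedTuple, Optional


class SimpleHgvs(NamedTuple):
    """Hypothetical container for the three visible parts of an HGVS-style symbol."""
    reference: str   # e.g. "NC_000023.10" or "LRG_199p1"
    coord_type: str  # single-letter prefix, e.g. "g" (genomic) or "p" (protein)
    change: str      # everything after the prefix, e.g. "33038255C>A"


# Simplified pattern (an assumption, not the full HGVS grammar):
# <reference> ":" <one-letter prefix> "." <change>
_HGVS_RE = re.compile(
    r"^(?P<reference>[^:\s]+):(?P<coord_type>[gomcnrp])\.(?P<change>\S+)$"
)


def split_hgvs(symbol: str) -> Optional[SimpleHgvs]:
    """Split a symbol into its parts, or return None if the shape does not match."""
    match = _HGVS_RE.match(symbol.strip())
    if match is None:
        return None
    return SimpleHgvs(match["reference"], match["coord_type"], match["change"])


def sequence_level(parsed: SimpleHgvs) -> str:
    """Classify a parsed symbol as an amino acid or nucleotide level description."""
    return "amino acid" if parsed.coord_type == "p" else "nucleotide"


if __name__ == "__main__":
    for text in ("NC_000023.10:g.33038255C>A", "LRG_199p1:p.Trp24Cys"):
        parsed = split_hgvs(text)
        print(text, "->", parsed, sequence_level(parsed))
```

A production system would more likely rely on a dedicated HGVS parsing library than on a hand-written pattern; the sketch only illustrates that the symbol is a data value with internal structure, distinct from the type of variation it describes.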

Thanks for any advice!

c/o Emma @griffie

egchristensen commented 2 years ago

I do not believe that SO is the right place for the genetic variation datums you described. SO is primarily concerned with terms that describe pieces of knowledge associated with extents of biological sequence. SO does not capture specific data points associated with a particular database, genome build, ID system, etc. The terms and relationships found within SO can help to inform those efforts, however. While SO won't have “nucleotide mutation symbol” as a term, SO attempts to capture the different types of mutations that can occur in a given sequence of nucleotides or amino acids. Those terms and relationships would then enable another group to attach data and metadata to those concepts in a way that's consistent with SO and (hopefully) with the rest of the sequence annotation community at large.

Regarding the “gene present” and “SNV(s) present” variations, it would seem that your terms might fit somewhere under sequence_variant (SO:0001060), but we'd need to do a bit of work to characterize and document your proposed terms more precisely before we could actually create new terms. There is a fairly good chance that there is an existing term that we could edit or update to serve your needs. Is this something you'd be willing to explore?
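
The division of labour described in this reply, where SO supplies the variation types and a downstream project attaches its own datums to them, can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a proposed SO or GenEpiO data model: the class `VariantDatum` and the function `group_by_type` are hypothetical, `SO:0001060` is the sequence_variant ID quoted above, and the SNV ID `SO:0001483` is an assumption that should be checked against the current SO release.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Type-level concepts, referenced by SO identifier only.
SEQUENCE_VARIANT = "SO:0001060"  # sequence_variant, as cited in the reply above
SNV = "SO:0001483"               # assumed ID for SNV; verify against the current SO release


@dataclass(frozen=True)
class VariantDatum:
    """Hypothetical downstream record: a data point that points at an SO type."""
    so_type: str          # which kind of variation this is (an SO term ID)
    notation_system: str  # e.g. "HGVS"; other notation systems could be used
    symbol: str           # the position/change string, kept outside SO


def group_by_type(records: Iterable[VariantDatum]) -> Dict[str, List[str]]:
    """Collect the symbols recorded under each SO type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        grouped[record.so_type].append(record.symbol)
    return dict(grouped)


if __name__ == "__main__":
    records = [
        VariantDatum(SNV, "HGVS", "NC_000023.10:g.33038255C>A"),
        VariantDatum(SEQUENCE_VARIANT, "HGVS", "LRG_199p1:p.Trp24Cys"),
    ]
    for so_id, symbols in group_by_type(records).items():
        print(so_id, symbols)
```

The point of the design is that the ontology term answers "what kind of change is this?", while the notation string, genome build, and database identifiers stay in the record that references the term.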