The-Sequence-Ontology / SO-Ontologies

Collection of SO Ontologies
Creative Commons Attribution 4.0 International

copy number assessment subtree proposal #568

Open · mbaudis opened 2 years ago

mbaudis commented 2 years ago

What is this request referring to? The result of a copy number assessment of a genomic element or region.

What is the name you would like SO to give the term? copy number assessment and child terms

id: SO:nnnn01
label: copy number assessment
  |
  |-id: SO:nnnn02
  | label: regional base ploidy
  |   |
  |   |-id: SO:nnnn04
  |     label: copy-neutral loss of heterozygosity
  |
  |-id: SO:nnnn03
    label: copy number variation
      |
      |-id: SO:nnnn05
      | label: copy number loss
      |   |
      |   |-id: SO:nnnn07
      |   | label: low-level copy number loss
      |   |
      |   |-id: SO:nnnn08
      |     label: complete genomic deletion
      |
      |-id: SO:nnnn06
        label: copy number gain
          |
          |-id: SO:nnnn09
          | label: low-level copy number gain
          |
          |-id: SO:nnnn10
            label: high-level copy number gain
            note: commonly but not consistently used for >=5 copies on a bi-allelic genome region
              |
              |-id: SO:nnnn11
                label: focal genome amplification
                note: >-
                  commonly used for localized multi-copy genome amplification events where the
                  region does not extend >3Mb (varying 1-5Mb) and may exist in a large number of
                  copies
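To make the proposed hierarchy easier to check, the draft tree can be encoded as a child-to-parent map. This is a minimal Python sketch, not part of the proposal itself: it keeps the placeholder IDs (SO:nnnn01 to SO:nnnn11, which are not real SO accessions), and `LABEL`, `PARENT` and `ancestors` are hypothetical names. Walking from a leaf up to the root shows which broader terms a given call falls under.

```python
# Draft hierarchy from the proposal; IDs are placeholders, not real SO accessions.
LABEL = {
    "SO:nnnn01": "copy number assessment",
    "SO:nnnn02": "regional base ploidy",
    "SO:nnnn03": "copy number variation",
    "SO:nnnn04": "copy-neutral loss of heterozygosity",
    "SO:nnnn05": "copy number loss",
    "SO:nnnn06": "copy number gain",
    "SO:nnnn07": "low-level copy number loss",
    "SO:nnnn08": "complete genomic deletion",
    "SO:nnnn09": "low-level copy number gain",
    "SO:nnnn10": "high-level copy number gain",
    "SO:nnnn11": "focal genome amplification",
}

# Each term points to its single parent; the root (SO:nnnn01) has no entry.
PARENT = {
    "SO:nnnn02": "SO:nnnn01",
    "SO:nnnn03": "SO:nnnn01",
    "SO:nnnn04": "SO:nnnn02",
    "SO:nnnn05": "SO:nnnn03",
    "SO:nnnn06": "SO:nnnn03",
    "SO:nnnn07": "SO:nnnn05",
    "SO:nnnn08": "SO:nnnn05",
    "SO:nnnn09": "SO:nnnn06",
    "SO:nnnn10": "SO:nnnn06",
    "SO:nnnn11": "SO:nnnn10",
}


def ancestors(term_id: str) -> list[str]:
    """Return the labels from term_id up to the root, nearest term first."""
    path = []
    current = term_id
    while current is not None:
        path.append(LABEL[current])
        current = PARENT.get(current)  # None once the root is reached
    return path


if __name__ == "__main__":
    print(" -> ".join(ancestors("SO:nnnn11")))
```

A single-parent map fits this draft because every term has exactly one parent; if a term later gained a second parent, each entry would need to hold a list of parents instead.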

What is the definition that you would like for this term? Assessment of the copy number of a genomic feature or region, relative to the expected allele count in the given sample. Examples of an expected count would be:

Synonyms The root term would be equivalent to "CNV assessment" or "CNV evaluation"; details for the child terms will be added while developing this proposal.

Parent Term sequence_comparison (SO:0002072)

This seems to be the most fitting term, but suggestions are welcome.

Relevant Publications During the development of the GA4GH Beacon v2 structural query documentation, we found a lack of consistent representation of CNV events and incomplete overlap between the concepts used in the "CNV community" (rare diseases and cancer) and the SO representation. Adding @dsalgado, @ahwagner and @babisingh to the conversation.


This proposal relates to the need for a documented set of terms that the GA4GH VRS standard, and CNV reporting in general, can refer to. See https://github.com/ga4gh/vrs/issues/277


Updated on 2022-01-14 with some rewording and the addition of focal genome amplification

ahwagner commented 2 years ago

One thing I would add to this proposal is a clear definition of what constitutes low-level gain vs amplification. I have heard amplification loosely defined as >=8 allele copies in a diploid genome. I do not have any strong preference as to what this cutoff is, only that it is clearly specified in the definition. We should seek to align with definitions from a prominent authority.

For the "homozygous deletion" entry, perhaps we could generalize this to "complete CN loss" or similar? Homozygous as a term is strongly tied to diploid genetics.

mbaudis commented 2 years ago

@ahwagner Great comments. Supporting the "high level" statement with some literature/references is an obvious need (as are some other definitions; I just wanted to provide a draft for discussion), and I agree with "complete" over "homozygous" (I had the same feeling but didn't follow up, waiting for other voices :-)

hangjiaz commented 2 years ago

There are different cut-off values in use for amplification (which I also find confusing):

- amplification: average genome ploidy <= 2.7 AND total copy number >= 5, OR average genome ploidy > 2.7 AND total copy number >= 9
- amplification: >8 copies
- amplification: >5 copies
- amplification: >=5 copies
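The competing amplification cut-offs can be compared side by side. Below is a minimal Python sketch, assuming an integer total copy number per segment and a float average genome ploidy; the function names (`amp_ploidy_adjusted`, `amp_gt8`, `amp_gt5`, `amp_ge5`, `compare`) are hypothetical, and the thresholds are copied from the list above without attributing them to any particular source.

```python
def amp_ploidy_adjusted(total_cn: int, avg_ploidy: float) -> bool:
    """Ploidy-dependent rule: >= 5 copies if ploidy <= 2.7, otherwise >= 9 copies."""
    if avg_ploidy <= 2.7:
        return total_cn >= 5
    return total_cn >= 9


def amp_gt8(total_cn: int) -> bool:
    """Fixed cut-off: more than 8 copies."""
    return total_cn > 8


def amp_gt5(total_cn: int) -> bool:
    """Fixed cut-off: more than 5 copies."""
    return total_cn > 5


def amp_ge5(total_cn: int) -> bool:
    """Fixed cut-off: 5 or more copies."""
    return total_cn >= 5


def compare(total_cn: int, avg_ploidy: float) -> dict[str, bool]:
    """Report whether each listed definition would call the segment an amplification."""
    return {
        "ploidy-adjusted": amp_ploidy_adjusted(total_cn, avg_ploidy),
        ">8": amp_gt8(total_cn),
        ">5": amp_gt5(total_cn),
        ">=5": amp_ge5(total_cn),
    }


if __name__ == "__main__":
    # A segment with 5 copies in a near-diploid genome.
    print(compare(5, 2.0))
```

For 5 copies at ploidy 2.0 this prints `{'ploidy-adjusted': True, '>8': False, '>5': False, '>=5': True}`, so the same segment is or is not an amplification depending on which definition is chosen.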

mbaudis commented 2 years ago

Pinging @egchr ...

mbaudis commented 2 years ago

@hangjiaz @ahwagner So this is fairly consistent at CN >= 5 for a ploidy of ~2, with higher values sometimes used without a defined baseline. However, I would provide this only as a reference, not as a prescription.
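As a rough illustration of how the draft terms and the CN >= 5 reference value could be applied together, here is a Python sketch that maps a single segment to one of the most specific labels in the draft tree. It assumes an integer total copy number, a known expected (baseline) copy number defaulting to 2, an LOH flag supplied by the caller, and the ~3Mb limit from the focal genome amplification note. `assess_segment` and all of its parameters are hypothetical, and the thresholds are exposed as parameters because they are meant as references rather than prescriptions.

```python
def assess_segment(
    total_cn: int,
    length_bp: int,
    baseline_cn: int = 2,
    loh: bool = False,
    high_level_cn: int = 5,          # reference value for a ploidy of ~2 (assumption)
    focal_max_bp: int = 3_000_000,   # "does not extend >3Mb" from the draft note
) -> str:
    """Map one segment to a draft term label; parent terms follow from the tree."""
    if total_cn < 0:
        raise ValueError("copy number cannot be negative")
    if total_cn == 0:
        return "complete genomic deletion"
    if total_cn < baseline_cn:
        return "low-level copy number loss"
    if total_cn == baseline_cn:
        # Copy number matches the expected count; LOH status must come from elsewhere.
        return "copy-neutral loss of heterozygosity" if loh else "regional base ploidy"
    if total_cn >= high_level_cn:
        # Focal amplification is a child of high-level gain in the draft tree.
        if length_bp <= focal_max_bp:
            return "focal genome amplification"
        return "high-level copy number gain"
    return "low-level copy number gain"


if __name__ == "__main__":
    segments = [(0, 5_000), (1, 5_000), (3, 10_000_000), (6, 20_000_000), (12, 1_500_000)]
    for cn, length in segments:
        print(cn, length, assess_segment(cn, length))
```

Returning only the most specific label keeps the parent terms implicit, since they can be derived from the tree. The fixed default of 5 copies only makes sense for a near-diploid baseline; for other baselines the caller would need to pass a different `high_level_cn`.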

mbaudis commented 2 years ago

I have made some changes; pls. see the updated tree ...

mbaudis commented 1 year ago

The new tree is now reflected in EFO, including the high-level copy number loss class added during the GA4GH VRS 1.3 alignment.

https://www.ebi.ac.uk/ols4/ontologies/efo/classes/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0030063?lang=en