The-Sequence-Ontology / SO-Ontologies

Collection of SO Ontologies
Creative Commons Attribution 4.0 International

Add terms for INSDC /regulatory_classes that lack an SO match #379

Closed catherinefarrell closed 6 years ago

catherinefarrell commented 7 years ago

Hi SO,

Some of the INSDC /regulatory_class terms currently lack a match to an SO term. Since we'd like to map all terms to SO (see issue #378 for more background), can you consider making an SO entry for the following?

1) enhancer_blocking_element

This is related to the SO entries for insulator (SO:0000627) and boundary_element (SO:0002020). The SO definition for SO:0000627 actually fits the description of enhancer_blocking_element, but the organization of these related terms needs modification. Unfortunately, the literature has used the term 'insulator' for several different biological functions, yet we need to be more specific for annotation purposes. The major functions of an insulator are: (a) barrier or boundary function (as in the SO:0002020 description); and (b) enhancer-blocking function (as in the SO:0000627 description). PMID:12154228 provides a good overview, with the major functions shown in Figure 1. Other reviews that differentiate between these basic properties include PMIDs 22326678, 22265227 and 12154228. Insulator biological functions are even more nuanced than the above two properties, and nowadays insulators are more generally considered to be elements involved in higher-order chromatin organization, e.g., PMIDs 25781057, 24277632, 23706817 or 22265227.

So my suggestion would be to make parent:child terms, where insulator (SO:0000627) is the parent (with a modified definition to be more inclusive of all functions). The term boundary_element (SO:0002020) and a new SO entry for enhancer_blocking_element would be children. The alias insulator should be included in all these related entries since it's used in multiple ways in the literature.

INSDC already has a /regulatory_class for insulator, as follows:
Qualifier: /regulatory_class="insulator"
Definition: a chromatin boundary element or barrier that can block the encroachment of condensed chromatin from an adjacent region. May also include enhancer-blocking activity.

NCBI is currently mapping this to SO:0000627 (same term), even though the SO definition is different. Depending on how you reorganize the SO terms, SO:0002020 may be a more appropriate mapping based on the INSDC definition.

And the INSDC term for enhancer_blocking_element is:
Qualifier: /regulatory_class="enhancer_blocking_element"
Definition: a transcriptional cis regulatory region that when located between an enhancer and a gene's promoter prevents the enhancer from modulating the expression of the gene. Sometimes referred to as an insulator but may not include the barrier function of an insulator.

2) imprinting_control_region

There are SO terms for types of imprinting (SO:0000134, SO:0000135, SO:0000136) or for imprinted genes (SO:0000888, SO:0000889), but there is no term for an imprinting control region. INSDC term:

Qualifier: /regulatory_class="imprinting_control_region"
Definition: a regulatory region that controls epigenetic imprinting and affects the expression of target genes in an allele- or parent-of-origin-specific manner. Associated regulatory elements may include differentially methylated regions and non-coding RNAs.

Acronyms/aliases include ICR, imprinting center (IC), differentially methylated region (DMR) or differentially methylated domain (DMD). Relevant reviews include PMIDs 23287029, 18282507, 17467259 and 16575166. This would likely need to be a child of regulatory_region (SO:0005836).

3) response_element

There are SO terms for specific types of response elements (SO:0001839, SO:0001181, SO:0001853 or SO:0002045), but there is no generic term for response_element, which could be a parent to the specific terms. Here is the INSDC term:

Qualifier: /regulatory_class="response_element"
Definition: a regulatory element that acts in response to a stimulus, usually via transcription factor binding

Thanks for considering these new terms! Please let me know if you need more information.

Catherine
NCBI - RefSeq

keilbeck commented 7 years ago

These look good. I'll work on it this week.

catherinefarrell commented 6 years ago

Hi SO,

Just inquiring about the status of this issue and related issues #378 and #386. NCBI is planning a new human genome annotation, and we'd like to have all our feature mark-ups consistent with SO in our FTP files (e.g., the SO_type classifications in column 3 of our GFF3 files). This is important for user data mining, especially for RefSeq Functional Elements, which are already public and will be included in the upcoming annotation release.


Catherine
NCBI - RefSeq
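For readers who want to see what the column 3 classifications look like in practice, here is a minimal sketch that tallies the feature types in GFF3 text. It relies only on the GFF3 convention that records are tab-separated with nine columns and that column 3 holds the feature type. The function name `count_feature_types` and the example records are hypothetical and are not taken from any NCBI file.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count the values found in column 3 (feature type) of GFF3 records."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            # Everything after this directive is sequence data, not features.
            break
        if not line or line.startswith("#"):
            # Skip blank lines, comments, and other directives.
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            # Not a well-formed feature record; ignore it in this sketch.
            continue
        counts[columns[2]] += 1
    return counts


# Hypothetical records for illustration only.
example = [
    "##gff-version 3",
    "chr1\tExample\tenhancer\t100\t200\t.\t.\t.\tID=e1",
    "chr1\tExample\tinsulator\t300\t400\t.\t.\t.\tID=i1",
    "chr1\tExample\tenhancer\t500\t600\t.\t.\t.\tID=e2",
]
type_counts = count_feature_types(example)
```

Comparing the keys of `type_counts` against the set of SO term names is one way to spot feature types that still lack an SO match.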

nicoleruiz commented 6 years ago

We are waiting for @murphyte to review the mappings before we add them to SO.

nicoleruiz commented 6 years ago

I have added the following terms:
SO:0002190 enhancer_blocking_element
SO:0002191 imprinting_control_region
SO:0002205 response_element

I also changed the definition of SO:0000627 insulator to include both functions and moved SO:0002020 boundary_element to be a child of insulator.
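As a summary of where the thread ends up, the mappings can be sketched as a small lookup table. This is an illustration under stated assumptions, not an official NCBI or SO resource: the dictionary and function names are hypothetical, the insulator entry reflects the mapping NCBI described as current at the time of the original request, and the only parent:child relationship included is the one confirmed in the final comment (boundary_element under insulator). The thread does not state the parent assigned to the three new terms, so none is recorded for them here.

```python
# INSDC /regulatory_class value -> SO identifier, as given in this thread.
REGULATORY_CLASS_TO_SO = {
    "insulator": "SO:0000627",  # mapping NCBI reported using at the time
    "enhancer_blocking_element": "SO:0002190",
    "imprinting_control_region": "SO:0002191",
    "response_element": "SO:0002205",
}

# child -> parent; only the relationship confirmed in the last comment.
CONFIRMED_IS_A = {
    "SO:0002020": "SO:0000627",  # boundary_element is_a insulator
}


def so_term_for(regulatory_class):
    """Return the SO identifier for a /regulatory_class value, or None."""
    return REGULATORY_CLASS_TO_SO.get(regulatory_class)


def ancestors(so_id, is_a=CONFIRMED_IS_A):
    """Walk child -> parent links and return the ancestors, nearest first."""
    chain = []
    seen = set()
    while so_id in is_a and so_id not in seen:
        seen.add(so_id)  # guard against accidental cycles in the table
        so_id = is_a[so_id]
        chain.append(so_id)
    return chain
```

Calling `so_term_for` with a class that is not in the table returns `None`, which makes unmapped /regulatory_class values easy to detect.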