The-Sequence-Ontology / SO-Ontologies

Collection of SO Ontologies
Creative Commons Attribution 4.0 International

inconsistency: "TF_binding_site" or "DNA_motif" as parent [sf#209] #209

Open srynobio opened 9 years ago

srynobio commented 9 years ago

Reported by kchris on 2010-06-12 00:00 UTC

Hi,

There are some promoter motifs, e.g. "A_box", which is part_of "RNApol_III_promoter_type_2", that are is_a children of "TF_binding_site", which is a descendant of "protein_binding_site". Then there are other promoter motifs, e.g. "DMv1_motif", which is part_of "RNApol_II_promoter", that are is_a children of "DNA_motif".
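To make the inconsistency concrete, here is a minimal Python sketch that walks is_a links upward for the two example terms. The graph below is a hypothetical, simplified fragment built only from the edges named in this report (the path from TF_binding_site up to protein_binding_site is collapsed into a single edge); it is not loaded from the real SO file.

```python
# Hypothetical, simplified fragment of the SO graph as described in the report.
# Only edges mentioned in the thread are included; the path from
# TF_binding_site up to protein_binding_site is collapsed into one edge.
IS_A = {
    "A_box": ["TF_binding_site"],
    "TF_binding_site": ["protein_binding_site"],
    "DMv1_motif": ["DNA_motif"],
}

PART_OF = {
    "A_box": ["RNApol_III_promoter_type_2"],
    "DMv1_motif": ["RNApol_II_promoter"],
}


def is_a_ancestors(term: str) -> set[str]:
    """Return every term reachable from `term` by following is_a edges upward."""
    seen: set[str] = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


if __name__ == "__main__":
    for term in ("A_box", "DMv1_motif"):
        print(term, "part_of", PART_OF[term], "is_a*", sorted(is_a_ancestors(term)))
```

Both terms are part_of a promoter, yet their is_a ancestor sets share nothing, which is the difference in parentage the question asks about.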

* What is the basis for the difference in parentage?

After the meeting, it occurred to me that there might be transcription factors for transcription of viral RNA genomes, but I would still like to see a term that specifically represents a TF binding site in DNA, even if there need to be other types of TF binding sites in RNA and maybe on proteins too.

-Karen C

srynobio commented 9 years ago

Response from @keilbeck

Hi Karen C, Just letting you know that I have not forgotten your requests. I will get to these when I get back from vacation on the 4th July. Sorry for the wait. --Karen E

srynobio commented 9 years ago

Hi Karen, I think that the difference in parentage among the parts of the promoter depended on whether I could find the TF that binds the element when I added it. If yes, it became a TF_binding_site; if not, a motif, with the hope that as more is known, the terms could move from motif to TF binding site. I have made some adjustments to binding site and its siblings to delineate the kind of molecule they are on and what they stick to. I am still working through the review paper, so I am not closing this artifact yet; I just wanted to let you know the status. --Karen

srynobio commented 9 years ago

Hi Karen, I have created some new terms and moved things around a little. I think that the best way to handle this was to have a subtree under DNA_motif for promoter elements:

* DNA_motif
* promoter_element
* core_promoter_element
* regulatory_promoter_element
* distal_pe
* proximal_pe

This way all the elements are under one umbrella term, and if one element is part of more than one kind of promoter, it can be subtyped here. These terms then have individual part_of relations to the promoters that they are part of. I think that all-some holds for all of these.

There are no longer separate motifs as kinds of TF_binding_site. I did not want to create multiple parentage with these terms, so I made a very general statement that promoter_element overlaps TF_binding_site. I have not yet subclassified the bacterial elements, as I had not seen a yea or nay from Jim. We could move the four minus_x_signal terms to be core_promoter_elements if you think this is the right thing to do. Let me know if this works for you. --K
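As a rough illustration of the proposed layout, the Python sketch below encodes the new subtree with exactly one is_a parent per term and keeps "overlaps" as a separate relation, so promoter_element gains no second parent. The nesting is an assumption: the comment lists the terms in order without indentation, so core_promoter_element and regulatory_promoter_element are placed under promoter_element, and distal_pe and proximal_pe under regulatory_promoter_element. The part_of links to specific promoters are omitted because the comment does not name them.

```python
# Proposed subtree, with ASSUMED nesting (the thread lists the terms flat).
# Mapping each term to a single parent string enforces single inheritance.
IS_A = {
    "promoter_element": "DNA_motif",
    "core_promoter_element": "promoter_element",
    "regulatory_promoter_element": "promoter_element",
    "distal_pe": "regulatory_promoter_element",
    "proximal_pe": "regulatory_promoter_element",
}

# A non-is_a relation: it records the overlap without adding a second parent.
OVERLAPS = {("promoter_element", "TF_binding_site")}


def lineage(term: str) -> list[str]:
    """Walk single-parent is_a links from `term` up to the root of this fragment."""
    path = [term]
    while path[-1] in IS_A:
        path.append(IS_A[path[-1]])
    return path


if __name__ == "__main__":
    for term in ("core_promoter_element", "distal_pe", "proximal_pe"):
        print(" -> ".join(lineage(term)))
```

Every promoter element then resolves to DNA_motif, and TF_binding_site is reachable only through the overlaps relation, never through is_a.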

srynobio commented 9 years ago

Response from Karen Christie

Hi Karen, I have some further comments on defining "promoter" vs. "core promoter". I hope I didn't steer you wrong when you, David, and I met. I think David and I had another conversation afterwards on how to define promoter, and that shaped what we decided to do for GO.

Basically, your existing definition for "promoter":

SO:0000167 - promoter
Def: A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the basal transcription machinery.

is exactly correct for bacteria. For eukaryotes, the situation is more complicated: some people use the same definition as prokaryotic researchers do, while others use a much looser definition that includes additional regulatory elements, e.g. upstream activating sequences (UAS), transcription factor binding sites, etc.

This means that there are two conflicting usages of "promoter". One is precise, exactly the same as the def in SO, and specifically excludes regulatory sequences beyond those that the RNAP itself binds to. The other is vague about exactly which elements are promoter elements and seems to encompass all proximal regulatory elements. The review I suggested uses the phrase "core promoter", which some people use to distinguish the basal sequence where the RNAP binds from other regulatory elements. I think that "core promoter" means exactly what the def of the existing SO term "promoter" means.

Because of this ambiguity, where some researchers mean a very specific binding element while others use the word loosely to mean proximal regulatory elements, David and I decided to avoid using "promoter" as a standalone word, to avoid misannotation when researchers use the word loosely and annotators don't read carefully. So what we chose to do for GO is to use the phrase "core promoter" as a term identical in meaning to the existing SO term "promoter".
Then for all the rest of it, we have called it "transcription regulatory region". Since we don't encode the regions directly, but only the binding to them, we have terms like "transcription regulatory region DNA binding" and also "transcription regulatory region RNA binding" (to accommodate some transcription termination factors that bind to the nascent transcript).

Since what I've currently done lumps regulation of transcription initiation together with regulation of transcription termination, I'm now considering whether we'd want to break it down a bit and have terms like:

* transcription initiation regulatory region DNA binding
* transcription termination regulatory region DNA binding

but I just thought of that, so I'll probably run the idea by David when we talk Monday morning. I wanted to let you know about my reluctance to use the word "promoter" unqualified. Also, for people who use the more precise def of "promoter", I think that most of the types of promoter elements you have indicated are oxymorons; e.g. a distal regulatory element is by definition not part of the promoter.

If you want to follow the way GO chose to handle this, all you would need to do is change most uses of the word "promoter" to "transcription regulatory region", or perhaps "transcription initiation regulatory region", which is probably a clearer way to group the elements that affect initiation vs. other parts of the transcription cycle that may be regulated. I hope that was clear. If not, we can set up a call if that would be better. -Karen