geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

OTR:GO:0098009 virus terminase, large subunit #12446

Closed jimhu-tamu closed 8 years ago

jimhu-tamu commented 8 years ago

Should this be a component term? I'm not sure that it exists on its own independent of the terminase complex, in the sense that a large or small ribosomal subunit has independent existence in the cell.

Note that the activities attributed to the large subunit still need to be in GO.

paolaroncaglia commented 8 years ago

Hi @jimhu-tamu ,

I suspect that this term was requested as part of the viral terms mapping project (based on the dbxref PHI:etc). Brenley requested it. @rebeccafoulger , may I ask for your feedback here please? Or @dosumis ? Thanks.

jimhu-tamu commented 8 years ago

I also suspect that Brenley asked for it, and I probably agreed at the time. I think it was requested to capture the idea that there is a separation of functional roles for the subunits.

But now that we are actually doing phage CACAO and I am thinking harder about where these things belong in terms of molecular function/process/component, I'm not sure that this is the right solution. I am thinking that there should be molecular functions for the things the large subunit does (I think there are, and we can add or modify as we review that), and these should be in the process-function links for GO:0019073 viral DNA genome packaging. But the assignment of those functions to specific proteins should be captured via annotation rather than in a component definition. Given the diversity of viruses and their rapid evolution, I would not be surprised at all if someone found a virus where the small subunit was larger than the one with the defined activities, and/or a case where it's fused into one protein or split differently by domain reorganization.

I confess that I haven't looked for editorial guidelines on what constitutes a component.

rebeccafoulger commented 8 years ago

Hi Jim and Paola,

I'm always a little suspicious when a GO term (terminase complex) only has one child (terminase large subunit). Is the small terminase subunit a multimeric protein complex too, or just a single copy of a protein? I think we either need both subcomplexes or neither.
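The single-child heuristic above could be checked mechanically. What follows is only a minimal sketch, assuming a hand-written list of (child, parent) edges: the edge list is hypothetical and covers just the two terms named in this thread, and `single_child_parents` is an illustrative helper, not part of any GO tooling.

```python
from collections import defaultdict

# Hypothetical (child, parent) edges for illustration only.
# Here the edge is the part_of link discussed in this thread.
EDGES = [
    ("GO:0098009", "GO:0043493"),  # virus terminase, large subunit -> viral terminase complex
]


def single_child_parents(edges):
    """Return {parent: only_child} for every parent that has exactly one child."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return {
        parent: next(iter(kids))
        for parent, kids in children.items()
        if len(kids) == 1
    }


print(single_child_parents(EDGES))
```

With only the large subunit recorded, the terminase complex is flagged; adding a second child (for example a small subunit term) would clear the flag.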

We do have other complexes that are broken down to describe the different roles of the two subunits (e.g. retromer complex- not sure if the subcomplexes exist independently here).

If all annotations to the large subunit could be transferred to GO:0039631, I don't see a problem in removing/merging GO:0098009. (DNA translocase activity involved in viral DNA genome packaging ; GO:0039631)

Thanks.

paolaroncaglia commented 8 years ago

Thanks @rebeccafoulger and @jimhu-tamu . GO:0098009 virus terminase, large subunit has 4 IDA annotations (based on 4 different PMIDs), all UniProt and created 2 months ago. Possibly by SIB curators? I can’t check (@rebeccafoulger , could you take a quick look via P2GO please?). I’m equally fine with creating a new term for the small subunit (but I see Jim’s and Becky’s concerns so this may not be the most appropriate solution) or with merging/obsoleting GO:0098009 (but either way, we’d need to run this by the relevant UniProt annotator(s) if the existing annotations are to go to a different term than GO:0043493 viral terminase complex).

rebeccafoulger commented 8 years ago

Hello- yes, the 4 manual UniProt annotations (P03708, P17312, P24940, P54308) were done by Chantal (Hulo) and checked by Sylvain Poux at SIB.

paolaroncaglia commented 8 years ago

Thanks! Emailed Chantal and Sylvain.

SIBvirus commented 8 years ago

Hi Jim, Paola and Rebecca,

From what I have read so far, the phage terminase holoenzymes seem to be always composed of a small subunit (DNA-recognition component + regulation of the large subunit activities) and a large subunit (ATPase/motor + endonuclease activities). But I have not studied all phages! Both subunits assemble as heterooligomers forming ring-like structures that dock onto the portal vertex allowing the entry of viral DNA into the procapsid as well as its cleavage.

The current situation with one term for the terminase holoenzyme ("Viral terminase complex") and one term for the large subunit (but none for the small one...) is not logical.

I would propose that, since the complex is composed of subunits/components with different activities, it might be good to have two GO terms, "Terminase, large subunit" and "Terminase, small subunit", in addition to "Viral terminase complex".

This solution would allow us to further tag the enzymatic activities of the large subunit (ATPase, endonuclease) separately from the annotation of the small subunit.

I hope this might help, Chantal

paolaroncaglia commented 8 years ago

Thanks @SIBvirus. @rebeccafoulger and @jimhu-tamu , based on Chantal’s comment, I would propose the following edits:

1) For the large subunit: current stanza:

[Term]
id: GO:0098009
name: virus terminase, large subunit
namespace: cellular_component
def: "The part of the viral terminase complex that contains the translocase activity. The large subunit typically comprises a pentameric protein complex." [GOC:bm, PHI:0000073]
comment: This term should only be used when the large subunit consists of more than one polypeptide.
subset: virus_checked
is_a: GO:0043234 ! protein complex
is_a: GO:0044423 {is_inferred="true"} ! virion part
relationship: part_of GO:0043493 ! viral terminase complex

=> add relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging

2) For the small subunit: create a new term:

GO:NEW virus terminase, small subunit
def: “The part of the viral terminase complex that contains a DNA-recognition component and a regulatory component influencing the activity of the large subunit. The small subunit usually assembles as a heterooligomer.”
comment: This term should only be used when the large subunit consists of more than one polypeptide.
is_a: GO:0043234 ! protein complex
relationship: part_of GO:0043493 ! viral terminase complex

=> could you suggest if the term should have links capable_of function x and/or capable_of_part_of process y? => @SIBvirus, could you suggest references for this new term please?

Or let me know if you disagree with this strategy, thanks.

jimhu-tamu commented 8 years ago

I hadn't noticed the comment before. Are there examples of that?

paolaroncaglia commented 8 years ago

Hi Chantal @SIBvirus (and @rebeccafoulger ),

Could you please address Jim's latest question above, and comment on my suggestion? Thanks in advance.

Paola

SIBvirus commented 8 years ago

Hi Paola,

The large subunit also displays the endonuclease activity and is not always pentameric (cf. UniProt entries, e.g. TERL_LAMBD, TERL_BPSPP). The heterooligomeric complex is the one composed of the large and small subunits. The large and small subunits are themselves often homooligomeric, but this is not always the case (cf. TERL_LAMBD).

I would therefore suggest the following (cf. my comments/suggestions**):

[Term]
id: GO:0098009
name: virus terminase, large subunit
namespace: cellular_component
def: "The part of the viral terminase complex that contains the translocase and endonuclease activities and allows the translocation of the phage DNA into the procapsid. The large subunit usually assembles as a heterooligomer with the small subunit." [GOC:bm, PHI:0000073]
comment: This term should only be used when the terminase complex consists of more than one polypeptide.
subset: virus_checked
is_a: GO:0043234 ! protein complex
is_a: GO:0044423 {is_inferred="true"} ! virion part
relationship: part_of GO:0043493 ! viral terminase complex

=> add relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging OK

**cf. also responsible for DNA cleavage => add: relationship: capable_of GO:0098035 ! The encapsulation of the viral DNA genome within the capsid, which proceeds via cleavage of the viral DNA at specific sites by a viral terminase.

2) For the small subunit: create a new term:

GO:NEW virus terminase, small subunit
def: “The part of the viral terminase complex that acts as a phage DNA-recognition component and regulates the activity of the large subunit. The small subunit usually assembles as a heterooligomer with the large subunit.”
comment: This term should only be used when the terminase complex consists of more than one polypeptide.
is_a: GO:0043234 ! protein complex
relationship: part_of GO:0043493 ! viral terminase complex

=> could you suggest if the term should have links capable_of function x and/or capable_of_part_of process y?

**I would add (...since the small subunit is needed for regulating the translocase activity and helping in phage DNA recognition):

relationship: capable_of_part_of process GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging

relationship: capable_of_part_of process GO:0098035 ! The encapsulation of the viral DNA genome within the capsid, which proceeds via cleavage of the viral DNA at specific sites by a viral terminase.

Do not hesitate to tell me if you have further questions.

Chantal

NB. A quite complete review:

Annu Rev Genet. 2008;42:647-81. doi: 10.1146/annurev.genet.42.110807.091545. The bacteriophage DNA packaging motor. Rao VB, Feiss M.

SIBvirus commented 8 years ago

Hi Jim,

You can find functional and structural annotation of some phage terminases (both subunits) in UniProt with linked references: http://www.uniprot.org/uniprot/?query=organism:phage%20name:terminase&fil=reviewed%3Ayes&sort=score

I hope this may give you the information you are looking for.

Chantal

paolaroncaglia commented 8 years ago

Thanks @SIBvirus. @jimhu-tamu, @rebeccafoulger, please let me know if you'd be ok with adding the small subunit term, and with editing the large subunit term as Chantal suggests.

All - I'm thinking that, "since the small subunit is needed for regulating the translocase activity and helping in phage DNA recognition", the small subunit term should be capable_of_part_of REGULATION of DNA translocase activity involved in viral DNA genome packaging and capable_of_part_of REGULATION of viral DNA genome packaging via site-specific sequence recognition? (The first regulation term doesn't exist, I'd create it via TG; the second regulation term already exists.) Please let me know if this is correct.

Thanks, Paola

jimhu-tamu commented 8 years ago

The change in the comment suggested by @SIBvirus makes much more sense. As written it only works if the large subunit itself is split into two proteins.

Thinking more about having an independent term for each of the subunits, I can see doing it or not, but in some ways the small and large subunits are analogous to the sigma and core components of eubacterial RNA polymerase. But I'm not sure that we can look to the RNAP GO terms for guidance, insofar as I may have found a problem with those when looking for precedents.

Based on the above, if there is going to be a small subunit term, I would modify the small subunit definition to include initiation of viral DNA genome packaging. I'm not sure whether regulation is appropriate, unless we consider determination of the initiation site to be a form of regulation rather than intrinsic to the packaging process. I may be thinking incorrectly about the capable_of_part_of relationship.

p.s. The link doesn't seem to work, but I looked in QuickGO for annotations to the large subunit term. I'm familiar with terminase biology; I've known one author of the review, Mike Feiss for many years. @dsiegele worked in his lab many years ago.

paolaroncaglia commented 8 years ago

Thanks all. To start with, I edited GO:0098009 virus terminase, large subunit as agreed. New term next.

paolaroncaglia commented 8 years ago

Note for self: There are problems with loading and committing the ontology at the moment. When they're fixed, I will create a new term as follows:

GO:NEW virus terminase, small subunit
def: “The part of the viral terminase complex that acts as a phage DNA-recognition component and regulates the activity of the large subunit. The small subunit usually assembles as a heterooligomer with the large subunit.”
dbxrefs: GOC:ch, GOC:jh2, PMID:18687036
comment: This term should only be used when the terminase complex consists of more than one polypeptide.
is_a: GO:0043234 ! protein complex
relationship: part_of GO:0043493 ! viral terminase complex
relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging
relationship: capable_of_part_of GO:0098035 ! viral DNA genome packaging via site-specific sequence recognition

When that's done, I can close this ticket.

SIBvirus commented 8 years ago

Hi Paola,

Just one thing regarding the small subunit: The DNA translocase activity belongs to the large subunit, so I think that "capable_of_part_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging" would be more accurate for this new term?

Chantal

paolaroncaglia commented 8 years ago

Thanks @SIBvirus . So, just to make sure I get this straight:

1) GO:0098009 virus terminase, large subunit already has the following relationship, I just added it: relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging (note that GO:0039631 is a function term, so the rel. has to be 'capable_of', not 'capable_of_part_of')

2) GO:NEW virus terminase, small subunit will NOT have the link relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging when I create it. I won't add that link.

Please let me know if I'm missing anything, and thanks again.

Paola

SIBvirus commented 8 years ago

On 27.05.2016 12:13, paolaroncaglia wrote:

> Thanks @SIBvirus. So, just to make sure I get this straight:
>
> 1) GO:0098009 virus terminase, large subunit already has the following relationship, I just added it: relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging (note that GO:0039631 is a function term, so the rel. has to be 'capable_of', not 'capable_of_part_of')

So we will end up with:

GO:0098009 virus terminase, large subunit
relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging
relationship: capable_of GO:0098035 ! The encapsulation of the viral DNA genome within the capsid, which proceeds via cleavage of the viral DNA at specific sites by a viral terminase.

Is it right?

> 2) GO:NEW virus terminase, small subunit will NOT have the link relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging when I create it. I won't add that link.

But I think it could have: capable_of_part_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging

-> since it regulates the translocase activity and is involved in binding to the specific DNA site (but not in its cleavage....)

GO:xxxxxxxxx virus terminase, small subunit
relationship: capable_of_part_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging
relationship: capable_of_part_of GO:0098035 ! The encapsulation of the viral DNA genome within the capsid, which proceeds via cleavage of the viral DNA at specific sites by a viral terminase.

Thanks, Chantal

paolaroncaglia commented 8 years ago

Hi Chantal @SIBvirus ,

Here is the current stanza for the large subunit term:

[Term]
id: GO:0098009
name: virus terminase, large subunit
namespace: cellular_component
def: "The part of the viral terminase complex that contains the translocase and endonuclease activities and allows the translocation of the phage DNA into the procapsid. The large subunit usually assembles as a heterooligomer with the small subunit." [GOC:bm, GOC:ch, GOC:jh2, PHI:0000073, PMID:18687036]
comment: This term should only be used when the large subunit consists of more than one polypeptide.
subset: virus_checked
is_a: GO:0043234 ! protein complex
is_a: GO:0044423 {is_inferred="true"} ! virion part
relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging
relationship: capable_of_part_of GO:0098035 ! viral DNA genome packaging via site-specific sequence recognition
relationship: part_of GO:0043493 ! viral terminase complex

Please note that GO:0098035 is a process term, so the relationship has to be capable_of_part_of, not capable_of (the inverse of the case in my previous comment). So it seems to me that the large subunit term already has a correct structure.
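To make the structure of the stanza easier to inspect, here is a minimal sketch of pulling the relationship lines out of an OBO-style [Term] stanza. It assumes one tag per line and a trimmed copy of the stanza above (the def, comment, and the is_a line carrying a trailing modifier are left out); `parse_stanza` and `relationships` are hypothetical helpers written for this illustration, not functions from any existing OBO library.

```python
# Trimmed copy of the large subunit stanza; tags omitted here are not handled.
STANZA = """\
[Term]
id: GO:0098009
name: virus terminase, large subunit
namespace: cellular_component
is_a: GO:0043234 ! protein complex
relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging
relationship: capable_of_part_of GO:0098035 ! viral DNA genome packaging via site-specific sequence recognition
relationship: part_of GO:0043493 ! viral terminase complex
"""


def parse_stanza(text):
    """Parse one stanza into {tag: [values]}, dropping trailing '! label' comments."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        value = value.split(" ! ")[0].strip()
        tags.setdefault(tag, []).append(value)
    return tags


def relationships(tags):
    """Return (relation, target_id) pairs from the 'relationship' tag."""
    return [tuple(value.split(None, 1)) for value in tags.get("relationship", [])]


term = parse_stanza(STANZA)
for relation, target in relationships(term):
    print(relation, target)
```

Listing the pairs this way makes it easy to see at a glance which link points at the function term and which at the process term.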

As for the small subunit term, which I still have to create, is it possible that the confusion stemmed from this previous comment of yours:

“Just one thing regarding the small subunit: The DNA translocase activity belongs to the large subunit, so I think that "capable_of_part_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging" would be more accurate for this new term?”

Based on that I thought I should remove the process link from the small subunit, but maybe there was a mistake? If the small subunit regulates the translocase activity, I’d add the link capable_of_part_of GO:NEW regulation of DNA translocase activity involved in viral DNA genome packaging. @jimhu-tamu commented that regulation didn’t sound right to him, but I think he was referring only to the packaging term.

Thanks, Paola

SIBvirus commented 8 years ago

On 27.05.2016 14:24, paolaroncaglia wrote:

> Hi Chantal @SIBvirus,
>
> Here is the current stanza for the large subunit term:
>
> [Term]
> id: GO:0098009
> name: virus terminase, large subunit
> namespace: cellular_component
> def: "The part of the viral terminase complex that contains the translocase and endonuclease activities and allows the translocation of the phage DNA into the procapsid. The large subunit usually assembles as a heterooligomer with the small subunit." [GOC:bm, GOC:ch, GOC:jh2, PHI:0000073, PMID:18687036]
> comment: This term should only be used when the large subunit consists of more than one polypeptide.
> subset: virus_checked
> is_a: GO:0043234 ! protein complex
> is_a: GO:0044423 {is_inferred="true"} ! virion part
> relationship: capable_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging
> relationship: capable_of_part_of GO:0098035 ! viral DNA genome packaging via site-specific sequence recognition
> relationship: part_of GO:0043493 ! viral terminase complex
>
> Please note that GO:0098035 is a process term, so the relationship has to be capable_of_part_of, not capable_of (the inverse of the case in my previous comment). So it seems to me that the large subunit term already has a correct structure.

OK, this makes sense. Perfect for me.

> As for the small subunit term, which I still have to create, is it possible that the confusion stemmed from this previous comment of yours:
>
> “Just one thing regarding the small subunit: The DNA translocase activity belongs to the large subunit, so I think that "capable_of_part_of GO:0039631 ! DNA translocase activity involved in viral DNA genome packaging" would be more accurate for this new term?”
>
> Based on that I thought I should remove the process link from the small subunit, but maybe there was a mistake?

No, I meant that we should keep the link, since the small subunit regulates the activity of the large subunit, but with a "capable_of_part_of" relationship.

> If the small subunit regulates the translocase activity, I’d add the link capable_of_part_of GO:NEW regulation of DNA translocase activity involved in viral DNA genome packaging. @jimhu-tamu commented that regulation didn’t sound right to him, but I think he was referring only to the packaging term.

I completely agree with the capable_of_part_of GO:NEW regulation of DNA translocase activity involved in viral DNA genome packaging.

Thanks and have a nice week-end, Chantal

jimhu-tamu commented 8 years ago

Can one of you explain the capable_of_part_of relationship? Does this mean that the small subunit can be the target of other regulators, or that it is a regulator itself? I'm OK with the former, but not the latter.

dosumis commented 8 years ago

capable_of_part_of is used to relate cellular components (typically complexes) to processes where the component can perform some part of that process. It is more frequently inferred than asserted: a complex that is capable of catalysing some step in glycolysis is capable_of_part_of glycolysis.

Hth, David
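As a rough illustration of how a capable_of_part_of link might be inferred rather than asserted, the sketch below chains a capable_of edge with a part_of edge. All edges are hand-written for this example and not loaded from the ontology; in particular, the part_of link from GO:0039631 to GO:0019073 is an assumption made here, and `infer_capable_of_part_of` is a hypothetical helper rather than how any reasoner is actually implemented.

```python
# Hypothetical, hand-written edges for illustration only.
CAPABLE_OF = {
    # large subunit -> DNA translocase activity involved in viral DNA genome packaging
    "GO:0098009": {"GO:0039631"},
}
# Assumption: the function term is part_of the packaging process.
PART_OF = {
    # translocase activity -> viral DNA genome packaging
    "GO:0039631": {"GO:0019073"},
}


def infer_capable_of_part_of(capable_of, part_of):
    """Chain rule: X capable_of F and F part_of P gives X capable_of_part_of P."""
    inferred = {}
    for component, functions in capable_of.items():
        for function in functions:
            for process in part_of.get(function, ()):
                inferred.setdefault(component, set()).add(process)
    return inferred


print(infer_capable_of_part_of(CAPABLE_OF, PART_OF))
```

Under these assumptions the large subunit would be inferred to be capable_of_part_of viral DNA genome packaging without anyone asserting that link by hand.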


jimhu-tamu commented 8 years ago

Thanks @dosumis.

My problem with calling the small subunit a regulator is that it is a required component for the process under biologically relevant conditions. It's not something that modulates the frequency, rate or extent of viral DNA genome packaging. It initiates the process and determines where on the DNA initiation occurs.

In general, when I'm teaching CACAO, we tell students that the rule of thumb is that a regulator changes how much you get, but is not an integral part of the process itself. There are cases where a protein could be annotated as both a regulator and an essential component, such as enzymes that are the targets of metabolic feedback regulation, but I'm not sure how we've been handling those in GO, and the terminase small subunit is not such a case as far as I know.

dosumis commented 8 years ago

> Please note that GO:0098035 is a process term, so the relationship has to be capable_of_part_of, not capable_of (the inverse of the case in my previous comment).

This is more of a rule of thumb than a hard rule. If a cellular component can carry out a whole process then it is perfectly reasonable to use capable_of. If it can only carry out part of that process, then capable_of_part_of should be used.

Re - what counts as a regulator: activation is the maximal type of positive regulation (we have sub-relation directly_activates). So, if the role of the small subunit is to activate that large subunit, but not to participate in translocase activity, I think it is fine to say that it is capable_of 'regulation of DNA translocase activity'. However, if you think it participates in translocase process in some way, you could leave the assertion of function out of the ontology and annotate using 'contributes to'.

Hth, David
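The rule of thumb discussed in this exchange can be written down as a small decision helper. This is only a sketch of the convention as described in the thread: the default follows the namespace (function gives capable_of, process gives capable_of_part_of), and a curator can override it when a component carries out a whole process. `suggest_relation` and `DEFAULT_WHOLE` are hypothetical names introduced for this example.

```python
# Default assumption per namespace: a component performs a whole function,
# but only part of a process, unless the curator says otherwise.
DEFAULT_WHOLE = {"molecular_function": True, "biological_process": False}


def suggest_relation(namespace, performs_whole=None):
    """Suggest 'capable_of' or 'capable_of_part_of' for a component-to-term link."""
    if namespace not in DEFAULT_WHOLE:
        raise ValueError(f"unexpected target namespace: {namespace}")
    if performs_whole is None:
        performs_whole = DEFAULT_WHOLE[namespace]
    return "capable_of" if performs_whole else "capable_of_part_of"


print(suggest_relation("molecular_function"))                       # function term
print(suggest_relation("biological_process"))                       # process term, default
print(suggest_relation("biological_process", performs_whole=True))  # whole process
```

The helper only encodes the convention; whether the small subunit counts as performing, regulating, or merely contributing to the translocase activity remains a curation decision.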

jimhu-tamu commented 8 years ago

still thinking about the regulation issue, but noticed a different problem. This is not true for terminase:

is_a: GO:0044423 {is_inferred="true"} ! virion part

paolaroncaglia commented 8 years ago

Hi all,

Thanks for your feedback. I’ve done the following:

[Term]
id: GO:0097710
name: viral terminase, small subunit
namespace: cellular_component
def: "The part of the viral terminase complex that acts as a phage DNA-recognition component and regulates the activity of the large subunit. The small subunit usually assembles as a heterooligomer with the large subunit." [GOC:ch, GOC:jh2, PMID:18687036]
comment: This term should only be used when the small subunit consists of more than one polypeptide.
subset: virus_checked
synonym: "virus terminase, large subunit" EXACT []
is_a: GO:0043234 ! protein complex
relationship: part_of GO:0043493 ! viral terminase complex

I didn’t add any function or process link to the new term. If and when it is resolved whether there should be any, please open a new ticket. I’ll close this one now :-)

Thanks, Paola