geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

SWT1 and transcription elongation #4537

Closed (ValWood closed 1 year ago)

ValWood commented 1 year ago

I am reviewing some function predictions, and have a function prediction linking swt1 to transcription elongation.
There are a number of issues here:

[Screenshot: 2023-04-12 at 08:21:54]

The predictions give some weight to annotation numbers, and because every physical and genetic interaction from a publication is curated, a single paper can contribute many annotations.

  1. Should we have rules about adding multiple GIs from the same publication?
  2. The main role of TREX is in export, so is there enough evidence that swt1 is involved in transcription elongation? Even the description does not hint at this: "RNA endoribonuclease involved in perinuclear mRNP quality control; involved in perinuclear mRNP quality control via the turnover of aberrant, unprocessed pre-mRNAs; interacts with subunits of THO/TREX, TREX-2, and RNA polymerase II; contains a PIN (PilT N terminus) domain"

Even in the paper used to make the annotation, the authors state: "SWT1 Does Not Show a Synthetic Lethal Interaction with Diverse Components of the Transcription Elongation Machinery ... Although one cannot exclude that SWT1 interacts genetically with other alleles of DST1, RTF1, and SPT5 than the tested ones or with genes encoding other transcription elongation factors, the lack of genetic interactions with these three transcription elongation factors suggests that the genetic interaction between SWT1 and components of TREX is rather specific. This indicates that the function of Swt1 is closely related to TREX."

Even in the conclusion "Thus, Swt1 is necessary for high transcript levels of highly expressed genes. Alternatively to a direct function in transcription, Swt1 might be needed for a process downstream of transcription, e.g. for the recycling of TREX from the transcription machinery. Interestingly, Swt1 contains a PINc domain, and the amount of Swt1 in cells is quite low (data not shown; Swt1 was also not visualized and thus not quantified by Ghaemmaghami et al. (39)) indicating that Swt1 could have a catalytic function. Thus, Swt1 could also be involved in the turnover of transcripts, e.g. of aberrantly formed mRNPs. However, the PINc domain is not essential for Swt1 function in vivo as assessed by rescue of the synthetic lethal relationship with THO mutants by a Swt1 allele lacking the PINc domain. Alternatively to a catalytic function, Swt1 could be necessary for the transcription and also nuclear export of only a small subset of genes. Further studies will be needed to elucidate the molecular function of this new player in gene expression."

So the GO annotation to transcription elongation seems to be an over-annotation, especially in light of the later papers: https://www.yeastgenome.org/reference/S000128898

@suzialeksander @srengel

ValWood commented 1 year ago

@pgaudet can we have a rule about IGI annotations? It should be one per annotation, because it's effectively the same evidence/experiment for each interaction. Are there any guidelines about this?

srengel commented 1 year ago

a few issues here:

  1. annotation count is not an appropriate metric and should never be used for anything.

  2. the "annotations" in question use pipes, so they are 'OR', so they are a "single annotation", which can be easily seen in P2GO. (the display on SGD pages splits out the separate 'with' entries.)

  3. those annotations were originally made when the term was 'DNA-templated transcription', then years later the term was changed to the 'elongation' label. so that is a problem with all the term changes that have happened around transcription, in the 4 major overhauls of that area that have occurred since i have been annotating GO for the past 20 years.

all that said, i have removed the annotation, since its original meaning has been lost through subsequent ontology changes.
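To make point 2 concrete, here is a minimal sketch of the difference between counting displayed 'with' entries and counting annotations. The rows, identifiers, and function names are hypothetical placeholders, not real SGD data or any pipeline's actual code; the only thing taken from the discussion above is that pipe-separated 'with' values are 'OR' and therefore belong to a single annotation.

```python
from collections import Counter

# Hypothetical, simplified annotation rows (placeholder values, not real data).
# Fields: gene, GO term, evidence code, reference, with/from
ROWS = [
    ("GENE1", "term A", "IGI", "REF:1", "DB:X1|DB:X2|DB:X3"),
    ("GENE1", "term B", "IDA", "REF:2", ""),
]


def count_displayed_entries(rows):
    """Count one entry per pipe-separated 'with' value.

    This mimics a display that splits out each 'with' entry, which
    makes one annotation look like several.
    """
    counts = Counter()
    for gene, term, _evidence, _reference, with_from in rows:
        alternatives = [value for value in with_from.split("|") if value]
        # A row with an empty 'with' field still shows up as one entry.
        counts[(gene, term)] += max(1, len(alternatives))
    return counts


def count_annotations(rows):
    """Count one annotation per row, treating piped 'with' values as 'OR'."""
    counts = Counter()
    for gene, term, _evidence, _reference, _with_from in rows:
        counts[(gene, term)] += 1
    return counts


displayed = count_displayed_entries(ROWS)
annotations = count_annotations(ROWS)
print(displayed[("GENE1", "term A")], annotations[("GENE1", "term A")])
```

A prediction pipeline that weights by the first kind of count would treat the IGI row as three pieces of support, while the second count treats it as one.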

ValWood commented 1 year ago

> annotation count is not an appropriate metric and should never be used for anything.

It isn't really being used as a metric here. The ML pipeline appears to give more weight to multiple annotations. This might not be a good thing, but it will be a common way to increase confidence for annotation transfer in function prediction pipelines. Anyway, as you point out, in this case it would be a single annotation.

> those annotations were originally made when the term was 'DNA-templated transcription'

That would be worrying: the meaning of a term should not change over time. But this term seems to have always been "elongation".