geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

"regulation of cell size" #1886

Open ValWood opened 6 years ago

ValWood commented 6 years ago

I was looking at the "unknowns" and thought it was odd to see these "unknown" genes annotated to "regulation of cell size":

100.00% 48.00 YCR061W, YBR093C, YGR111W

I don't think we should annotate screens where size is affected to "regulation of cell size"... lots of things affect cell size because they compromise growth in some way, or affect a checkpoint so that cells keep growing without dividing, but "cell size regulation" is a very specific area of study, and still not fully dissected (we know that size is regulated by specific pathways involving TOR, through to cdc2, but the exact mechanisms are still obscure).

See for example the abstract of: https://www.ncbi.nlm.nih.gov/pubmed/28479325

Most of these genes are just "affecting" cell size, not regulating it.

https://www.yeastgenome.org/reference/S000070190

For comparison, we published A genome-wide resource of cell cycle and cell shape genes of fission yeast (Hayles J, Wood V, Jeffery L, Hoe KL, Kim DU, Park HO, Salas-Pino S, Heichinger C, Nurse P. Open Biol. 2013 May 22;3(5):130053. doi: 10.1098/rsob.130053. PMID: 23697806), which showed that over 500 fission yeast genes affected cell size, but we would not make GO annotations from these, even with HTP evidence (this is an equivalent analysis).

@srengel could you assign?

Isn't this really a discovery exercise? Shouldn't these be phenotypes, not GO annotations? Cheers

v

srengel commented 6 years ago

Hi Val, I've just now asked @robnash to take a look. If he can't get to it, we'll ask Suzi to review.

robnash commented 6 years ago

Hi Val,

Way back when I was a grad student I worked on yeast genes such as CLN3 (WHI1) and WHI3. CLN3 was originally isolated as a gene (WHI1) involved in regulating the critical cell size required for the traversal of START in the G1 phase of the cell cycle (i.e. coupling growth with division). We now know more about how some of these genes do this. For example, CLN3 is a rate-limiting activator of CDC28 (G1 cyclin), and the synthesis of this protein is tethered to the accumulation of mass via growth rate. Before yeast can pass through START they need to accumulate mass until they achieve a critical cell size, as a cellular indicator that the environment is favorable and they can complete the cell cycle. Pombe does this in mitosis and I believe has a cryptic control in G1 (budding yeast has a cryptic control in G2/M). WHI3 is now known to sequester CDC28 and associated cyclins in the cytoplasm, and a trigger through TPK1 relieves this inhibition through phosphorylation. Not sure they know yet what WHI3 is monitoring. So for some of these, such as CLN3, WHI3, WHI5 and MSA1, a molecular mechanism has been somewhat defined, so I agree these annotations could be removed and, as necessary, converted to phenotypes. I can remove the manual annotations for the genes I mentioned, and chip away at any others as I have time and know there is a molecular mechanism. However, if these have HMP annotations it will be easier once these are in P2G, so I will wait on these.

Generally speaking, there are many mutations that make cells larger, such as those that arrest the cell cycle but do not arrest growth, so the cells accumulate mass. But mutations which result in small cell size, especially those that affect critical cell size at START, potentially define regulators of START. So in cases where little is known about what a gene is or does, I would argue these annotations are useful. They provide a clue. This includes some genes you mentioned, like YCR061W and YGR111W, and others like YNL226W. So I would argue these annotations are useful, as described in papers like this:

Jorgensen P, et al. (2002) Systematic identification of pathways that couple cell growth and division in yeast. Science 297(5580):395-400 PMID: 12089449

There are other genes where more is known about how they function, such as LGE1, RHO1 and PHO5, but in these cases I don't think it is known how they impact cell size; the phenotype could be a clue that they are also involved in regulating this transition. This paper is an example of this situation:

Kikuchi Y, et al. (2007) Involvement of Rho-type GTPase in control of cell size in Saccharomyces cerevisiae. FEMS Yeast Res 7(4):569-78 PMID: 17302939

So I would argue these are also useful annotations, at least for now.

Hope this makes sense, and sorry for babbling on.

Cheers, Rob

ValWood commented 6 years ago

Hi Rob!

> Hope this makes sense, and sorry for babbling on.

No worries! I'm babbling today... here goes....

> So for some of these such as CLN3, WHI3, WHI5 and MSA1, a molecular mechanism has been somewhat defined so I agree these annotations could be removed, and as necessary converted to phenotypes.

I have no problem with any of these annotations as "regulation of growth"; they are clearly acting directly in the upstream signalling pathways in some way, even if we can't be precise about how...

> But mutations which result in small cell size especially those that affect critical cell size at START, potentially define regulators of START.

Agreed, these are very rare compared to large cells, and almost always informative (at least in fission yeast, if viable). One caveat is that quiescent cells tend to be smaller, so cells heading into quiescence sometimes have small size (an argument against using screens for size regulation!).

> So in cases where a gene has little known about what it is or does these annotations I would argue are useful. They provide a clue.

For small cell size, because the phenotype is more specific than large, I think it is generally safer. Large is pretty pleiotropic...

> Kikuchi Y, et al. (2007) Involvement of Rho-type GTPase in control of cell size in Saccharomyces cerevisiae. FEMS Yeast Res 7(4):569-78 PMID: 17302939

To me, this one looks ideal for a cell size regulation annotation.

I think we are on the same page... It's only the screen that seems a bit suspect for GO. If a phenotype is pleiotropic, it doesn't seem to be good practice to make an annotation. It's more a discovery exercise.

We probably need to discuss this further at a GOC meeting and improve the guidelines, but many users now want lists of "unstudied proteins" (it is very topical in our community). Screens can really obscure truly uncharacterised proteins when there is really no functional data. Over 10% of fission yeast genes are large when mutated, while only around 0.2% are considered to be truly directly involved in the regulation of cell size... Annotating all of these gene products to "regulation of cell size" would be annoying for:

a) those trying to dissect cell size pathways
b) those trying to investigate unknown proteins
c) those trying to perform realistic enrichments

So we need to weigh that against the value of the predictions, especially if the predictions swamp the experimental annotation (which they are wont to do in many situations, like this one).

v

ValWood commented 6 years ago

"Wont": I don't know where that came from; I'm channelling the Middle Ages...

robnash commented 6 years ago

> So for some of these such as CLN3, WHI3, WHI5 and MSA1, a molecular mechanism has been somewhat defined so I agree these annotations could be removed, and as necessary converted to phenotypes.

> I have no problem with any of these annotations as "regulation of growth"; they are clearly in the upstream signalling pathways, even if we can't be precise about how...

I removed these and, where appropriate, replaced them with GO:0007089 "traversing start control point of mitotic cell cycle".

> But mutations which result in small cell size especially those that affect critical cell size at START, potentially define regulators of START.

> Agreed, these are very rare compared to large cells, and almost always informative (at least in fission yeast, if viable). One caveat is that quiescent cells tend to be smaller, so sometimes cells heading into quiescence have small size, (an argument against using screens for size regulation!).

Yes, but typically these are log-phase cultures, so this should not be an issue. WHI2 is an example where the mutant is small only in a saturated glucose culture.

> We probably need to discuss this further at a GOC meeting and improve the guidelines but many users now want lists of "unstudied proteins" (it is very topical in our community). Screens can really obscure truly uncharacterised proteins when there is really no functional data. Over 10% of fission yeast genes are large when mutated, only around 0.2% are considered to be truly directly involved in the regulation of cell size..... Annotating all of these gene products to "regulation of cell size" would be annoying for a) those trying to dissect cell size pathways b) those trying to investigate unknown proteins c) those trying to perform realistic enrichments so we need to manage that against the value of the predictions, if the predictions swamp the experimental annotation (which they are wont to do in many situations, like this one).

Yes, I agree that guidelines are good. Your example is valid, and in cases like the one you mentioned with size in pombe, a curator would need this knowledge to be selective. I think the value in the case of an unknown, if properly vetted, might outweigh the points you mentioned above, provided the user trying to investigate or enrich is clever enough to know that these are all HTP annotations, as in the case of the Jorgensen P, et al. (2002; PMID: 12089449) paper. This is the value of the HTP evidence codes, and why we decided to display these on a separate part of the GO tab page:

https://www.yeastgenome.org/locus/S000000657/go#htp_

Cheers, Rob

ValWood commented 6 years ago

@hattrill @vanaukenk Could this be an example for the HTP guidelines discussions/evaluation?

(or are there already guidelines for this scenario?)

At PomBase we have internal guidelines (not documented but consistently applied) that we don't make a GO annotation from a phenotype that is pleiotropic. In these cases we would require more information to assign a process. "Cell size" is a good example. Others are "chromosome segregation" and "cytokinesis". Lots of mutants have these phenotypes even if not directly involved in the process (transcription, translation, splicing, nucleocytoplasmic transport, etc.). Maybe we should have rules for this? At least some triangulation should be required to make a GO annotation from a pleiotropic phenotype?

pgaudet commented 6 years ago

@ValWood is this an HTP issue or a phenotype annotation guidelines issue?

ValWood commented 6 years ago

I don't know; I think it's both, but more HTP in this case...

robnash commented 6 years ago

I agree this is more of an HTP issue, as this is a less likely scenario in a paper with potential manual annotations. Although I am torn, because there have been times that we at SGD decided not to include HTP annotations in some cases, such as telomere length assays, given the many processes that can impact length. I guess one of the things that troubles me is that on the one hand we have decided to create an evidence code, HMP, for use in these cases, and on the other hand we are deciding that this is not really valid. One could make this argument for many or most HTP phenotype screens. In the end, for genes of unknown function, at least this HTP annotation provides a starting point for the researcher.

ValWood commented 6 years ago

Personally, I haven't yet come across an HTP paper of phenotypes that is suitable for GO (others may have examples).

This is because we have a general rule (at PomBase) that, to use a phenotype to make a GO annotation (small scale or large scale), it needs to be clear that the phenotype is specific for the process. This is usually also supported by author intent, and by triangulation with other available data showing that the phenotype is consistent with what we expect. But we rarely (if ever) make annotations from phenotypes to high-level terms, because these phenotypes tend to be pleiotropic. In these cases the false positive rate would be far too high to make a GO annotation (from small-scale or large-scale experiments). This applies, for example, to cell size, your example above (telomere length), cytokinesis, and chromosome segregation. For these processes the false positive rate would range from 70% (chromosome segregation) to 99.8% (cell size). This is clearly too high. The starting point for the researcher is available as a phenotype annotation; it does not need to be in GO. Our users would not expect it: we got many strong steers from our community that it was not useful to classify possibly indirect observations as involved in GO processes when we didn't know, which is one reason why we developed FYPO.

Usually we require a whole slew of phenotype annotations to make a GO process annotation. For example, in this week's papers, lots of complementary phenotypes led to one GO annotation: https://www.pombase.org/reference/PMID:29249658 https://www.pombase.org/reference/PMID:27582274 https://www.pombase.org/reference/PMID:28982178

So for us, a single observation, even in a small-scale paper, would be unlikely to yield a GO process annotation.

However, I could imagine that HTP-specific phenotype assays suitable for GO annotation might exist... I just can't bring any to mind (or for IGI, for that matter).

hattrill commented 6 years ago

From the point of view of HTP guidelines, the curator should be able to make a (more or less) direct assertion based on the phenotype, mirroring the conventional annotation, i.e. a particular morphological defect or cellular phenotypic trait that can be mapped to a particular step (with HTP, some effort should be made to whittle down the false positives).

So, for HTP, annotating to "regulation of cell size" from an "I see bigger cells" observation would be incorrect; it should be captured by phenotype curation.

ValWood commented 6 years ago

It would be nice to have some examples of reasonable data to use for HMP here. I don't have any examples where I think the above applies, either in that a) we would make a GO annotation from the equivalent LTP, or b) they would not have an excessive number of FPs... and we have quite a lot of phenotype screens now (29,407 phenotype annotations from screens).

hattrill commented 6 years ago

I dumped many IMPs in the HTP review process, but I certainly wanted to keep this HMP: PMID:21750678. If I remember correctly, it had a nice control for removing 'cell growth defects' and a good classification of phenotypes. It seems to sit well with LTP observations.

robnash commented 6 years ago

Hi,

There are some parts of this thread that I strongly disagree with, both regarding the specific HTP paper that started this exchange and, more generally, regarding the guidelines being considered for these HTP codes. Yes, by the nature of the original HTP phenotypes in this paper, one could argue that there are many different scenarios or reasons why these mutated genes result in this phenotype. And yes, this is a high-level term, but I would counter that as such there is less of a chance someone could be misled. Generally speaking, the vast majority of phenotypic screens and genetic interactions could result from an alteration in the targeted process, but could also arise for other reasons or be non-specific, and in the case of suppression or enhancement could be the result of nothing more than combinatorial effects. Allele-specific effects are better, but there are inherent caveats in all screens and selections that involve mutants. I disagree that screens for chromosome segregation and cytokinesis mutants should not be curated with HTP GO, especially if nothing else is known about the gene. It's information in the absence of anything else, whether the term is considered high level or granular. As curators we are trained to examine evidence and make calls based on the data and on judgement grounded in our educational backgrounds. Certainly, in the case of HTPs a curator should be able to decide not to annotate characterized genes if the HTP annotation does not add anything (too high level) or just seems wrong, but I see much less harm for a gene that is un- or under-characterized.

At SGD we decided many years ago to start curating GO using HTP codes, given the large number of high-throughput screens being generated in budding yeast. This happened after consortia created genome-wide collections of null mutants and overexpression libraries. We needed a way to create these annotations but alert users that they might not be as rigorously demonstrated as small-scale experiments (although keep in mind that small-scale phenotype and genetic interaction data can also be inaccurate or wrongly interpreted). So we created a separate table on our GO tab pages, clearly labelling annotations as HTP, and created this annotation type so it would be clear to our users. As consumers of data, our users have the option to either include or exclude these from their analyses, whether using GO Term Finder, GO Slim Mapper or other tools. We use a "Select by Annotation Method" option to either exclude or include HTP annotations in these analyses.
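The include/exclude choice described above is offered in SGD's web tools; for users working with downloaded annotation files, a comparable split can be sketched in a few lines. This is a minimal sketch, not SGD's implementation. It assumes GAF 2.x input (tab-separated, header lines starting with "!", evidence code in column 7) and treats the GO high-throughput evidence codes HTP, HDA, HMP, HGI and HEP as the HTP set. The function name `split_by_throughput` and the sample rows in the test are hypothetical.

```python
# Hypothetical helper: separate GAF annotation rows by evidence code so that
# high-throughput (HTP) annotations can be kept or dropped before an enrichment.
HTP_CODES = {"HTP", "HDA", "HMP", "HGI", "HEP"}  # GO high-throughput evidence codes

def split_by_throughput(gaf_lines):
    """Return (ltp_rows, htp_rows); each row is the list of GAF columns.

    Header lines ("!...") and blank lines are skipped.
    """
    ltp_rows, htp_rows = [], []
    for line in gaf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        evidence = cols[6]  # column 7 in the 1-based GAF specification
        if evidence in HTP_CODES:
            htp_rows.append(cols)
        else:
            ltp_rows.append(cols)
    return ltp_rows, htp_rows
```

Returning both lists, rather than silently discarding HTP rows, mirrors the idea that users decide for themselves whether HTP annotations belong in a given analysis.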

What is the utility of even considering the use of HMP or HGI if the guidelines are so tightly constrained as to make these codes no longer of any practical use? We may as well remove these evidence codes as options if they can rarely ever be used. As I argued, for the vast majority of phenotypic or genetic screens one can envision scenarios (rationales) for why they could be incorrect, whether a term is high up in the tree or not! In fact, even for manual annotations one could argue the same. However, GO is a work in progress, with evidence codes used to reflect the strength of the result based on how the experiment was carried out. If there are unknown or under-characterized genes, why not include an HTP annotation? It can always be updated once better information is available and, as I mentioned, users can exclude it if we educate them about these codes.

In the case of the particular cell size study that started this: Jorgensen et al., 2002 (PMID:12089449) the authors analyzed cell size distributions for the complete deletion collection. From this data we curated 443 phenotypes using variations of the experiment type "systematic mutation set", an indicator that the experiment used a systematic collection and a way to let user(s) know how these phenotypes were generated (HTP, robotics).

The authors then took 25 of the small or "whi" mutants that were at or below the growth rate-cell size baseline (small phenotype not solely attributable to slow growth rate) and analyzed these further. Based on this additional characterization, I could argue that these should be re-curated using manual evidence codes, but at the time we decided to use HTP. These were NOT merely the result of an "I see bigger cells" experiment; they were "whi" or small mutants, with the potential of being key regulators of the START transition in the G1 phase of the cell cycle.

Several of these were genes previously known to be involved in cell cycle control, such as WHI3, CLN3, and CDH1. Further experiments suggested that SCH9 and SFP1 were dose-dependent regulators of START, and SFP1 was later shown in the paper to regulate transcription of RNAP II genes. Additional epistasis experiments suggested that WHI5, CDH1, and SFP1 act upstream of SBF and MBF (transcription factor complexes that act to promote passage through G1/S, etc.).

WHI3, WHI5 (named in this paper), and VPS51/WHI6 (alias named in this paper) mutants were shown to be resistant to pheromone treatment, a START-related phenotype. WHI5, CDH1 and SFP1 mutants all have a reduced critical cell size, operationally defined by the volume at which bud emergence occurs. This, together with other published data, allowed the authors to conclude that these genes encode novel repressors of Start, likely acting upstream of the transcription factor complexes SBF and MBF. Again, based on this information I am considering adding more specific manual annotations to these, perhaps replacing the HMP annotations.

So I think we did a good job of separating the phenotype-worthy genes from the set of 25 presented in Table 1 that we felt warranted HMP annotations.

The analysis of these kinds of mutants during my graduate career, and in particular of the two I studied in detail, ended up being very fruitful. CLN3, a gene involved in the coordination of growth with division, was one of the first G1 cyclins described, and the other, WHI3, was a regulator of CDC28, the main cyclin-dependent kinase (CDK) in budding yeast.

I think we really need to think carefully about how these guidelines are formulated, so that they are a useful addition rather than so restrictive that they become useless!

My two cents, Rob

ValWood commented 6 years ago

@pgaudet we need to discuss this on a future QC call. Current practice produces inconsistent annotation if only genes with no information are annotated to "regulation of growth"; "unknowns" are highly unlikely to be important for growth (I can elaborate; we have a paper on this to be submitted shortly).