geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

GO:0009986 Cell surface: improve definition #14909

Open pgaudet opened 6 years ago

pgaudet commented 6 years ago

Stemming from the discussion in https://github.com/geneontology/go-annotation/issues/1621 @ValWood kindly offered to propose an improved definition for 'cell surface'. A suggestion is to rename to 'external of the cell'.

ValWood commented 6 years ago

Currently, cell surface is defined as: The external part of the cell wall and/or plasma membrane.

This comment was meant to clarify: Note that this term is intended to annotate gene products that are attached (integrated or loosely bound) to the plasma membrane or cell wall.

I believed what we were trying to capture with this term is "gene products which are externally localized". Others may disagree, but at present there are many issues with the term, its ancestors, children and annotations!

It is useful to be able to distinguish the set of proteins which exert their function outside of the cell. For yeast this would include i) flocculins and other cell surface recognition molecules and ii) secreted enzymes involved in nutrient acquisition (amylases, proteases). For vertebrates it would include analogous proteins. A general grouping term is required because we do not always know if these are attached to the plasma membrane directly or indirectly (or, for yeast, the cell wall), but we know with certainty that they act outside the cell (annotations are often ISS, IC, TAS based on family etc.).

The children of the cell surface term are

anchored component of external side of plasma membrane: The component of the plasma membrane consisting of the gene products that are tethered to the external side of the membrane only by a covalently attached anchor, such as a lipid group embedded in the membrane. Gene products with peptide sequences that are embedded in the membrane are excluded from this grouping.

external side of plasma membrane: The leaflet of the plasma membrane that faces away from the cytoplasm and any proteins embedded or anchored in it or attached to its surface.

intrinsic component of external side of plasma membrane: The component of a plasma membrane consisting of gene products and protein complexes that penetrate the external side of the plasma membrane only, either directly or via some covalently attached hydrophobic anchor.

GO:0070263 - external side of fungal-type cell wall: The side of the fungal-type cell wall that is opposite to the side that faces the cell and its contents. (Although I think that all cell wall should have this parentage.)

Which fits with my assumption.

@krchristie @RLovering @addiehl @hattrill @ukemi

Does this sound reasonable?

I will document some issues I spotted below.

ValWood commented 6 years ago

So it would end up as something along the lines of: Rename "cell surface" to "external to cell" (or obsolete and create a new term?)

Define as

The external part of the cell. This includes the cell wall components (plants, fungi, bacteria), and gene products which are associated with the external side of the plasma membrane via a lipid group, or tethering to an intrinsic gene product.

ValWood commented 6 years ago

This might be oversimplistic.

hattrill commented 6 years ago

- I agree with the way you've described "cell surface"; this fits with how we would use it, although we would probably use one of the more specific child terms.
- I would rather go the review annotations route than obsoletion.
- I am fine with the name as it is; you'd probably struggle to get something equally concise to convey the meaning.
- 'External to cell' is too like extracellular for me.

pgaudet commented 6 years ago

I'm tagging this 'editors discussion' to have feedback from that group.

Pascale

RLovering commented 6 years ago

I agree with Helen. Changing the name cell surface to 'external to cell' is likely to lead to more confusion, as it will be hard to recognise the difference between the term name 'external to cell' and GO:0005576 extracellular region.

Definition: The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite.

Comments: Note that this term is intended to annotate gene products that are not attached to the cell surface. For gene products from multicellular organisms which are secreted from a cell but retained within the organism (i.e. released into the interstitial fluid or blood), consider the cellular component term 'extracellular space ; GO:0005615'.

Especially as some of the proteins annotated will end up being associated with both terms, eg receptor ligands.

I also have to apologise in advance because I did not appreciate that this term was not meant for proteins that do span the membrane. I had assumed that a protein could be annotated to multiple regions associated with the membrane, ie a transmembrane receptor would be annotated as: cell surface, integral component of membrane,

To be honest I think the definitions and mostly the comments for the other GO membrane terms have been supporting this approach: GO:0044214 spanning component of plasma membrane Definition | The component of the plasma membrane consisting of gene products and protein complexes that have some part that spans both leaflets of the membrane. Comment | Proteins that span the membrane but have the bulk on one side of the membrane may be additionally annotated with a term of the form integral to X side of the plasma membrane.

In that if the protein has an important function on both sides of the membrane then isn't it necessary to recognise this with the CC term? which is especially true of signaling receptors. And possibly some Noctua models will imply this? (or maybe not?)

Note the other term: GO:0019897 extrinsic component of plasma membrane Definition The component of a plasma membrane consisting of gene products and protein complexes that are loosely bound to one of its surfaces, but not integrated into the hydrophobic region. Doesn't the definition suggest this term should be a child of 'cell surface'?

In addition, I do agree that a problem comes from antibody staining when antibodies are used which do not penetrate the cell or when they recognise an epitope which is on the external side of the cell. Without taking into consideration the protein structure, cell surface is the most appropriate annotation of the experimental data. I haven't yet commented on the related github ticket #1621 but I think the outcome of both discussions may also need to consider how FACS results are annotated.

Ruth

RLovering commented 6 years ago

I also think Helen's comment, following discussions with an expert (github ticket #1621), is useful here:

“Likewise currently for 'plasma membrane' versus 'cell surface' since the annotation currently doesn't really distinguish anything. Only a small subset of plasma membrane proteins that are present on the cell surface are annotated as such. However, if the 'cell surface' term were expanded (and used in conjunction with 'plasma membrane') it would be useful - to reflect proteins that are actually externally facing rather than PM proteins that have no / minimal ectodomain - to me this is the key distinction.”

RLovering commented 6 years ago

I have got confused about which ticket to put these comments on, so I am putting them in both.

I like Pascale's suggestion:

WRT 'cell surface', I don't think this term is necessary. Annotations could probably be moved to 'plasma membrane' for animal-type cells. I am not sure what to suggest for other organisms, but it should be possible to use
- cell periphery
-- cell wall
-- plasma membrane

Ruth: If we look at this, then the definition for cell periphery is: The part of a cell encompassing the cell cortex, the plasma membrane, and any external encapsulating structures. So at the very least 'cell surface' should be a child of 'cell periphery'.

Then why not get rid of cell surface and instead use GO:0019897 extrinsic component of plasma membrane (or equivalent cell wall term)? Definition: The component of a plasma membrane consisting of gene products and protein complexes that are loosely bound to one of its surfaces, but not integrated into the hydrophobic region.

If there is no attachment to the plasma membrane or cell wall then the protein should be annotated with extracellular region.

Looking at the child terms for plasma membrane (especially GO:0019897 extrinsic component of plasma membrane), FACS analysis of animal cells would be annotated with plasma membrane rather than cell surface, with IC or IDA (depending on the outcome of the discussion above) to the more descriptive GO terms based on the protein structure.

ValWood commented 6 years ago

Then why not get rid of cell surface and instead use GO:0019897 extrinsic component of plasma membrane (or equivalent cell wall term)

For fungi, we need a way to group everything which is on the outside of the cell. We used cell surface for this because it has the comment:

Note that this term is intended to annotate gene products that are attached (integrated or loosely bound) to the plasma membrane or cell wall.

Unfortunately all of "cell wall" is not under this term, and based on this comment it should be?

The reason we need a grouping term for "cell wall" and "external side of plasma membrane" is because we often know that a gene product is outside of the cell, but I am unsure if we always know whether it is attached to the cell wall, or the plasma membrane.

I'm happy to change, but it would be useful to have a term for "external stuff".

I don't think we should use "extracellular region" for these (although we have!) because that is defined as: The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane.

ValWood commented 6 years ago

So "external side of plasma membrane" works for metazoa, but fungi need a grouping term. "Cell surface" works for us. One solution would be to keep as is, but have the ability to make this term "not for direct annotation" for specific taxa (we can't do this at the moment though).

ValWood commented 6 years ago

So at the very least 'cell surface' should be a child of 'cell periphery'

I agree, I thought it was!
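The question of whether 'cell surface' sits under 'cell periphery' can be checked mechanically by walking parent links. The following is only a minimal sketch: the parent map is a toy built from the child terms listed earlier in this thread, not the real GO graph, and the helper names (`ancestors`, `with_edge`) are invented for illustration. The 'cell surface' to 'cell periphery' edge is the proposed one, added here purely to show its effect.

```python
# Toy parent map using only relationships mentioned in this thread.
# It is NOT the real GO graph and omits all other parents of these terms.
PARENTS = {
    "external side of plasma membrane": {"cell surface"},
    "anchored component of external side of plasma membrane": {"cell surface"},
    "intrinsic component of external side of plasma membrane": {"cell surface"},
    "external side of fungal-type cell wall": {"cell surface"},
    "cell surface": set(),
    "cell periphery": set(),
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def with_edge(parents, child, parent):
    """Return a copy of the parent map with one extra child -> parent link."""
    updated = {term: set(links) for term, links in parents.items()}
    updated.setdefault(child, set()).add(parent)
    return updated


# Hypothetical edit: make 'cell surface' a child of 'cell periphery'.
PROPOSED = with_edge(PARENTS, "cell surface", "cell periphery")
```

With the proposed edge in place, every child of 'cell surface' would also group under 'cell periphery', which is the behaviour both comments above expected.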

krchristie commented 6 years ago

It can be useful to annotate directly to 'external side of plasma membrane'. If you have a FACS result and that is all you have, you wouldn't be able to annotate to anything more specific.

RLovering commented 6 years ago

Hi All

I have gone back to looking at the ontology for this domain because I think we need something for curators to use to actually create annotations in a consistent way.

I just think it is time we created a figure (like the one Val drew in #14948) which shows how each different type of entity associated with the membrane and cell wall should be annotated, and then to have this available on the http://geneontology.org/page/go-annotation-conventions site. This page exists: http://geneontology.org/page/membrane-proteins but it does not appear to be listed on http://geneontology.org/page/go-annotation-conventions, so I am not sure how people can get to this page. I found it when looking for a suitable figure in Rebecca's GOC presentation from 2016.

While I like this figure and I think it is very helpful, I don't think it actually states that if a protein is spanning the membrane then it should only be annotated to integral component of membrane ; GO:0016021, and not annotated to the following terms: cell surface, external side of plasma membrane, cytoplasmic side of plasma membrane. It would be useful to clarify whether a protein which is an integral component of membrane should also be annotated to these other 3 (or perhaps just the external and cytoplasmic side of plasma membrane) terms. And if this is how curators should be annotating plasma membrane terms, shouldn't all terms such as GO:0008076 voltage-gated potassium channel complex have external and cytoplasmic side of plasma membrane terms as direct parent terms? This would also encourage curators to consistently capture this information. This figure also does not address the issue of the cell surface and cell wall.

Furthermore, what this figure does illustrate is the problem that arises if people are not allowed to take into account general biological knowledge. If a protein is known to have a membrane domain and is identified by FACS using an antibody that does not penetrate the cell, then curators will select the GO term 'cell surface' or 'external side of plasma membrane' because that is what the experiment shows. If curators are encouraged to consider biological knowledge (for example using InterPro data or protein structure) which predicts the protein is integral to a membrane, then more accurate grouping of proteins/complexes can be achieved. Note InterPro often does not predict the specific membrane type, and therefore the experimental evidence combined with biological knowledge would lead to the creation of consistent annotations of all plasma membrane proteins to 'integral to plasma membrane' instead of the variety of annotations that currently exist for membrane spanning proteins.

Looking at the plasma membrane terms available, I think there are too many, and many are hardly being used because they are almost impossible to experimentally demonstrate. On slide 10 I point out that the comment in the term GO:0044214 spanning component of plasma membrane, ‘Proteins that span the membrane but have the bulk on one side of the membrane may be additionally annotated with a term of the form integral to X side of the plasma membrane’, means that a curator would be encouraged to annotate a receptor to integral to external side of the plasma membrane and thus annotate the receptor to cell surface. However, how is a curator supposed to know if a protein has its bulk on one side or the other? Biological knowledge? A more consistent approach would be that if the protein can be detected on one side or the other then we have to assume it has some amount of it on that side. To have to make a judgement about whether this is sufficient to be considered its ‘bulk’ is just not feasible.

I have created a google document with slides which could be used to work out what we are trying to capture and what terms are needed: https://docs.google.com/presentation/d/1PjAp2pnEnPSiOPVUJAGovjUD0QnlUZv5tXprFmsZu9E/edit?usp=sharing Please look at the last slide, where I suggest considering removing some of these plasma membrane terms.

This doesn't try to resolve the cell surface issue, but I was hoping that if we look at the ontology it might help us work out what we already have and how the terms are being used.

Ruth

ValWood commented 6 years ago

Would be useful, it's all a bit confusing...

hattrill commented 6 years ago

Thanks for pulling this all together, @RLovering; it makes it much easier to digest. I will look at this over next week and try to think of some demi-useful comments.

Based on my gut feeling, I think that the most important thing is that users see an unambiguous term, preferably one term that maps directly to one type of membrane association, but to achieve this utopia we still need to resolve the conflict about experimental observation vs inference. As we have so much more sequence-function knowledge now, perhaps we should unleash combinatorial evidence codes to allow us all to get past this constant sticking point (tagging @vanaukenk as an interested party).

RLovering commented 6 years ago

I appreciate that we need to address the conflict about experimental observation vs inference, however, I do think it would help if we actually look at the ontology to make sure it describes what we think it describes. Certainly at least if we agree on the ontology we can look at how to improve annotation consistency.

Although, I am still of the opinion that 90% of annotations are based on previous knowledge (ie inferred) and that to have 10 terms which we apply in annotations without using previous knowledge is going to lead down the path of changing all IDA annotations to 'IDA+other info annotations'. In which case why not just go the other way and use the new ECO codes that are available when no previous knowledge is used, no inference is used, but the annotation is actually based completely on experimental data? For example, apply the very specific ECO code for Electron Microscopy, assuming that you trust the antibody as specific for detecting your protein, which of course is another inference...

hattrill commented 6 years ago

I think that with your pictures we should be able to come up with a more limited palette of terms.

Maybe I am misinterpreting this, but my major concern is that we will end up having to use multiple terms to describe one particular way of associating with the plasma membrane, particularly wrt proteins that span the membrane and possess intra- and extracellular domains, ending up with:

- GO:0009898 cytoplasmic side of plasma membrane
- GO:0009897 external side of plasma membrane
- GO:0005887 integral component of plasma membrane

This leads to inconsistencies and patchiness in annotation, which is confusing for users and curators.

There are a limited number of ways in which proteins associate with the plasma membrane and we should be able to map each one to a specific term. If the curator cannot tell which term/parent term to use, then they should fall back to “plasma membrane” rather than using a term that captures the limits of the assay.
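The "one mode of association, one term, otherwise fall back" idea could be sketched roughly as below. This is an illustration only: the mode labels are invented, and the pairing of each mode with a term name from this thread is an assumption, not an agreed GO convention.

```python
# Hypothetical labels for modes of plasma membrane association (invented for
# this sketch). The term names are taken from this thread, but pairing them
# with these modes is an assumption, not an agreed annotation rule.
TERM_FOR_MODE = {
    "spans_both_leaflets": "spanning component of plasma membrane",
    "loosely_bound": "extrinsic component of plasma membrane",
    "lipid_anchored_external": "anchored component of external side of plasma membrane",
}

# Used whenever the curator cannot tell which mode applies.
FALLBACK_TERM = "plasma membrane"


def choose_term(mode):
    """Return the single term mapped to a known mode, else the fallback."""
    return TERM_FOR_MODE.get(mode, FALLBACK_TERM)
```

For example, `choose_term("spans_both_leaflets")` gives one term for a transmembrane receptor, while `choose_term(None)` gives "plasma membrane" rather than a term that reflects what a particular assay happened to detect.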

ValWood commented 5 years ago

I am unassigning myself from this ticket because it is beyond what I can do as an individual. This might be a useful topic for an annotation call, or a GO meeting workshop activity?