geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Incorporate new taxon constraints from PAINT #16298

Open cmungall opened 6 years ago

cmungall commented 6 years ago

[I advise holding off on this ticket for now but I need a place to make some initial notes. I will be editing this ticket as I go along]

I am working with @dustine32 on integrating the taxon constraints (TCs) that @thomaspd and Haiming made. I can make a PR with all of their TCs, but I want to discuss some of the more interesting/dubious ones here first.

I am using https://raw.githubusercontent.com/haimingt/GOTaxonRange/master/rawData/manualCurationList_longform.csv; for reporting purposes here, I have enhanced it with the GO rdfs:label values.

epigenetic gene regulation

GO:0040029-regulation of gene expression, epigenetic Gain|NCBITaxon:2759(Eukaryota);

Is this true? Of course, it depends on how we use the term "epigenetic":

id: GO:0040029
name: regulation of gene expression, epigenetic
namespace: biological_process
def: "Any process that modulates the frequency, rate or extent of gene expression; the process is mitotically or meiotically heritable, or is stably self-propagated in the cytoplasm of a resting cell, and does not entail a change in DNA sequence." [PMID:10521337, PMID:11498582]

Minor quibble: I am not a fan of using ";" in the text definition this way; I think a genus-differentia style would enhance clarity here. Anyway, it seems that the 'epigenetic' differentia is quite loose here and does not require that the state propagate across meiosis or mitosis.

If so, then it seems that this term encompasses bacterial "epigenetic" regulation, e.g.: https://mmbr.asm.org/content/70/3/830.full

If so, we cannot interpret the TC as only-in (i.e. only in Eukaryota).
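To see why the only-in reading is a problem: if a gain-only statement is read as "only in" the union of the gained taxa, any annotation from an organism whose lineage contains none of those taxa is flagged. The minimal Python sketch below assumes that reading; the helper names are hypothetical, and the lineages are hand-abbreviated excerpts of NCBI Taxonomy rather than full lineages.

```python
import re

# Matches one "NCBITaxon:<id>(<label>)" item in a PAINT evolutionary statement.
TAXON_RE = re.compile(r"(NCBITaxon:\d+)\(([^)]*)\)")


def only_in_from_gain(statement: str) -> set[str]:
    """Assumed reading: a gain-only statement means 'only in' the union of gained taxa."""
    if "Loss|" in statement:
        raise ValueError("this sketch only handles gain-only statements")
    return {tax_id for tax_id, _label in TAXON_RE.findall(statement)}


# Abbreviated root-to-leaf lineages (hand-picked excerpt of NCBI Taxonomy, not complete).
LINEAGES = {
    # cellular organisms > Eukaryota > Metazoa > Homo sapiens
    "NCBITaxon:9606": ["NCBITaxon:131567", "NCBITaxon:2759", "NCBITaxon:33208", "NCBITaxon:9606"],
    # cellular organisms > Bacteria > Escherichia coli
    "NCBITaxon:562": ["NCBITaxon:131567", "NCBITaxon:2", "NCBITaxon:562"],
}


def violates_only_in(allowed: set[str], organism: str) -> bool:
    """An annotation violates 'only in' when no taxon on its lineage is allowed."""
    return not allowed.intersection(LINEAGES[organism])


allowed = only_in_from_gain("Gain|NCBITaxon:2759(Eukaryota);")
print(violates_only_in(allowed, "NCBITaxon:9606"))  # False: human is a eukaryote
print(violates_only_in(allowed, "NCBITaxon:562"))   # True: an E. coli annotation would be rejected
```

Under this reading, a bacterial annotation to 'regulation of gene expression, epigenetic' would be rejected, which is exactly the outcome we would not want if the term is meant to cover bacterial epigenetic regulation.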

socially cooperative development

I think this is too restricted:

GO:0099120-socially cooperative development Gain|NCBITaxon:5782(Dictyostelium);

It may have been gained in dicty, but it is present in other taxa too, as the GO definition mentions.

Anterior-posterior

GO:0009948-anterior/posterior axis specification Gain|NCBITaxon:33213(Bilateria);

This is probably fine but it depends on how we define A/P.

Luckily someone wrote a nice summary of this here: https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-5-34

mesoderm

GO:0007498-mesoderm development Gain|NCBITaxon:33213(Bilateria);

In Uberon we are more conservative: only in Eumetazoa, never in poriferans.

In our notes: sponges do not seem to have a mesoderm, and accordingly Amphimedon lacks transcription factors involved in mesoderm development (Fkh, Gsc, Twist, Snail); see http://www.nature.com/nature/journal/v466/n7307/full/nature09201.html

I think it may be correct, however, that true mesoderm is only found in Bilateria.

immune system

GO:0002376-immune system process Gain|NCBITaxon:3193(Embryophyta);NCBITaxon:6072(Eumetazoa);

Too restrictive? CRISPR?

cmungall commented 6 years ago

@jimhu-tamu would you consider bacteria to have epigenetic gene regulation (see paper above)?

ValWood commented 6 years ago

Hi @chris. I think "immune system" is fine. Bacteria have "defense responses", but they don't have "immune systems", which are a multicellular-organism phenomenon (cell/tissue/organism). Even yeast don't have an immune system; they have "defense responses".

pgaudet commented 6 years ago

Should we have a general 'defense response' as a parent of 'immune system process', which could include 'defense response to other organism' as a child (and thus include bacterial defense responses to other organisms)?

Thanks, Pascale

ValWood commented 6 years ago

I think it should be the other way around.

The immune response is a type of defense response. The immune system is, in the stricter sense a system level, multi-organism process. I think it is more useful to keep it this way in GO rather than define it more loosely and allow bacterial annotation.

In fact, one part of the immune response (innate immune response) is already under "defense response", but the other part (adaptive immune response) isn't. Not sure why that is.....

cmungall commented 6 years ago

@ValWood - I think GO needs to clarify its use of "system".

id: GO:0002376
name: immune system process
namespace: biological_process
def: "Any process involved in the development or functioning of the immune system, an organismal system for calibrated responses to potential internal or invasive threats." [GOC:add, GOC:mtg_15nov05, GO_REF:0000022]
comment: Note that this term is a direct child of 'biological_process ; GO:0008150' because some immune system processes are types of cellular process (GO:0009987), whereas others are types of multicellular organism process (GO:0032501). This term was added by GO_REF:0000022.
xref: Wikipedia:Immune_system

There is nothing in this definition that excludes bacterial immune "systems". The word "organismal" may connote multi-cellular but single celled organisms are organisms.

The xreffed wikipedia page on immune system (https://en.wikipedia.org/wiki/Immune_system) says "Even simple unicellular organisms such as bacteria possess a rudimentary immune system in the form of enzymes that protect against bacteriophage infections. "

I think that it would in keeping with our underlying unstated assumptions we should

ValWood commented 6 years ago

Hmm, I was always taught that the immune system consisted of two parts, innate (evolved in metazoa) and adaptive (earlier in vertebrates). Although I suppose people who work on bacteria might want to call the defense response an 'immune system', people who work on fungi would never use this term.

addiehl commented 6 years ago

Some immune responses are defense responses, but not all. The response of regulatory T cells to suppress an immune response, which itself is a type of immune response, is not a defense response ("Reactions, triggered in response to the presence of a foreign body or the occurrence of an injury, which result in restriction of damage to the organism attacked or prevention/recovery from the infection caused by the attack"). Such regulatory immune responses are normal processes that keep autoimmune responses under control, or that regulate an immune response to a pathogen to prevent overactivation of cell types such as neutrophils, which can be quite destructive to host tissue.

Also, +1 for Chris' response above.

addiehl commented 6 years ago

Also, see the following two articles that discuss NLR proteins in fungi. NLR proteins are components of "immune systems" in other organisms, and they function in a similar role in fungi.

https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1006578
http://science.sciencemag.org/content/354/6316/aaf6395.long

ValWood commented 6 years ago

OK, fungi do have self/non-self recognition (and response to non-self), so perhaps this is enough to constitute an 'innate immune system'.

In that case the definition of GO:0045087 would need to be revised, since it currently reads:
"Innate immune responses are defense responses mediated by germline encoded components that directly recognize components of potential pathogens."

Or are there additional types of immune system other than innate and adaptive?

ukemi commented 6 years ago

If I recall correctly from when I was at Woods Hole, sponges are also able to recognize self from non-self. If you disaggregate two sponges into single cells, the cells will sort themselves out. PMID:9346930

addiehl commented 6 years ago

Not sure I understand the need to revise the definition of innate immune response. The NLR proteins in fungi and other species are germline encoded components that directly recognize components of potential pathogens, among other functions. Because innate immune responses rely on germline encoded receptors that directly recognize components of potential pathogens, 'innate immune response' is_a 'defense response', unlike the more general 'immune response' term.

jimhu-tamu commented 6 years ago

@cmungall wrote

@jimhu-tamu would you consider bacteria to have epigenetic gene regulation (see paper above)

I have had discussions with a former colleague about this; see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4391566/

I argue that bacteria do have epigenetic gene regulation in the sense of heritable differences in expression that do not involve changes at the DNA sequence level. The methylation examples in the paper you linked are one case. There are some others that have been observed in non-evolved constructs where stuff has to be mutated: bistable switches in lambda and lac studied by Christophe Herman and others, for example. Note, however, that the GO definition mentions mitosis or meiosis, which would exclude bacteria. As you might expect, I would change the definition rather than inherit the taxon constraint from that clause.

ukemi commented 6 years ago

Be careful with mesoderm: http://www.wormbook.org/chapters/www_gastrulation/gastrulation.html @vanaukenk any comments?

jimhu-tamu commented 6 years ago

@ukemi How do we define self vs non-self? Bacteria that make toxins and antibiotics have resistance/immunity proteins so that only those carriers survive.

@cmungall In terms of defense responses and CRISPR, it's often called an adaptive immune response, but I think of that as a loose analogy rather than an attempt to define equivalence. If CRISPR is an adaptive immune response, then a bunch of stuff becomes innate - restriction/modification, for example. I think it's important to capture CRISPR biology, and this year's CACAO might generate some term requests, but I don't think we want to force the analogy at this point.

Similarly, there are some cool papers talking about phage making a "nucleus" during infection. They even have tubulin homologs that attach to it. But I wouldn't call it a nucleus for GO. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6028189/

ukemi commented 6 years ago

@addiehl would probably provide a better definition than I would for self versus non-self, but I would say self is any part of an organism or the things that the organism produces. Non-self is anything else. Looking at the immunology definitions, it seems this idea was avoided and the definitions focused on potential threats or pathogens. I'm not sure why sponges do what they do, but I remember learning that it was similar to an immune response.

cmungall commented 6 years ago

@jimhu-tamu:

Note, however, that the GO definition uses mitosis or meiosis in the definition, which would exclude bacteria

The definition is IMHO confusing with its disjunctions and conjunctions. I interpret it as:

thus prokaryotes are not ruled out.

But I wonder if there is a better wording for the definition

ukemi commented 6 years ago

That's the way I interpreted it as well. I don't think it restricted the term to meiosis or mitosis.

addiehl commented 6 years ago

Self/Non-self discrimination in the adaptive immune system of vertebrates is quite a complex regulatory issue, as T cell receptors and B cell receptors are formed through somatic recombination and have a wide range of potential reactivities against a huge range of targets, including self-structures, typically proteins and carbohydrates made by the organism itself. This potential reactivity is managed at several levels through T and B cell development and peripheral activation pathways. T cells and B cells that are reactive against self antigens present in the thymus or bone marrow (or bursa in birds or kidney or liver for other non-mammals) are deleted via apoptosis, but many self antigens are not present in these organs, so other systems exist. In particular the context in which a dendritic cell presents antigen to a T cell determines how that T cell will react to the antigen, whether it will be activated to drive an immune response against that antigen, or become directly suppressed by the DC or differentiate to a regulatory T cell that suppresses immune responses of other T cells to that antigen.

One mechanism for providing this context depends on whether the DC itself has been activated by "danger" signals, for instance through TLR or NLR receptor recognition of pathogen-associated molecular patterns (PAMPs) or by inflammatory signals released from dying self cells. So there can be an indirect self/non-self recognition involved via PAMP recognition, but this is imprecise, and sometimes autoimmune responses to self antigens occur via accidental presentation of self antigens by activated DCs. Thus, to discriminate between self and non-self, the adaptive immune system relies on signals provided by the innate immune system about potential threats or pathogens.

Self/non-self discrimination between individual members of the same species typically occurs through different mechanisms. In vertebrates, MHC molecules (which are responsible for presenting antigens to T cells) are highly polymorphic, and in tissue transplantation, foreign MHC molecules of the donor tissue are often "read" by T cells of the recipient as self-MHC + antigen (or vice versa) so that T cells get strongly activated directly to react against the foreign tissue.

Self/Non-self recognition in sponges most likely occurs via matched polymorphic receptor-ligand pairs, though I haven't had a chance to research this detail. The polymorphic DSCAM system in retinal cells works in a similar fashion to provide proper anatomical sorting of cells that is basically a self/non-self system, and DSCAMs are used in insects as part of an adaptive-like immune system as well.

I would like to remind the participants in this discussion that we had a well-attended GO immunology content meeting in 2005 that included a number of immunology experts (in addition to my own PhD and post-doctoral training in immunology) and the hierarchy of the GO immunology terms was worked out at that time. While improvements can be made in certain areas, people should understand that a lot of thoughtful work went into the existing terms and hierarchy and we should perhaps be careful about pushing for significant changes in the hierarchy without adequate input from actual immunologists.

cmungall commented 3 years ago

Some additional questions:

proteasome complex http://amigo.geneontology.org/amigo/term/GO:0000502
GO:0000502,Gain|NCBITaxon:1(archaea-eukaryota);

I'm curious why archaea-eukaryota is in parentheses. This is the root of the NCBI taxonomy which also includes bacteria. Was the intent to make multiple statements here? But why, isn't the proteasome complex found in all 3 lineages?

larval feeding behavior:

GO:0030536,Gain|NCBITaxon:6072(Eumetazoa);>Loss|NCBITaxon:32524(Amniota);

Ultimately, I think we'd want this in Uberon for all larvae. But I think this is too restrictive, as sponges have larvae and they consume food.

cmungall commented 3 years ago

Note from Paul on how to interpret the syntax:

GO:0098610,Gain|NCBITaxon:1(root);NCBITaxon:4892(Saccharomycetales);NCBITaxon:4896(Schizosaccharomyces pombe);>Loss|NCBITaxon:451864(Dikarya);NCBITaxon:3193(Embryophyta);NCBITaxon:6072(Eumetazoa);

All the gains are listed first, then all the losses, which is not how they would appear in a species tree, so it's not intuitive, but it's correct. This is the form of the evolutionary statement for any GO term that can apply only to unicellular organisms. The history of unicellular organisms is: gained at the root (cellular organisms), then lost independently three times in the lineages of multicellular organisms (independent losses in Dikarya, Embryophyta and Eumetazoa), then regained in lineages of ancestrally multicellular fungi/Dikarya (independently gained in Saccharomycetales and Schizosaccharomyces).
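To make this reading concrete, here is a minimal Python sketch (not PAINT's or GO's actual tooling) that parses the statement above and decides whether the term applies to a taxon by walking that taxon's lineage from root to leaf, letting the deepest Gain or Loss event on the lineage win. That "deepest event wins" rule is our interpretation of Paul's description; the function names are hypothetical and the lineages are hand-abbreviated excerpts of NCBI Taxonomy that skip many intermediate ranks.

```python
import re

# Matches one "NCBITaxon:<id>(<label>)" item in a PAINT evolutionary statement.
TAXON_RE = re.compile(r"(NCBITaxon:\d+)\(([^)]*)\)")


def parse_tc(line: str) -> tuple[str, dict[str, str]]:
    """Split 'GO:...,Gain|...;>Loss|...;' into the term ID and {taxon: 'Gain' or 'Loss'}."""
    term, _, statement = line.partition(",")
    events: dict[str, str] = {}
    for block in statement.split(">"):  # ">" separates the Gain block from the Loss block
        kind, _, taxa = block.partition("|")
        for tax_id, _label in TAXON_RE.findall(taxa):
            events[tax_id] = kind
    return term, events


def present_in(events: dict[str, str], lineage: list[str]) -> bool:
    """Walk root -> leaf; the deepest Gain/Loss event on the lineage decides."""
    present = False
    for taxon in lineage:
        if taxon in events:
            present = events[taxon] == "Gain"
    return present


ROOT, CELL, EUK = "NCBITaxon:1", "NCBITaxon:131567", "NCBITaxon:2759"
LINEAGES = {  # abbreviated: intermediate ranks omitted
    "S. pombe": [ROOT, CELL, EUK, "NCBITaxon:451864", "NCBITaxon:4896"],
    "S. cerevisiae": [ROOT, CELL, EUK, "NCBITaxon:451864", "NCBITaxon:4892", "NCBITaxon:4932"],
    "human": [ROOT, CELL, EUK, "NCBITaxon:6072", "NCBITaxon:9606"],
    "Arabidopsis": [ROOT, CELL, EUK, "NCBITaxon:3193", "NCBITaxon:3702"],
    "E. coli": [ROOT, CELL, "NCBITaxon:2", "NCBITaxon:562"],
}

line = ("GO:0098610,Gain|NCBITaxon:1(root);NCBITaxon:4892(Saccharomycetales);"
        "NCBITaxon:4896(Schizosaccharomyces pombe);>Loss|NCBITaxon:451864(Dikarya);"
        "NCBITaxon:3193(Embryophyta);NCBITaxon:6072(Eumetazoa);")
term, events = parse_tc(line)
for name, lineage in LINEAGES.items():
    print(term, name, present_in(events, lineage))
# S. pombe, S. cerevisiae and E. coli print True; human and Arabidopsis print False
```

The order in which gains and losses are listed does not matter in this sketch, because presence is decided by position in the lineage rather than position in the string, which matches the point that the listing order is unintuitive but correct.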

ValWood commented 3 years ago

Out of curiosity, which term is "independently gained in Saccharomycetales and Schizosaccharomyces"? (I don't know of anything that fits this pattern.)

balhoff commented 3 years ago

@ValWood looks like this one is GO:0098610 'adhesion between unicellular organisms'.

ValWood commented 3 years ago

Right, it's an odd case. I would not think of the process of adhesion as being 'gained', but since unicellular adhesion can only apply to unicellular species, this is semantically correct.

balhoff commented 3 years ago

@thomaspd here is a conflict in the new taxon constraints: 'renal system development' is "only in Vertebrata". However GO has 'Malpighian tubule development' as a subclass of 'renal tubule development' (which is part of 'renal system development').
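The conflict arises because an "only in" constraint on a term is inherited by everything below it. The minimal Python sketch below assumes constraints propagate over both is_a and part_of, and uses a hypothetical three-term excerpt of the GO graph. It also assumes that 'Malpighian tubule development' is known to apply to Drosophila melanogaster (for example via an annotation); that association, the helper names and the abbreviated lineage are all illustrative, not taken from GO.

```python
# Hypothetical excerpt of the GO graph: child -> [(relation, parent)].
PARENTS = {
    "Malpighian tubule development": [("is_a", "renal tubule development")],
    "renal tubule development": [("part_of", "renal system development")],
    "renal system development": [],
}
# The new PAINT-derived constraint under discussion: only in Vertebrata.
ONLY_IN = {"renal system development": {"NCBITaxon:7742"}}
# Assumed for illustration: the term is known to apply to Drosophila melanogaster.
OBSERVED = {"Malpighian tubule development": "NCBITaxon:7227"}
# Abbreviated lineage of D. melanogaster (many intermediate ranks omitted).
LINEAGES = {
    "NCBITaxon:7227": ["NCBITaxon:131567", "NCBITaxon:2759", "NCBITaxon:33208",
                       "NCBITaxon:6072", "NCBITaxon:33213", "NCBITaxon:6656",
                       "NCBITaxon:7227"],
}


def ancestors(term: str) -> set[str]:
    """All terms reachable over is_a and part_of (both assumed to propagate constraints)."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for _relation, parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def find_conflicts() -> list[tuple[str, str, str]]:
    """Report (term, constrained term, taxon) where the taxon falls outside 'only in'."""
    conflicts = []
    for term, taxon in OBSERVED.items():
        lineage = set(LINEAGES[taxon])
        for source in sorted({term} | ancestors(term)):
            allowed = ONLY_IN.get(source)
            if allowed and not allowed & lineage:
                conflicts.append((term, source, taxon))
    return conflicts


print(find_conflicts())
# [('Malpighian tubule development', 'renal system development', 'NCBITaxon:7227')]
```

Because the constraint is checked on every ancestor, the same loop would surface the other cases where a broad "only in" term has an is_a or part_of descendant used outside that taxon.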

cmungall commented 3 years ago

Renal system was discussed and decided here: #16143 (which itself followed from the work of the kidney WG)

The intent of GO is that "renal system" is very broad. Personally, I am not a fan of such broad terms, but at least we have put a do-not-annotate on it, so people will annotate to more specific, informative terms.

I think this is worth revisiting, as many GO development terms sit at an odd level of specificity, suggesting something vertebrate-like but meaning something fuzzier. But I don't think this is a high priority. For development, I think we should just inherit the Uberon TC and focus on other processes.

UPDATE I am talking mostly about developmental processes here ^^^

Some related discussion here: https://github.com/obophenotype/uberon/issues/1450

pgaudet commented 3 years ago

@cmungall One clarification: do you mean for any terms using Uberon, or just developmental processes?

balhoff commented 3 years ago

Here are some other problematic constraints in the new set. I gave one conflicting example for each, but these aren't the only conflicting terms.

| GO term | Only in taxon | Problem example | Problem taxon | Comment |
| --- | --- | --- | --- | --- |
| 'extracellular matrix' | 'Metazoa or Dikarya or Embryophyta or Dictyostelium' | 'bacterial biofilm matrix' | Bacteria | |
| 'immune system process' | 'Embryophyta or Eumetazoa' | 'CRISPR-cas system' | Bacteria | |
| 'plastid' | 'Viridiplantae or Apicomplexa' | 'chloroplast chromosome' | never in Eukaryota | Problem with 'cytoplasmic chromosome' def? |
| 'pattern specification process' | Bilateria | 'cotyledon vascular tissue pattern formation' | Viridiplantae | |
| hemopoiesis | Vertebrata | 'larval lymph gland hemocyte differentiation' | Arthropoda | |

pgaudet commented 3 years ago

Decision on ontology call

balhoff commented 3 years ago

Another problem constraint:

balhoff commented 1 year ago

Link to comparison spreadsheet computed a while back: https://docs.google.com/spreadsheets/d/11kBeOeeGJKW2-dP1G30i6VAoCHznG6_ociSjoY0hFQw/edit#gid=816831049