geneontology / paint

This curation tool allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions.

PAINT annotation of yeast genes to non-yeast terms #25

Closed monicacecilia closed 7 years ago

monicacecilia commented 8 years ago

Please note that this issue is mirrored on the GO-annotation repo at https://github.com/geneontology/go-annotation/issues/1424

@pgaudet, @selewis I'm adding it here as well to bring to your attention.

@robnash noticed PAINT propagated annotations to yeast genes at SGD for GO terms that appear to be specific for higher eukaryotes. He asked whether someone could please take a look:

GO:0016055 Wnt signaling pathway: HRR25 P29295 YCK1 P23291 YCK2 P23292 YCK3 P39962

GO:0016477 cell migration: STE20 Q03497 CLA4 P48562 SKM1 Q12469

GO:0007275 multicellular organism development: PMT1 P33775 PMT2 P31382 PMT3 P47190 PMT4 P46971 PMT5 P52867 PMT6 P42934 PMT7 Q06644

GO:0070884 regulation of calcineurin-NFAT signaling cascade RCN1 P36054

monicacecilia commented 8 years ago

@pgaudet, @selewis, for your information, more from @robnash:

Adding centrosome to the list, as yeast do not have a centrosome, rather a functional equivalent called the spindle pole body (GO:0005816)

GO:0005813 centrosome: KIN3 P22209 CDC14 Q00684 ESP1 Q03018 SPC97 P38863 SPC98 P53540

selewis commented 8 years ago

A variety of things going on here.

  1. There appear to be some issues with the taxon constraints. For RCN1 P36054 the annotation to GO:0070884 regulation of calcineurin-NFAT signaling cascade is valid. This is in PTHR10300. Someone needs to say whether or not it should pass the taxon checks (and fix accordingly). Use this to test (taxon 559292 is yeast): http://owlservices.berkeleybop.org/isClassApplicableForTaxon?format=txt&idstyle=obo&id=GO:0070884&taxid=NCBITaxon:559292

The same is true for GO:0016055 Wnt signaling pathway (PTHR11909). Taxon checks are showing this as valid, so the fix has to be there.

  2. One bug was noticed by Pascale (see https://github.com/geneontology/touchup/issues/51) and was fixed in PAINT-2.21 (the latest release). However, there was an interval where people were using the buggy version. I'm going back and repairing (i.e. reloading and saving) these families, e.g. PTHR24361 perhaps. This is a monstrously big family. No annotations to cell migration here, but it is the family that Q12469 is in.

It is definitely the case for PTHR24362 for centrosome though. Just looks like bad timing.

  3. Discovered another bug where a loss all the way down at the leaf proteins wasn't being picked up. This explains the problems with "multicellular organism development" in PTHR10050. The fix will be in PAINT v2.22.
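
For anyone who wants to script the taxon check from item 1 across several terms, here is a minimal sketch that only builds the query URLs. The endpoint and parameter names are copied from the link above; the helper name `taxon_check_url` is made up for illustration, and the service may no longer be online, so no request is sent.

```python
from urllib.parse import urlencode

# Endpoint copied from the link in this comment; availability is not guaranteed.
BASE_URL = "http://owlservices.berkeleybop.org/isClassApplicableForTaxon"


def taxon_check_url(go_id: str, taxon_id: int) -> str:
    """Build the query URL for one GO term / NCBI taxon pair (hypothetical helper)."""
    params = {
        "format": "txt",
        "idstyle": "obo",
        "id": go_id,
        "taxid": f"NCBITaxon:{taxon_id}",
    }
    # safe=":" keeps the colons in the identifiers instead of percent-encoding them
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


YEAST = 559292  # taxon used for yeast in this thread

for term in ("GO:0070884", "GO:0016055", "GO:0016477"):
    print(taxon_check_url(term, YEAST))
```

Building the URL separately from fetching it makes it easy to paste individual links into an issue comment when a term looks wrong.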
pgaudet commented 7 years ago

Hi Rob,

I did notice the centrosome issue before. I think the annotation should be replaced by the parent term GO:0005815 microtubule organizing center, the parent of both
-- GO:0005813 centrosome
-- GO:0005816 spindle pole body

Unfortunately we haven't been able to load large families in PAINT 2.0 (I am now working in PAINT 2.22 with PTHR11566, a 500-member family, and every operation takes several minutes, so I am reluctant to load a 1000-member family.)

Not sure what to do - should I just manually replace the annotation in the GAF ??

Pascale

monicacecilia commented 7 years ago

@robnash - FYI comments and questions await your response in this issue. ~m.

selewis commented 7 years ago

@pgaudet Have you committed PTHR11566 yet?

I updated PTHR24362, the family with the centrosome problem.

Someone on the ontology side (Val? @dosumis) needs to look at the conidiophore / multicellular organism issue though. That is an annotation/ontology/taxon problem (or set of problems).

  1. Why ever annotate to Multicellular organism development?
  2. Do we need a multicellular structure development term? (conidiophore is currently a child of multicellular organism)
  3. Taxon issues?
cmungall commented 7 years ago
  1. we currently have 101 experimental annotations to this
  2. upper level groupings
    • we have anatomical structure development already as a grouping class. This is broader as it includes cells, but I'm not sure a multicellular structure development grouping class would be so useful
    • not sure what you mean by "conidiophore is currently a child of multicellular organism". Such an explicit relationship would be in the FAO. In GO, conidiophore development is (indirectly) part of multicellular organism development, which is correct (if not particularly interesting)
  3. I'm not sure which taxon issue you refer to - this one? https://github.com/geneontology/go-ontology/issues/12578

Here is the hierarchy in GO, with superclasses bolded:

[image: GO hierarchy graph, not preserved]

selewis commented 7 years ago

@cmungall - Right now PAINT follows is_a, part_of, and regulates relationships, so that explains the inclusion of multicellular organism development as a 'parent' term. That is, the ontology is currently asserting that conidiophore development is part_of multicellular organism development - but that is not true biologically. For that matter, is a conidiophore an organ? Not sure what, or if, we've defined organ itself (i.e. in UBERON)

For a time PAINT was restricted just to is_a relationships, but that was too restrictive.
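
To make the effect of the relation choice concrete, here is a toy sketch of an ancestor lookup that only follows an allowed set of relation types. It is not PAINT's code. The edge list is a simplification: the conidiophore link is collapsed into a single part_of edge (in GO it is indirect), and "term A", "term B", "term C" are hypothetical placeholders.

```python
from collections import deque

# Toy edges as (child, relation, parent). Only the first edge reflects this thread,
# and even that is collapsed into one step; the rest are hypothetical placeholders.
EDGES = [
    ("conidiophore development", "part_of", "multicellular organism development"),
    ("term A", "is_a", "term B"),
    ("term B", "has_part", "term C"),
]


def ancestors(term, edges, relations):
    """Return every term reachable from `term` using only the allowed relations."""
    found = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child, rel, parent in edges:
            if child == current and rel in relations and parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


# The three relation types named above, versus the earlier is_a-only behaviour
FOLLOWED = {"is_a", "part_of", "regulates"}

print(ancestors("conidiophore development", EDGES, FOLLOWED))
print(ancestors("conidiophore development", EDGES, {"is_a"}))
```

With part_of followed, multicellular organism development shows up as a 'parent'; with is_a only, the same term has no ancestors in this toy graph, which is the restrictiveness mentioned above.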

selewis commented 7 years ago

p.s. Not sure if there is a taxon issue.

The question was whether or not any organism that develops a multicellular structure (e.g. a conidiophore) can be deemed a multicellular organism. In which case yeast (assuming that at some point in its life, under the right conditions, it forms conidiophores) is a "multicellular organism", even though most of the time it is single-celled.

@ValWood

cmungall commented 7 years ago

See https://github.com/geneontology/go-ontology/issues/12578 for my thoughts on the utility of an organ grouping. The string "spore-bearing organ" does seem to be used by some fungal biologists, but that doesn't mean that 'organ development' is a useful grouping in GO.

Re: conidiophores. I thought that yeast lack conidiophores. But you're right the ontology is most likely baking in some dodgy assumptions here. Aren't there aseptate (single-celled) conidiophores?

This whole upper level part of the development hierarchy of the GO probably needs to be reviewed for fungi. I suspect that distinctions between colony, organism, multi-nucleate structure and multi-celled structure with degrees of septation get a bit weird and fun. I think we've already hit this with dicty. It might help if someone were to revive the FAO.

pgaudet commented 7 years ago

@selewis PTHR11566 was committed last week.

Would you please tell me how you updated PTHR24362 (the centrosome problem) ?

Thanks, Pascale

ValWood commented 7 years ago

I'm not sure where the taxon restrictions should be for conidiophores or whether they should be considered "multicellular organism development"... my gut feeling is that they shouldn't.

Midori @mah and Antonia @antonialock @dianeoinglis thoughts?

(I am following this tracker already Suzi, but tag me if you want my input.)

selewis commented 7 years ago

@pgaudet - I simply loaded PTHR24362 into PAINT2.22 which has a fix to a bug. The bug was that the taxon check was not including the individual leaves, only ancestral nodes.

No changes to any annotations, except that now the yeast genes have a NOT.
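
A rough sketch of the difference between checking only ancestral nodes and checking leaves as well. This is not PAINT's implementation: the tree, the node names, and the `excluded` lookup are all hypothetical stand-ins for the real family tree and taxon-constraint service.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    taxon: str
    children: list = field(default_factory=list)


def is_applicable(term, taxon, excluded):
    """Hypothetical stand-in for a taxon-constraint lookup."""
    return (term, taxon) not in excluded


def find_not_nodes(node, term, excluded, include_leaves=True):
    """Collect names of nodes under `node` where `term` fails the taxon check."""
    flagged = []
    is_leaf = not node.children
    if (include_leaves or not is_leaf) and not is_applicable(term, node.taxon, excluded):
        flagged.append(node.name)
    for child in node.children:
        flagged.extend(find_not_nodes(child, term, excluded, include_leaves))
    return flagged


# Toy family: one ancestral node with a human leaf and a yeast leaf
tree = Node("ancestor", "Opisthokonta", [
    Node("human_leaf", "Homo sapiens"),
    Node("yeast_leaf", "Saccharomyces cerevisiae"),
])
TERM = "GO:0005813"  # centrosome
excluded = {(TERM, "Saccharomyces cerevisiae")}  # assumed constraint, for illustration

print(find_not_nodes(tree, TERM, excluded, include_leaves=False))  # buggy behaviour
print(find_not_nodes(tree, TERM, excluded, include_leaves=True))   # leaves included
```

When leaves are skipped, nothing is flagged and the yeast leaf silently inherits the annotation; with leaves included, it is the one node that would get the NOT.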

selewis commented 7 years ago

@ValWood - we share a gut feeling. That is, conidiophore development should not be considered as part_of "multicellular organism development". This would be a problem in the ontology, which is where we could use help, as Chris suggested. Take a look at the graph he provided and see what you think.

selewis commented 7 years ago

@ValWood and everyone,

In the particular family under question regarding conidiophores - PTHR10050

Emericella nidulans proteins G5EB59, Q5BDC1 have IMP annotations to conidiophore development based on this paper: http://www.ncbi.nlm.nih.gov/pubmed/19666781?dopt=Abstract

Human protein Q9Y6A1 has a TAS annotation to multicellular organism development based on this paper http://www.ncbi.nlm.nih.gov/pubmed/10366449?dopt=Abstract

Presumably, since conidiophore development (GO:0070787) is part_of multicellular organism development (GO:0007275), someone generalized the assertion and annotated a node almost at the root of the entire family to multicellular organism development, which leads to the NOT annotations for the yeast (with PAINT v2.22).

Even if the curator had annotated the fungal branches only to conidiophore development it still would have resulted in NOT annotations to the yeast since there is a taxon constraint on this. See: http://owlservices.berkeleybop.org/isClassApplicableForTaxon?format=txt&idstyle=obo&id=GO:0070787&taxid=NCBITaxon:559292

But to me what this has revealed is some peculiarities in the ontology itself, regardless of yeast being a somewhat degenerate fungus.

dianeoinglis commented 7 years ago

I commented on calcineurin-NFAT https://github.com/geneontology/go-ontology/issues/12581

The issues of some fungal structures and processes being classified as "multicellular organismal processes" AND as "development" are both big problems. Conidiophores have both problems; mycelium development has both problems and is likely confused with mycelia that do not form reproductive structures (some do, some do not). Mycelium formation is not necessarily a developmental process, except when mycelia are progressing to form reproductive structures, and only for fungal species that reproduce in the mycelial form.

That last sentence is typical of processes and structures in fungi. Some do, some do not, some do only when... and the terminology in the literature is terribly misused, with the same term in one fungal species used by authors for a similar but distinct process in another species.

From GO_Annotation #1424 "Conidophore development (GO:0070787) is a child term of multicellular organism development (GO:0007275). Conidophore is a fungal term (hypha development) for fruiting, spores, etc. Someone annotated the ancestral protein to multicellular organism development (which on the face of it seems somewhat useless, but regardless). Even if/when yeast do form conidophores, this is temporary structure. Is there a problem in the ontology? Should the parental term be multicellular structure development? Is there such a thing?"

One source of the problem is that fungi can be both unicellular and multicellular and the GO refers to filamentous fungi as a multi-cellular organism. Some fungi are strictly filamentous (multicellular), some are exclusively yeast (unicellular) and some are dimorphic and can switch between a unicellular yeast and a multicellular filamentous form. The S. cerevisiae curators do not annotate to "pseudohyphal growth" very often but this term causes S. cerevisiae to have a multicellular organism parent that curators may find unexpected but is true in the existing ontology.

"Development" is a term often used in fungal literature; sometimes it meets the definition in GO and sometimes it is really a "formation" and not an actual developmental process. But because these uses vary from species to species, not even an intelligent reader with a PhD can figure this out unless they have specifically untangled these issues for their research. "Hyphal growth" is another bad one: if it is used in filamentous fungi it means one thing, and in a dimorphic fungus it means another. That term overlaps with the term filamentous growth. A mycelium is composed of cells growing filamentously, so these terms are related to mycelium development, conidiophore development and more. There are a lot of terms to be corrected and I have been accumulating notes but dragging my feet.

Bottom line, I think there should be a distinction between true multicellular organisms, where a single cell of the organism cannot reproduce the whole organism, and fungi, where the individual cells in a mycelium can if you break them up. An obligate multicellular organism is not equivalent in every way to a facultative multicellular organism with a unicellular option. If the fungal terms under "multicellular organism/process" were made distinct in some way from true multicellular organisms, many problems would be solved. Perhaps it is just that the definition of multicellular is too loose, and attachment of the cells is not sufficient for the biology to represent all organisms that meet the definition.

selewis commented 7 years ago

+1 @dianeoinglis well said

ValWood commented 7 years ago

Sounds sensible to me Diane.

dianeoinglis commented 7 years ago

My research experience is in the area of filamentous growth and virulence of C. albicans, a dimorphic species, and sporulation in Histoplasma, a true Dimorphic Fungus. When I curated at CGD and AspGD, my curation experience was on-the-job training. My scientific experience was the more important need at CGD and AspGD. "Filamentous growth" was the only term available at that time. Candida responds to a variety of distinct stimuli. I was chicken at that time and side-stepped the "multicellular" issue. The set of specific terms I requested were created under "population of unicellular organisms in response to x, y, z," etc. I prefer this classification over "multicellular organism." I recommend re-classifying the fungal terms as "population of unicellular organisms." The population of unicellulars is scientifically more accurate for filamentous and dimorphic fungi and would solve many problems without requiring any group to recurate.

Thoughts and comments about this possibility?

A gene page at CGD for reference: http://www.candidagenome.org/cgi-bin/locus.pl?locus=EFG1&organism=C_albicans_SC5314

The reference for the publication describing the terms that were created for C. albicans: PMID: 23143685 "Improved gene ontology annotation for biofilm formation, filamentous growth, and phenotypic switching in Candida albicans." Diane O. Inglis, Marek S. Skrzypek, Martha B. Arnaud, Jonathan Binkley, Prachi Shah, Farrell Wymore and Gavin Sherlock http://ec.asm.org/content/12/1/101.long

selewis commented 7 years ago

Seems quite sensible to me as well.

Would you add this request to the ontology tracker please.

(and link back to this issue...)

On Wed, Aug 17, 2016 at 9:32 AM, Diane O Inglis notifications@github.com wrote:

My research experience is in the area of filamentous growth and virulence of C. albicans, a dimorphic species, and sporulation in Histoplasma, a true Dimorphic Fungus. When I curated at CGD and AspGD, my curation experience was on-the-job training. My scientific experience was the more important need at CGD and AspGD. "Filamentous growth" was the only term. There was no distinction in the variety of specific responses of Candida. The set of specific terms were created under "population of unicellular organisms in response to x, y, z," etc. I prefer this classification over "multicellular organism." I recommend re-classifying the fungal terms as "population of unicellular organisms." The population of unicellulars is scientifically more accurate for filamentous and dimorphic fungi and would solve many problems without requiring any group to recurate.

Thoughts and comments about this possibility?


dianeoinglis commented 7 years ago

Yes, I will soon. I am examining the FAO for relevant existing terms and related terms. As soon as I gather a bit more info, I will add the request.

pgaudet commented 7 years ago

Corrected PTHR10050 (removed multicellular dev)