geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Erroneous existing relations & missing relations reported by UTHealth #22938

Closed suzialeksander closed 1 week ago

suzialeksander commented 2 years ago

From helpdesk, Licong Cui from The University of Texas Health Science Center at Houston:

This is Licong Cui from The University of Texas Health Science Center at Houston. One of my research interests is biomedical ontology quality assurance. We recently completed work on quality assurance of relations in the Gene Ontology (GO) and submitted a manuscript to Briefings in Bioinformatics. In this work, we identified 821 missing relations and 45 erroneous relations in GO. The manuscript received favorable reviews from two of the three reviewers, but the third reviewer recommends that we obtain direct feedback from the GO consortium about these relations and request the corresponding changes.

I would like to ask for your help in reviewing these relations, and to invite you to be co-author(s) of this manuscript if you are interested. Since the journal office asks for a quick turnaround on the revision, your response is greatly appreciated.

Erroneous existing relations.xlsx Missing relations.xlsx

suzialeksander commented 2 years ago

@rashmie If you need to add anything to this ticket, please do.

rashmie commented 2 years ago

Thanks @suzialeksander for opening this ticket.

raymond91125 commented 2 years ago

Assessing their findings and making corrections where appropriate would take a considerable amount of work.

  1. It would be helpful to have a script that finds the 'explanation' of each relationship (instead of doing it manually in Protege).
  2. How do we prioritize this work?
  3. After vetting, we may consider including their methods in our PR checking process.
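
As a rough illustration of item 1, the simplest kind of 'explanation' is the chain of asserted is_a links that connects a term to an unexpected ancestor. The sketch below assumes the asserted edges have already been loaded into a plain dictionary; the `TOY:` identifiers and the `explain_is_a` helper are hypothetical and do not come from GO or from any existing tool. It does not cover relations inferred from logical definitions, which would still need a reasoner.

```python
from collections import deque

# Hypothetical asserted is_a edges (child -> direct parents); not real GO content.
ASSERTED_IS_A = {
    "TOY:1": ["TOY:2"],
    "TOY:2": ["TOY:3", "TOY:4"],
    "TOY:3": ["TOY:5"],
    "TOY:4": [],
    "TOY:5": [],
}


def explain_is_a(child, ancestor, edges):
    """Return one shortest chain of asserted is_a links from child to ancestor.

    Returns None when no such chain exists in the asserted edges.
    """
    queue = deque([[child]])
    seen = {child}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == ancestor:
            return path
        for parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


if __name__ == "__main__":
    chain = explain_is_a("TOY:1", "TOY:5", ASSERTED_IS_A)
    # Each adjacent pair in the chain is one asserted link to inspect.
    print(" IS_A ".join(chain))
```

Breadth-first search is used so that the shortest chain is reported first, which is usually the easiest one to review by hand.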

Below is what I've found so far.

I spot-checked a couple of their reported erroneous relations:

GO:0002150 hypochlorous acid catabolic process IS_A GO:0016054 organic acid catabolic process: probably caused by the manual assertion oxoacid metabolic process IS_A organic acid metabolic process.

GO:0071923 negative regulation of cohesin loading IS_A GO:0045875 negative regulation of sister chromatid cohesion (should be part_of): caused by the manual assertion IS_A negatively regulates some sister chromatid cohesion.

They also included an 'error' relation that does not exist in GO:

GO:1905309 positive regulation of cohesin loading IS_A GO:0045876 positive regulation of sister chromatid cohesion

ValWood commented 2 weeks ago

I am looking at the file, and I see lots of issues in the suggested relations:

row 1: incorrect. Not all immunological synapse formation is lymphocyte activation (not all immunological synapses involve lymphocytes, and, although I'm not an immunologist, this may not be related to activation at all).
row 2: auditory receptor cells are not part of the epidermis.
row 3: heart rudiment formation should be part_of heart morphogenesis.
row 4: s-triazine compound metabolic process (this is a pesticide; the term should not exist in GO) is at least not a cellular nitrogen metabolic process, which should refer to endogenous compounds.
row 5: the suggested parent, GO:0031325 obsolete positive regulation of cellular metabolic process, has been obsoleted as an unnecessary grouping term.
row 6: is true; ketone body should be is_a small molecule metabolic process.

Looking through the list, I see lots of other issues as well.

Considering that

a) this analysis would need repeating to remove all obsoletes and relationships which have since been added,
b) the evaluation seems to contain a lot of spurious recommendations, and
c) many of these issues will be resolved by logical relations,

it isn't a good use of ontology editors' limited time to go through these one by one without a prior revision of the dataset to fix the above issues.
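
A prior revision along the lines of point a) could be partly automated. The sketch below is only an illustration under stated assumptions: it reads a small OBO-format snippet, collects obsolete term IDs and directly asserted relations, and drops suggested relations that touch an obsolete term or that are already asserted. The `TOY:` terms, the suggestion list, and the helper names `parse_obo` and `filter_suggestions` are hypothetical; only the OBO tags (`id`, `is_a`, `relationship`, `is_obsolete`) are standard. It checks asserted relations only, so a suggestion that is already inferable through a longer path would not be caught.

```python
# Hypothetical OBO snippet; the TOY terms are placeholders, not GO terms.
TOY_OBO = """\
[Term]
id: TOY:0000001
name: toy process A
is_a: TOY:0000002 ! toy process B

[Term]
id: TOY:0000002
name: toy process B

[Term]
id: TOY:0000003
name: obsolete toy grouping term
is_obsolete: true

[Term]
id: TOY:0000004
name: toy process C
relationship: part_of TOY:0000002 ! toy process B
"""


def parse_obo(text):
    """Return (obsolete_ids, asserted) where asserted holds (child, relation, parent)."""
    obsolete, asserted = set(), set()
    term_id, in_term = None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term, term_id = (line == "[Term]"), None
            continue
        if not in_term or not line:
            continue
        value = line.split(" ! ")[0]  # drop the trailing "! label" comment
        if value.startswith("id: "):
            term_id = value[len("id: "):].strip()
        elif value == "is_obsolete: true" and term_id:
            obsolete.add(term_id)
        elif value.startswith("is_a: ") and term_id:
            asserted.add((term_id, "is_a", value[len("is_a: "):].strip()))
        elif value.startswith("relationship: ") and term_id:
            parts = value[len("relationship: "):].split()
            if len(parts) >= 2:
                asserted.add((term_id, parts[0], parts[1]))
    return obsolete, asserted


def filter_suggestions(suggestions, obsolete, asserted):
    """Keep suggestions that involve no obsolete term and are not yet asserted."""
    return [
        s for s in suggestions
        if s[0] not in obsolete and s[2] not in obsolete and s not in asserted
    ]


if __name__ == "__main__":
    suggested = [
        ("TOY:0000001", "is_a", "TOY:0000002"),     # already asserted
        ("TOY:0000001", "is_a", "TOY:0000003"),     # parent is obsolete
        ("TOY:0000004", "part_of", "TOY:0000002"),  # already asserted
        ("TOY:0000004", "is_a", "TOY:0000001"),     # still worth reviewing
    ]
    obsolete_ids, asserted_rels = parse_obo(TOY_OBO)
    for child, rel, parent in filter_suggestions(suggested, obsolete_ids, asserted_rels):
        print(child, rel, parent)
```

Run against a current release file, a filter like this would shrink the reported list to the rows that still need an editor's judgment.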

But we could add

  1. heart rudiment formation part_of heart morphogenesis
  2. ketone body is_a small molecule metabolic process
  3. rough endoplasmic reticulum is_a rough endoplasmic reticulum cisterna

since these are clearly missing.

raymond91125 commented 1 week ago

Actually, "heart rudiment formation" is part of "heart development". In parallel, "heart morphogenesis" is part of "heart development". Perhaps add "heart rudiment morphogenesis" part_of "heart morphogenesis".

raymond91125 commented 1 week ago

"rough endoplasmic reticulum is_a rough endoplasmic reticulum cisterna" does not seem correct. I would have thought "rough endoplasmic reticulum cisterna part_of rough endoplasmic reticulum".