calipho-sib / cellosaurus

A knowledge resource on cell lines - From SIB CALIPHO group
https://www.cellosaurus.org
Creative Commons Attribution 4.0 International

R ontologyIndex parsing issues with Cellosaurus 44 release #7

Open mjsteinbaugh opened 1 year ago

mjsteinbaugh commented 1 year ago

Hi, I noticed that the cellosaurus.obo file from the 44 release currently has parsing issues with the R ontologyIndex package. The cellosaurus.obo files from the 42 and 43 releases work as expected.

Here's a reproducible example in R:

## Use `install.packages("ontologyIndex")` to install ontologyIndex.
## Works.
url <- "https://github.com/calipho-sib/cellosaurus/raw/2a8b4be611ca9444c79b27fa3c62d0f8d0330e27/cellosaurus.obo"
file <- "cellosaurus-43.obo"
download.file(url = url, destfile = file)
ont <- ontologyIndex::get_ontology(file = file, extract_tags = "everything")
## Broken -- results in memory leak.
url <- "https://github.com/calipho-sib/cellosaurus/raw/6facc6590e4360dbfcdb2e7f489bc25e97164711/cellosaurus.obo"
file <- "cellosaurus-44.obo"
download.file(url = url, destfile = file)
ont <- ontologyIndex::get_ontology(file = file, extract_tags = "everything")

Best, Mike

schelhorn commented 1 year ago

I noticed the same and have reverted to the 41 release. Perhaps there is a cycle somewhere in the ontology, or a formatting error? Could you please have a look, @AmosBairoch ?

AmosBairoch commented 1 year ago

I think I already replied that the release 44 OBO file passes the ROBOT "report" check with 3 errors that are not "real" errors, i.e.:

ERROR    missing_ontology_description    http://purl.obolibrary.org/obo/Cellosaurus.owl    dc:description
ERROR    missing_ontology_license    http://purl.obolibrary.org/obo/Cellosaurus.owl    dc:license
ERROR    missing_ontology_title    http://purl.obolibrary.org/obo/Cellosaurus.owl    dc:title

It also has tons of warnings about invalid xrefs (those that have spaces in their identifiers).

So unless you tell me which error your parser detects that ROBOT and OboEdit do not detect, there will not be any way of fixing this issue!

Best Amos

schelhorn commented 1 year ago

Thanks, Amos. Our parsers do not report an error but produce a memory leak, which usually happens when there is a cycle somewhere in the OBO "DAG" (not so acyclic anymore, of course). Unfortunately, ontologyIndex cannot detect the cycle location.
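Since ontologyIndex cannot say where a cycle is, one way to look for it is outside the package. The following is a minimal base-R sketch, not part of ontologyIndex: `read_obo_edges()` and `find_cycle()` are hypothetical helper names, and the sketch assumes plain `id:`, `is_a:` and `relationship:` lines inside `[Term]` stanzas. Which relationship types should count as parent links is also an assumption and is left to the caller.

```r
## Sketch only: helper names are hypothetical, not part of ontologyIndex.

## Collect child -> parent edges from the [Term] stanzas of an OBO file.
## `relationship_types` lists the `relationship:` types to treat as parent links.
read_obo_edges <- function(lines, relationship_types = character(0)) {
    lines <- trimws(lines)
    header <- startsWith(lines, "[")
    stanza <- cumsum(header)
    ## Stanza type ("[Term]", "[Typedef]", ...) per line; "" before the first stanza.
    stanza_type <- c("", lines[header])[stanza + 1L]
    in_term <- stanza_type == "[Term]"
    id_line <- in_term & startsWith(lines, "id: ")
    ids <- sub("^id: *", "", lines[id_line])
    names(ids) <- stanza[id_line]
    isa <- in_term & startsWith(lines, "is_a: ")
    rel <- in_term & startsWith(lines, "relationship: ")
    rel_type <- sub("^relationship: *([^ ]+) .*$", "\\1", lines[rel])
    keep <- rel_type %in% relationship_types
    child <- c(
        ids[as.character(stanza[isa])],
        ids[as.character(stanza[rel][keep])]
    )
    parent <- c(
        sub("^is_a: *([^ ]+).*$", "\\1", lines[isa]),
        sub("^relationship: *[^ ]+ +([^ ]+).*$", "\\1", lines[rel][keep])
    )
    data.frame(child = unname(child), parent = parent, stringsAsFactors = FALSE)
}

## Iterative depth-first search. Returns the ids along the first cycle found
## (first id repeated at the end), or NULL if the graph is acyclic.
find_cycle <- function(edges) {
    if (nrow(edges) == 0L) return(NULL)
    adj <- list2env(split(edges$parent, edges$child), hash = TRUE)
    state <- new.env(hash = TRUE)  # id -> "open" (on current path) or "done"
    for (start in unique(edges$child)) {
        if (!is.null(state[[start]])) next
        state[[start]] <- "open"
        path <- start
        next_idx <- 1L  # per path entry: index of the next parent to visit
        while (length(path) > 0L) {
            node <- path[length(path)]
            nbrs <- adj[[node]]
            i <- next_idx[length(next_idx)]
            if (i > length(nbrs)) {
                ## All parents visited: close this node and backtrack.
                state[[node]] <- "done"
                path <- path[-length(path)]
                next_idx <- next_idx[-length(next_idx)]
                next
            }
            next_idx[length(next_idx)] <- i + 1L
            nb <- nbrs[i]
            s <- state[[nb]]
            if (is.null(s)) {
                state[[nb]] <- "open"
                path <- c(path, nb)
                next_idx <- c(next_idx, 1L)
            } else if (s == "open") {
                ## Reached a node that is still on the current path: a cycle.
                return(c(path[match(nb, path):length(path)], nb))
            }
        }
    }
    NULL
}
```

Applied to the files from the reproducible example, this would look like `find_cycle(read_obo_edges(readLines("cellosaurus-44.obo")))`, repeated with `relationship_types` set to whichever relationship names the file actually uses. In ontologyIndex, the tags followed as parent links are controlled by the `propagate_relationships` argument of `get_ontology()`, so the same set should be passed here.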

We had a similar issue concerning cycles with the OBO of the Experimental Factor Ontology (EFO) and the EBI SPOT people were able to detect and remove the cycle(s) - perhaps their approach could help:

@matentzn: Maybe you can use this list here to try and find the circles. What I did is a nasty trick: I materialised all part of relations as is a and then used ROBOT:

robot remove -i data/efo.owl --axioms disjoint query --update sparql/direct-links.ru reason --equivalent-classes-allowed none

Concerning the ROBOT errors: were any of them new in v44 compared to v43?
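One way to answer that question is to diff the two reports line by line. This is a small sketch, assuming each release's report has been written to a text file (for example with `robot report --input ... --output ...`) and read into R with `readLines()`. The helper name `new_report_rows()` is hypothetical.

```r
## Hypothetical helper: report rows present in the newer report but not in the older one.
## Both arguments are character vectors, one element per report line.
new_report_rows <- function(old_lines, new_lines) {
    setdiff(trimws(new_lines), trimws(old_lines))
}
```

An empty result would suggest that release 44 introduced no new ROBOT findings, in which case the problem would lie in something the report does not check.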