frankkramer-lab / rBiopaxParser

Parses BioPAX files (BioPAX Version 2 and BioPAX Version 3 are supported) and represents them in R. There is quite a lot of documentation available for this package. See the reference manual and the vignette for examples! If you have any questions or suggestions, don't be shy! Mail me! See the package description for an email address.

No terms detected in ontology source for both Experimental Factor Ontology (EFO) and The Mammalian Phenotype Ontology (MP) .owl files #8

Closed (moldach closed this issue 3 years ago)

moldach commented 3 years ago

rBiopaxParser is not reading in multiple Web Ontology Language (OWL) files

This is my first time working with .owl files; I need to work with both Experimental Factor Ontology (EFO) and The Mammalian Phenotype Ontology (MP) but this package fails to read both:

Downloading resources

# EFO
wget -S https://github.com/EBISPOT/efo/releases/download/current/efo.owl

# MP
wget --no-check-certificate --no-proxy -O mp.owl https://www.ebi.ac.uk/ols/ontologies/mp/download

Try to load with rBiopaxParser

Following the instructions for loading a file (not covered in the README, only in the vignette):

> library(rBiopaxParser)
Loading required package: data.table
data.table 1.13.0 using 14 threads (see ?getDTthreads).  Latest news: r-datatable.com
> biopax <- readBiopax('mp.owl')
> print(biopax)
Summary of the biopax object:
           Length Class                             Mode     
namespaces 25     SimplifiedXMLNamespaceDefinitions character
ns_rdf      1     -none-                            character
ns_owl      1     -none-                            character
ns_bp       0     -none-                            character
file        1     -none-                            character

Internal data:
$namespaces

            "http://purl.obolibrary.org/obo/mp.owl#" 
                                                  cl 
                "http://purl.obolibrary.org/obo/cl#" 
                                                  dc 
                  "http://purl.org/dc/elements/1.1/" 
                                                  go 
                "http://purl.obolibrary.org/obo/go#" 
                                                 obo 
                   "http://purl.obolibrary.org/obo/" 
                                                 owl 
                    "http://www.w3.org/2002/07/owl#" 
                                                 rdf 
       "http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
                                                 xsd 
                 "http://www.w3.org/2001/XMLSchema#" 
                                                core 
       "http://purl.obolibrary.org/obo/uberon/core#" 
                                                foaf 
                        "http://xmlns.com/foaf/0.1/" 
                                                pato 
              "http://purl.obolibrary.org/obo/pato#" 
                                                rdfs 
             "http://www.w3.org/2000/01/rdf-schema#" 
                                                swrl 
                   "http://www.w3.org/2003/11/swrl#" 
                                               chebi 
             "http://purl.obolibrary.org/obo/chebi/" 
                                               core1 
              "http://purl.obolibrary.org/obo/core#" 
                                               swrla 
"http://swrl.stanford.edu/ontologies/3.3/swrla.owl#" 
                                               swrlb 
                  "http://www.w3.org/2003/11/swrlb#" 
                                               terms 
                         "http://purl.org/dc/terms/" 
                                              chebi2 
            "http://purl.obolibrary.org/obo/chebi#2" 
                                              chebi3 
            "http://purl.obolibrary.org/obo/chebi#3" 
                                              chebi4 
            "http://purl.obolibrary.org/obo/chebi#1" 
                                              ubprop 
            "http://purl.obolibrary.org/obo/ubprop#" 
                                             mp-edit 
           "http://purl.obolibrary.org/obo/mp-edit#" 
                                             subsets 
        "http://purl.obolibrary.org/obo/ro/subsets#" 
                                            oboInOwl 
     "http://www.geneontology.org/formats/oboInOwl#" 
attr(,"class")
[1] "SimplifiedXMLNamespaceDefinitions" "XMLNamespaceDefinitions"          

$ns_rdf
[1] "rdf"

$ns_owl
[1] "owl"

$ns_bp
character(0)

$file
[1] "mp.owl"

Dimension of internal data.table:   

Summary of parsed internal data.table 
Length  Class   Mode 
     0   NULL   NULL 

Doesn't look like the package is reading the structure of the files:


> head(biopax$dt)
NULL
> listInstances(biopax, class="pathway")
Error in as.character(x) : 
  cannot coerce type 'builtin' to vector of type 'character'

Looking at HEAD:

(base) oldachm@nl003:~/EFO-MAPPING % head -n 50 efo.owl 
<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.ebi.ac.uk/efo/efo.owl#"
     xml:base="http://www.ebi.ac.uk/efo/efo.owl"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:efo="http://www.ebi.ac.uk/efo/"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:core="http://purl.obolibrary.org/obo/uberon/core#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:obo1="http://purl.obolibrary.org/obo#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:skos="http://www.w3.org/2004/02/skos/core#"
     xmlns:chebi="http://purl.obolibrary.org/obo/chebi/"
     xmlns:mondo="http://purl.obolibrary.org/obo/mondo#"
     xmlns:terms="http://purl.org/dc/terms/"
     xmlns:chebi2="http://purl.obolibrary.org/obo/chebi#"
     xmlns:chebi3="http://purl.obolibrary.org/obo/chebi#2"
     xmlns:chebi4="http://purl.obolibrary.org/obo/chebi#3"
     xmlns:chebi5="http://purl.obolibrary.org/obo/chebi#1"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#"
     xmlns:obolnOwl="http://www.geneontology.org/formats/obolnOwl#"
     xmlns:patterns="http://www.co-ode.org/patterns#"
     xmlns:ncbitaxon="http://purl.obolibrary.org/obo/ncbitaxon#">
    <owl:Ontology rdf:about="http://www.ebi.ac.uk/efo/efo.owl">
        <owl:versionIRI rdf:resource="http://www.ebi.ac.uk/efo/releases/v3.32.0/efo.owl"/>
        <obo:format-version rdf:datatype="http://www.w3.org/2001/XMLSchema#string">1.4</obo:format-version>
        <dc:creator>Gautier Koscielny</dc:creator>
        <dc:creator>Laura Huerta Martinez</dc:creator>
        <dc:creator>Olamidipupo Ajigboye</dc:creator>
        <dc:creator>Paola Roncaglia</dc:creator>
        <dc:creator>Trish Whetzel</dc:creator>
        <dc:creator>Zoe May Pendlington</dc:creator>
        <dc:rights rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Copyright [2014] EMBL - European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the &quot;License&quot;); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an &quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the
License. </dc:rights>
        <terms:license rdf:datatype="http://www.w3.org/2001/XMLSchema#string">www.apache.org/licenses/LICENSE-2.0</terms:license>
        <efo:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Catherine Leroy</efo:creator>
        <efo:creator>Dani Welter</efo:creator>
        <efo:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Drashtti Vasant</efo:creator>
        <efo:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Ele Holloway</efo:creator>
        <efo:creator>Eleanor Williams</efo:creator>
        <efo:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Emma Kate Hastings</efo:creator>
        <efo:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Helen Parkinson</efo:creator>
        <efo:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">James Malone</efo:creator>
        <efo:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Jon Ison</efo:creator>
        <efo:creator>Natalja Kurbatova</efo:creator>
        <efo:creator>Simon Jupp</efo:creator>
% head -n 50 mp.owl 
<?xml version="1.0"?>
<rdf:RDF xmlns="http://purl.obolibrary.org/obo/mp.owl#"
     xml:base="http://purl.obolibrary.org/obo/mp.owl"
     xmlns:cl="http://purl.obolibrary.org/obo/cl#"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:go="http://purl.obolibrary.org/obo/go#"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:core="http://purl.obolibrary.org/obo/uberon/core#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:pato="http://purl.obolibrary.org/obo/pato#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:swrl="http://www.w3.org/2003/11/swrl#"
     xmlns:chebi="http://purl.obolibrary.org/obo/chebi/"
     xmlns:core1="http://purl.obolibrary.org/obo/core#"
     xmlns:swrla="http://swrl.stanford.edu/ontologies/3.3/swrla.owl#"
     xmlns:swrlb="http://www.w3.org/2003/11/swrlb#"
     xmlns:terms="http://purl.org/dc/terms/"
     xmlns:chebi2="http://purl.obolibrary.org/obo/chebi#2"
     xmlns:chebi3="http://purl.obolibrary.org/obo/chebi#3"
     xmlns:chebi4="http://purl.obolibrary.org/obo/chebi#1"
     xmlns:ubprop="http://purl.obolibrary.org/obo/ubprop#"
     xmlns:mp-edit="http://purl.obolibrary.org/obo/mp-edit#"
     xmlns:subsets="http://purl.obolibrary.org/obo/ro/subsets#"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/mp.owl">
        <owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/mp/releases/2021-05-26"/>
        <obo:IAO_0000700 rdf:resource="http://purl.obolibrary.org/obo/MP_0000001"/>
        <dc:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Alicia Valenzuela</dc:contributor>
        <dc:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Anna Anagnostopolous</dc:contributor>
        <dc:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Christopher Mungall</dc:contributor>
        <dc:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Georgios V. Gkoutos</dc:contributor>
        <dc:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Howard Dene</dc:contributor>
        <dc:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Melissa Berry</dc:contributor>
        <dc:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Michelle Knowlton</dc:contributor>
        <dc:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Nicole Vasilevsky</dc:contributor>
        <dc:contributor rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Paul Schofield</dc:contributor>
        <dc:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Carroll W. Goldsmith</dc:creator>
        <dc:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Cynthia L. Smith</dc:creator>
        <dc:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Janan T. Eppig</dc:creator>
        <dc:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Susan Bello</dc:creator>
        <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The Mammalian Phenotype Ontology is being developed by Cynthia L. Smith, Susan M. Bello, Carroll W. Goldsmith and Janan T. Eppig, as part of the Mouse Genome Database (MGD) Project, Mouse Genome Informatics (MGI), The Jackson Laboratory, Bar Harbor, ME. This file contains pre-coordinated phenotype terms, definitions and synonyms that can be used to describe mammalian phenotypes. The ontology is represented as a directed acyclic graph (DAG). It organizes phenotype terms into major biological system headers such as nervous system and respiratory system.  This ontology is currently under development. Weekly updates are available at the Mouse Genome Informatics (MGI) ftp site (ftp://ftp.informatics.jax.org/pub/reports/index.html#pheno) as well as the OBO Foundry site (http://obofoundry.org/). Questions, comments and suggestions are welcome, and should be directed to pheno@jax.org, Susan.Bello@jax.org or to GitHub tracker (https://github.com/obophenotype/mammalian-phenotype-ontology/issues) MGD is funded by NIH/NHGRI grant HG000330.</dc:description>
        <dc:language rdf:datatype="http://www.w3.org/2001/XMLSchema#string">English</dc:language>
        <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The Mammalian Phenotype Ontology</dc:title>
        <terms:license rdf:resource="https://creativecommons.org/licenses/by/4.0/"/>
        <oboInOwl:default-namespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MPheno.ontology</oboInOwl:default-namespace>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The Mammalian Phenotype Ontology Copyright © 2007, 2008, 2011, 2012, 2015 by The Jackson Laboratory.</rdfs:comment>
frankkramer commented 3 years ago

@moldach Welcome to ontologies. Unfortunately they are not the easiest thing to work with in R. rBiopaxParser is actually skipping the OWL definitions and going down to the XML level (using the XML package) to parse files which conform to XML AND OWL AND BioPAX format. Everything else will break. If you want to take the XML road as well, take a look at the code in the parseBiopax.R file.
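
For a first look at what an ontology file contains without any BioPAX tooling, a rough text-level sketch in shell may help. This is not how rBiopaxParser works, and it is not a real XML parser. The helper name `list_owl_classes` is hypothetical, and the sketch assumes the file is RDF/XML, binds the OWL namespace to the `owl:` prefix, and writes `rdf:about` as the first attribute of each class element.

```shell
# Hypothetical helper: print the unique IRIs of named owl:Class elements.
# Assumes RDF/XML with the OWL namespace bound to the "owl:" prefix and
# rdf:about written as the first attribute. Plain text matching, not XML parsing.
list_owl_classes() {
    grep -o '<owl:Class rdf:about="[^"]*"' "$1" |
        sed 's/.*rdf:about="\([^"]*\)"/\1/' |
        sort -u
}

# Show a handful of class IRIs if mp.owl has been downloaded.
if [ -f mp.owl ]; then
    list_owl_classes mp.owl | head -n 5
fi
```

Anonymous classes (those without `rdf:about`) are skipped by design. For anything beyond a quick inspection, a proper XML or OWL parser is the safer choice.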

moldach commented 3 years ago

rBiopaxParser is actually skipping the OWL definitions and going down to the XML level (using the XML package) to parse files which conform to XML AND OWL AND BioPAX format. Everything else will break.

Are you saying these files do not conform to the XML and OWL formats? That rBiopaxParser is breaking because of this?

frankkramer commented 3 years ago

@moldach I'm saying the files do not conform to XML and OWL and BioPAX. Those are logical "and"s, not "or"s.
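
The empty `ns_bp` in the printed summary above is consistent with this: neither file declares a BioPAX namespace. A minimal shell sketch for checking that before calling `readBiopax` could look like the following. The helper name `has_biopax_ns` and the 200-line window are assumptions made for illustration. The pattern matches the published BioPAX Level 2 and Level 3 namespace URIs, and finding one only shows that the namespace is declared, not that the file is valid BioPAX.

```shell
# Hypothetical helper: succeed if the first 200 lines of a file mention a
# BioPAX Level 2 or Level 3 namespace URI. The 200-line window is an
# assumption; namespace declarations normally sit on the rdf:RDF root element.
has_biopax_ns() {
    count=$(head -n 200 "$1" |
        grep -c 'http://www\.biopax\.org/release/biopax-level[23]\.owl' || true)
    [ "$count" -gt 0 ]
}

# Check the two files from this issue, if they are present.
for f in efo.owl mp.owl; do
    if [ ! -f "$f" ]; then
        echo "$f: not found, skipping"
    elif has_biopax_ns "$f"; then
        echo "$f: BioPAX namespace declared"
    else
        echo "$f: no BioPAX namespace found"
    fi
done
```

Given the headers pasted earlier in the thread, both `efo.owl` and `mp.owl` would be expected to fall into the last branch.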