Knowledge-Graph-Hub / kg-obo

A package to transform all OBO ontologies into KGX TSV format and OBO json, and put the transformed graph in KGhub
https://knowledge-graph-hub.github.io/kg-obo/getting_started.html
GNU General Public License v3.0

Some ontologies aren't being transformed fully because some OWL files contain imports to other OWL files #75

Closed justaddcoffee closed 3 years ago

justaddcoffee commented 3 years ago

Describe the desired behavior

Some OWL files contain imports to other OWL files, and KGX does not seem to follow these imports. For example, here is the OWL representation of Upheno:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
    <!ENTITY owl "http://www.w3.org/2002/07/owl#" >
    <!ENTITY obo "http://purl.obolibrary.org/obo/" >
    <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" >
    <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
    <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
    <!ENTITY oboInOwl "http://www.geneontology.org/formats/oboInOwl#" >
]>

<rdf:RDF xmlns="&obo;x-bfo.owl#"
     xml:base="&obo;x-bfo.owl"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <owl:Ontology rdf:about="&obo;upheno.owl">
        <owl:imports rdf:resource="&obo;upheno/metazoa.owl"/>
    </owl:Ontology>
</rdf:RDF>

Note this block:

    <owl:Ontology rdf:about="&obo;upheno.owl">
        <owl:imports rdf:resource="&obo;upheno/metazoa.owl"/>
    </owl:Ontology>

which points to upheno/metazoa.owl, where all the good stuff is.

Because of this, kg-obo currently transforms upheno to this JSON, which is not terribly useful:

{
    "nodes": [
        {
            "id": "OBO:upheno.owl",
            "type": "owl:Ontology",
            "category": [
                "biolink:NamedThing"
            ],
            "provided_by": [
                "uphenolm7m33re"
            ]
        },
        {
            "id": "OBO:upheno/metazoa.owl",
            "category": [
                "biolink:NamedThing"
            ],
            "provided_by": [
                "uphenolm7m33re"
            ]
        }
    ],
    "edges": [
        {
            "subject": "OBO:upheno.owl",
            "predicate": "owl:imports",
            "object": "OBO:upheno/metazoa.owl",
            "relation": "owl:imports",
            "knowledge_source": [
                "uphenolm7m33re"
            ]
        }
    ]
}

Additional context

I don't think support for this is critical for our immediate use case that is driving development, i.e. kg-idg.

For now, we should possibly look for imports like this in the XML and abandon the transform with an error if they are present.
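
A minimal sketch of that check, assuming the OWL file is RDF/XML like the Upheno example above. The names `find_owl_imports`, `check_no_imports` and `OwlImportsError` are hypothetical and are not part of kg-obo or KGX:

```python
import xml.etree.ElementTree as ET
from typing import List

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class OwlImportsError(Exception):
    """Hypothetical error raised when an OWL file relies on owl:imports."""


def find_owl_imports(owl_xml: str) -> List[str]:
    """Return the rdf:resource target of every owl:imports element."""
    # Entities declared in the DOCTYPE (such as &obo;) are expanded by the parser.
    root = ET.fromstring(owl_xml)
    imports = []
    for elem in root.iter(f"{{{OWL_NS}}}imports"):
        target = elem.get(f"{{{RDF_NS}}}resource")
        if target:
            imports.append(target)
    return imports


def check_no_imports(owl_xml: str) -> None:
    """Abandon the transform with an error if any imports are present."""
    imports = find_owl_imports(owl_xml)
    if imports:
        raise OwlImportsError(
            "OWL file imports other ontologies: " + ", ".join(imports)
        )
```

For the Upheno file above, `find_owl_imports` should return the single resolved IRI `http://purl.obolibrary.org/obo/upheno/metazoa.owl`, so `check_no_imports` would raise.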

Eventually, we will want to parse the XML, find these imports, download these OWL files, and feed these to KGX in addition to the "main" OWL file.
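
A rough sketch of the "find and download" part, assuming every imported file is also RDF/XML and reachable at its import IRI. `collect_import_closure` and `fetch_url` are hypothetical names, and how the downloaded files get passed to KGX is left out here:

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict

IMPORTS_TAG = "{http://www.w3.org/2002/07/owl#}imports"
RESOURCE_ATTR = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def fetch_url(url: str) -> bytes:
    """Download a document and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def collect_import_closure(
    main_url: str, fetch: Callable[[str], bytes] = fetch_url
) -> Dict[str, bytes]:
    """Fetch the main OWL file plus everything it imports, transitively.

    Returns a mapping from IRI to file content. Each IRI is fetched at
    most once, so circular imports do not loop forever.
    """
    fetched: Dict[str, bytes] = {}
    pending = [main_url]
    while pending:
        url = pending.pop()
        if url in fetched:
            continue
        content = fetch(url)
        fetched[url] = content
        root = ET.fromstring(content)
        for elem in root.iter(IMPORTS_TAG):
            target = elem.get(RESOURCE_ATTR)
            if target and target not in fetched:
                pending.append(target)
    return fetched
```

Passing `fetch` as a parameter keeps the traversal testable without network access; a caller could also swap in a version that caches downloads on disk.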

@cmungall @matentzn @caufieldjh

caufieldjh commented 3 years ago

See #76 for temporary fix

caufieldjh commented 3 years ago

This relates back to #21 in terms of pre-processing

caufieldjh commented 3 years ago

For reference, as of the last build with the temp fix in #76:

05:35:01  INFO:kg-obo:Successfully transformed 133: ['bfo', 'chebi', 'doid', 'go', 'obi', 'pato', 'pr', 'xao', 'zfa', 'aeo', 'agro', 'aism', 'amphx', 'apo', 'aro', 'bco', 'bspo', 'bto', 'cdno', 'cheminf', 'chmo', 'cio', 'cl', 'clao', 'clo', 'clyh', 'cmo', 'cob', 'ddanat', 'ddpheno', 'dpo', 'dron', 'ecao', 'eco', 'ecocore', 'ecto', 'emapa', 'eupath', 'exo', 'fao', 'fbbt', 'fbcv', 'fbdv', 'fma', 'fovt', 'gecko', 'geno', 'gno', 'hancestro', 'hao', 'hom', 'hsapdv', 'hso', 'htn', 'iao', 'ico', 'ido', 'labo', 'ma', 'mco', 'mi', 'miapa', 'mmo', 'mmusdv', 'mod', 'mondo', 'mop', 'mp', 'mpath', 'mro', 'nbo', 'ncbitaxon', 'ncit', 'ncro', 'oae', 'oarcs', 'obcs', 'obib', 'ogg', 'ogms', 'ohd', 'olatdv', 'omo', 'omp', 'omrse', 'opl', 'opmi', 'ornaseq', 'ovae', 'pco', 'pdro', 'pdumdv', 'plana', 'planp', 'ppo', 'pw', 'rbo', 'rs', 'rxno', 'so', 'spd', 'stato', 'swo', 'symp', 'taxrank', 'trans', 'tto', 'uberon', 'uo', 'vt', 'vto', 'wbbt', 'wbls', 'wbphenotype', 'xco', 'xpo', 'zeco', 'zfs', 'zp', 'hp', 'sbo', 'scdo', 'txpo', 'sibo', 'fix', 'rex', 'ehdaa2', 'upa', 'ero', 'idomal', 'miro', 'tads', 'tgma']
05:35:01  INFO:kg-obo:Failed to transform 60: ['po', 'apollo_sv', 'caro', 'cdao', 'chiro', 'cido', 'cro', 'cteno', 'cto', 'cvdo', 'dideo', 'duo', 'envo', 'fbbi', 'fideo', 'flopo', 'foodon', 'fypo', 'genepio', 'geo', 'iceo', 'ino', 'maxo', 'mf', 'mfmo', 'mfoem', 'mfomd', 'micro', 'mpio', 'ms', 'nomen', 'oba', 'ogsf', 'ohmi', 'ohpi', 'omit', 'one', 'ons', 'ontoneo', 'oostt', 'peco', 'phipo', 'poro', 'psdo', 'pso', 'ro', 'sepio', 'to', 'upheno', 'vo', 'xlmod', 'fobi', 'gsso', 'kisao', 'mamo', 'vario', 'ogi', 'ceph', 'gaz', 'rnao']

The previous build had 30 failed transforms, so the 30 additional failures are ontologies that contain import statements.