Knowledge-Graph-Hub / kg-obo

A package to transform all OBO ontologies into KGX TSV format and OBO JSON, and put the transformed graphs in KG-Hub
https://knowledge-graph-hub.github.io/kg-obo/getting_started.html
GNU General Public License v3.0

Duplicates in node list of CHIRO, version 2015-11-23 #208

Open LucaCappelletti94 opened 1 year ago

LucaCappelletti94 commented 1 year ago

Describe the bug

The CHIRO graph, version 2015-11-23, contains the following duplicated nodes: ["GO:0003824", "NCBITaxon:2"]

To Reproduce

Inspect the node list of this graph version and look for repeated node IDs.

Expected behavior

The node list should not contain duplicates.
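
For anyone who wants to check this locally, here is a minimal sketch of a duplicate check over a KGX-style nodes TSV. It assumes the file has a header row with an `id` column; the function name `find_duplicate_ids` and the sample rows are made up for illustration and are not part of kg-obo.

```python
import csv
import io
from collections import Counter


def find_duplicate_ids(tsv_handle, id_column="id"):
    """Return node IDs that occur more than once, in first-seen order."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    counts = Counter(row[id_column] for row in reader)
    # Counter keeps insertion order, so duplicates come out in file order.
    return [node_id for node_id, n in counts.items() if n > 1]


# Illustrative sample only; a real node file has more columns and rows.
sample = io.StringIO(
    "id\tname\n"
    "GO:0003824\tcatalytic activity\n"
    "NCBITaxon:2\tBacteria\n"
    "GO:0003824\tcatalytic activity\n"
    "NCBITaxon:2\tBacteria\n"
    "CHEBI:24431\tchemical entity\n"
)

duplicates = find_duplicate_ids(sample)
print(duplicates)
```

To run it against a real file, pass an open file handle instead of the `io.StringIO` sample.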

caufieldjh commented 1 year ago

Looks like one of the nodes is defined twice in the CHIRO OWL, under two different IRIs (with and without the /obo/ path segment):

$ grep NCBITaxon_2 chiro.owl
    <!-- http://purl.obolibrary.org/NCBITaxon_2 -->
    <owl:Class rdf:about="http://purl.obolibrary.org/NCBITaxon_2"/>
                        <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_2"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_2"/>
                        <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/NCBITaxon_2"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/NCBITaxon_2"/>
    <!-- http://purl.obolibrary.org/obo/NCBITaxon_2 -->
    <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_2"/>

Not sure why GO:0003824 appears twice.

Anyway, looks like we need an additional normalization step to handle these cases. I'm surprised it doesn't happen more often.
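
As a rough sketch of what such a normalization step could look like (not what kg-obo actually does): map both IRI forms seen in the grep output to the same CURIE, then keep only the first node per CURIE. The names `to_curie` and `dedupe_nodes` are hypothetical, and the sketch assumes the ID space is everything before the last underscore in the IRI's final segment.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"
PURL_ROOT = "http://purl.obolibrary.org/"  # the variant missing "/obo/"


def to_curie(iri):
    """Map either IRI form to a CURIE such as 'NCBITaxon:2'.

    IRIs outside purl.obolibrary.org, or without an underscore in the
    final segment, are returned unchanged.
    """
    if iri.startswith(OBO_BASE):
        local = iri[len(OBO_BASE):]
    elif iri.startswith(PURL_ROOT):
        local = iri[len(PURL_ROOT):]
    else:
        return iri
    # Assumption: split on the last underscore to get prefix and local ID.
    prefix, sep, local_id = local.rpartition("_")
    return f"{prefix}:{local_id}" if sep else iri


def dedupe_nodes(iris):
    """Return unique CURIEs in first-seen order."""
    seen = {}
    for iri in iris:
        seen.setdefault(to_curie(iri), iri)
    return list(seen)


nodes = [
    "http://purl.obolibrary.org/NCBITaxon_2",
    "http://purl.obolibrary.org/obo/NCBITaxon_2",
]
print(dedupe_nodes(nodes))
```

Note that this only covers the NCBITaxon:2 case. It silently merges two classes that the source OWL declares separately, so it may also be worth reporting the malformed IRI upstream. It does not explain the GO:0003824 duplicate.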