dkoslicki opened 7 months ago
This is the single `biolink:Exon` node in KG2 (checked in RTX-KG2.9.0pre):
{
"iri": "http://www.ebi.ac.uk/efo/EFO_0004423",
"synonym": [
"exonic region"
],
"category_label": "exon",
"deprecated": "False",
"name": "exon",
"description": "An exon is a nucleic acid sequence that is represented in the mature form of an RNA molecule either after portions of a precursor RNA (introns) have been removed by cis-splicing or when two or more precursor RNA molecules have been ligated by trans-splicing.",
"provided_by": "['infores:efo']",
"id": "EFO:0004423",
"category": "biolink:Exon",
"update_date": "3630"
}
This node comes from EFO, which is in the multi ont load process. I would not be surprised if that ETL is "borked". I will take a look to see where this is coming from.
Here is the term in `efo.owl`:
<!-- http://www.ebi.ac.uk/efo/EFO_0004423 -->
<owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0004423">
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/BFO_0000040"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
<owl:someValuesFrom rdf:resource="http://www.ebi.ac.uk/efo/EFO_0004422"/>
</owl:Restriction>
</rdfs:subClassOf>
<obo:IAO_0000115>An exon is a nucleic acid sequence that is represented in the mature form of an RNA molecule either after portions of a precursor RNA (introns) have been removed by cis-splicing or when two or more precursor RNA molecules have been ligated by trans-splicing.</obo:IAO_0000115>
<oboInOwl:hasDbXref>NCIt:C13231</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>SNOMEDCT:33091005</oboInOwl:hasDbXref>
<oboInOwl:hasExactSynonym>exonic region</oboInOwl:hasExactSynonym>
<rdfs:label>exon</rdfs:label>
</owl:Class>
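As a quick way to confirm which named parent class the term declares, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the snippet is wrapped in an `rdf:RDF` root with the standard OWL/RDF/RDFS namespace declarations (the wrapper and the helper name `named_superclasses` are illustrative and not part of the KG2 build code). Anonymous `owl:Restriction` parents are skipped because they carry no `rdf:resource` attribute.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URIs; efo.owl is assumed to declare these prefixes.
NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Trimmed copy of the EFO_0004423 class, wrapped in a hypothetical rdf:RDF root.
SNIPPET = """<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0004423">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/BFO_0000040"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
        <owl:someValuesFrom rdf:resource="http://www.ebi.ac.uk/efo/EFO_0004422"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:label>exon</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def named_superclasses(owl_xml, class_iri):
    """Return IRIs of the directly asserted, named superclasses of class_iri."""
    root = ET.fromstring(owl_xml)
    parents = []
    for cls in root.findall("owl:Class", NS):
        if cls.get(RDF_ABOUT) != class_iri:
            continue
        for sub in cls.findall("rdfs:subClassOf", NS):
            resource = sub.get(RDF_RESOURCE)
            if resource is not None:  # skip anonymous restrictions
                parents.append(resource)
    return parents


print(named_superclasses(SNIPPET, "http://www.ebi.ac.uk/efo/EFO_0004423"))
```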
`EFO:0004423` is a subclass of `material entity` (`BFO:0000040`), along with several other similar terms. It looks like the same issue also shows up with a different subclass of `material entity`, such as `enzyme`:
{
"iri": "http://purl.obolibrary.org/obo/OBI_0000427",
"category_label": "protein",
"deprecated": "False",
"name": "enzyme",
"description": "(protein or rna) or has_part (protein or rna) and has_function some GO:0003824 (catalytic activity); (protein or rna) or has_part (protein or rna) and has_function some GO:0003824 (catalytic activity)",
"provided_by": "['infores:efo', 'infores:genepio']",
"id": "OBI:0000427",
"category": "biolink:Protein",
"update_date": "2024-02-21 01:39:56 GMT"
}
These are all of the subclasses of `material entity`:
Running
match (n) where n.iri in ["http://purl.obolibrary.org/obo/BTO_0002690", "http://www.ebi.ac.uk/efo/EFO_0004446", "http://purl.obolibrary.org/obo/BTO_0000214", "http://www.ebi.ac.uk/efo/EFO_0000324", "http://purl.obolibrary.org/obo/GO_0005575", "http://www.ebi.ac.uk/efo/EFO_0006794", "http://purl.obolibrary.org/obo/CHEBI_24431", "http://www.ebi.ac.uk/efo/EFO_0005066", "http://www.ebi.ac.uk/efo/EFO_0000469", "http://purl.obolibrary.org/obo/OBI_0000427", "http://www.ebi.ac.uk/efo/EFO_0004422", "http://www.ebi.ac.uk/efo/EFO_0004423", "http://purl.obolibrary.org/obo/SO_0000704", "http://www.ebi.ac.uk/efo/EFO_0004420", "http://www.ebi.ac.uk/efo/EFO_0000548", "http://www.ebi.ac.uk/efo/EFO_0005060", "http://purl.obolibrary.org/obo/OBI_0100026", "http://www.ebi.ac.uk/efo/EFO_0000635", "http://purl.obolibrary.org/obo/OBI_0000245", "http://purl.obolibrary.org/obo/MPATH_0", "http://www.ebi.ac.uk/efo/EFO_0000663", "http://purl.obolibrary.org/obo/OBI_0000181", "http://www.ebi.ac.uk/efo/EFO_0010579", "http://purl.obolibrary.org/obo/OBI_0100051", "http://www.ebi.ac.uk/efo/EFO_0004359", "http://purl.obolibrary.org/obo/BTO_0001384", "http://purl.obolibrary.org/obo/OBI_0100051"] return n.id, n.name, n.category, n.provided_by
on `kg2endpoint-kg2-9-0.rtx.ai`, we get:
n.id | n.name | n.category | n.provided_by |
---|---|---|---|
"GO:0005575" | "cellular_component" | "biolink:CellularComponent" | "['infores:efo', 'infores:cl', 'infores:go-plus', 'infores:hpo', 'infores:mondo', 'infores:nbo', 'infores:pato', 'infores:pr', 'infores:uberon', 'infores:go']" |
"CHEBI:24431" | "chemical entity" | "biolink:MolecularEntity" | "['infores:efo', 'infores:chebi', 'infores:cl', 'infores:disease-ontology', 'infores:foodon', 'infores:genepio', 'infores:go-plus', 'infores:hpo', 'infores:mondo', 'infores:nbo', 'infores:pato', 'infores:pr', 'infores:uberon']" |
"OBI:0100026" | "organism" | "biolink:PhysicalEntity" | "['infores:efo', 'infores:foodon', 'infores:genepio', 'infores:go-plus', 'infores:pato', 'infores:pr', 'infores:ro']" |
"SO:0000704" | "gene" | "biolink:Gene" | "['infores:efo', 'infores:disease-ontology', 'infores:go-plus', 'infores:mondo', 'infores:pr', 'infores:uberon']" |
"OBI:0100051" | "specimen" | "biolink:PhysicalEntity" | "['infores:efo', 'infores:genepio']" |
"EFO:0006794" | "cerebrospinal fluid biomarker measurement" | "biolink:InformationContentEntity" | "['infores:efo']" |
"EFO:0000635" | "organism part" | "biolink:AnatomicalEntity" | "['infores:efo']" |
"EFO:0000663" | "pool" | "biolink:PhysicalEntity" | "['infores:efo']" |
"EFO:0005060" | "instrument part" | "biolink:PhysicalEntity" | "['infores:efo']" |
"EFO:0005066" | "collection of material" | "biolink:MaterialSample" | "['infores:efo']" |
"BTO:0000214" | "cell culture" | "biolink:PhysicalEntity" | "['infores:efo']" |
"EFO:0004423" | "exon" | "biolink:Exon" | "['infores:efo']" |
"EFO:0004422" | "exome" | "biolink:PhysicalEntity" | "['infores:efo']" |
"EFO:0004420" | "genome" | "biolink:PhysicalEntity" | "['infores:efo']" |
"EFO:0004446" | "biological macromolecule" | "biolink:MolecularEntity" | "['infores:efo']" |
"EFO:0000324" | "cell type" | "biolink:Cell" | "['infores:efo']" |
"EFO:0000548" | "instrument" | "biolink:PhysicalEntity" | "['infores:efo']" |
"EFO:0000469" | "environmental factor" | "biolink:PhysicalEntity" | "['infores:efo']" |
"EFO:0010579" | "proteome" | "biolink:PhysicalEntity" | "['infores:efo']" |
"OBI:0000245" | "organization" | "biolink:PhysicalEntity" | "['infores:efo', 'infores:foodon', 'infores:genepio']" |
"MPATH:0" | "pathological entity" | "biolink:BiologicalEntity" | "['infores:efo', 'infores:genepio', 'infores:hpo']" |
"OBI:0000427" | "enzyme" | "biolink:Protein" | "['infores:efo', 'infores:genepio']" |
"BTO:0001384" | "tissue culture" | "biolink:PhysicalEntity" | "['infores:efo']" |
"EFO:0004359" | "telomere" | "biolink:PhysicalEntity" | "['infores:efo']" |
"OBI:0000181" | "population" | "biolink:PhysicalEntity" | "['infores:efo', 'infores:genepio']" |
"BTO:0002690" | "biofilm" | "biolink:PhysicalEntity" | "['infores:efo']" |
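Many of these seem to be problematic: for example, "cerebrospinal fluid biomarker measurement" is categorized as `biolink:InformationContentEntity` even though it is listed here as a subclass of `material entity`.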
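To make the spread of categories easier to review, here is a small sketch that tallies the rows from the query result above and lists the nodes that did not fall back to `biolink:PhysicalEntity`. The rows are copied from the table; the names `ROWS` and `non_default_nodes` are illustrative only, and the sketch makes no judgment about which assignments are actually wrong.

```python
from collections import Counter

# (id, name, category) copied from the query result table above.
ROWS = [
    ("GO:0005575", "cellular_component", "biolink:CellularComponent"),
    ("CHEBI:24431", "chemical entity", "biolink:MolecularEntity"),
    ("OBI:0100026", "organism", "biolink:PhysicalEntity"),
    ("SO:0000704", "gene", "biolink:Gene"),
    ("OBI:0100051", "specimen", "biolink:PhysicalEntity"),
    ("EFO:0006794", "cerebrospinal fluid biomarker measurement",
     "biolink:InformationContentEntity"),
    ("EFO:0000635", "organism part", "biolink:AnatomicalEntity"),
    ("EFO:0000663", "pool", "biolink:PhysicalEntity"),
    ("EFO:0005060", "instrument part", "biolink:PhysicalEntity"),
    ("EFO:0005066", "collection of material", "biolink:MaterialSample"),
    ("BTO:0000214", "cell culture", "biolink:PhysicalEntity"),
    ("EFO:0004423", "exon", "biolink:Exon"),
    ("EFO:0004422", "exome", "biolink:PhysicalEntity"),
    ("EFO:0004420", "genome", "biolink:PhysicalEntity"),
    ("EFO:0004446", "biological macromolecule", "biolink:MolecularEntity"),
    ("EFO:0000324", "cell type", "biolink:Cell"),
    ("EFO:0000548", "instrument", "biolink:PhysicalEntity"),
    ("EFO:0000469", "environmental factor", "biolink:PhysicalEntity"),
    ("EFO:0010579", "proteome", "biolink:PhysicalEntity"),
    ("OBI:0000245", "organization", "biolink:PhysicalEntity"),
    ("MPATH:0", "pathological entity", "biolink:BiologicalEntity"),
    ("OBI:0000427", "enzyme", "biolink:Protein"),
    ("BTO:0001384", "tissue culture", "biolink:PhysicalEntity"),
    ("EFO:0004359", "telomere", "biolink:PhysicalEntity"),
    ("OBI:0000181", "population", "biolink:PhysicalEntity"),
    ("BTO:0002690", "biofilm", "biolink:PhysicalEntity"),
]

# How many of the material-entity subclasses landed in each category.
category_counts = Counter(category for _, _, category in ROWS)


def non_default_nodes(rows, default="biolink:PhysicalEntity"):
    """Return (id, name, category) rows whose category differs from default."""
    return [row for row in rows if row[2] != default]


for category, count in category_counts.most_common():
    print(f"{count:2d}  {category}")
for node_id, name, category in non_default_nodes(ROWS):
    print(f"{node_id}\t{name}\t{category}")
```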
I might have mentioned it before, but there is only a single node with the category `biolink:Exon`: a node with the name `exon`. I think either the ETL-ing of whatever KP has exon info is borked, or something else fishy might be going on. Otherwise, should this node (and the category) just be removed?