Closed CNwangbin closed 10 months ago
Hi. Thanks for reporting this issue. I think the outdated terms are not being considered when the graph projection of the ontology is generated. In that case, I can suggest two solutions: (1) modify the ontology beforehand and add the outdated terms yourself, or (2) modify the source code, for which you should look at this line where the axioms are retrieved.
Regarding (1), Section 5.5 here mentions that the boolean value `true` in the `owl:deprecated` annotation indicates deprecation, so changing it to `false` may help. I hope this helps, and if you have additional questions, let me know.
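To illustrate option (1), here is a minimal sketch that flips every `owl:deprecated` value from `true` to `false` in the text of an RDF/XML ontology. It assumes the file serializes the annotation as a literal `<owl:deprecated ...>true</owl:deprecated>` element; the function names and the file names in the usage note are hypothetical. It only changes the flag, so it does not restore any axioms that were removed when a term was deprecated.

```python
import re
from typing import Tuple

# Matches an owl:deprecated element whose literal content is "true", e.g.
# <owl:deprecated rdf:datatype="...#boolean">true</owl:deprecated>
_DEPRECATED_TRUE = re.compile(
    r"(<owl:deprecated\b[^>]*>)\s*true\s*(</owl:deprecated>)"
)


def undeprecate(rdf_xml: str) -> Tuple[str, int]:
    """Return the rewritten text and the number of flags that were changed."""
    return _DEPRECATED_TRUE.subn(r"\1false\2", rdf_xml)


def undeprecate_file(src: str, dst: str) -> int:
    """Rewrite the ontology file at src into dst; both paths are placeholders."""
    with open(src, encoding="utf-8") as handle:
        text, changed = undeprecate(handle.read())
    with open(dst, "w", encoding="utf-8") as handle:
        handle.write(text)
    return changed
```

For example, `undeprecate_file("go.owl", "go_undeprecated.owl")` would write a modified copy that can then be passed to `PathDataset`.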
Because manually modifying the ontology file is very cumbersome, I think method (2) is more elegant. I changed `True` to `False` on line 28 of the code, reinstalled the software, and ran my code again, but the change doesn't seem to have taken effect for either the mOWL-wrapped or the PyKEEN approach.
I am not sure that deprecated classes can meaningfully be integrated. Once deprecated in GO, they will be removed from all axioms, therefore no edges will be created that connect to them. They would be disconnected nodes. Are you sure you want to apply any kind of embedding or learning process to these? There is nothing that can be learned from these classes if they are not used in axioms.
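The point about disconnected nodes can be checked directly on a projected edge list. The sketch below is plain Python and independent of mOWL: it takes class identifiers and `(head, relation, tail)` triples and returns the classes that appear in no edge. The identifiers and the single edge are toy data chosen for illustration, not real GO relationships.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def isolated_classes(classes: Iterable[str], edges: Iterable[Triple]) -> Set[str]:
    """Return the classes that occur as neither head nor tail of any edge."""
    connected = set()
    for head, _relation, tail in edges:
        connected.add(head)
        connected.add(tail)
    return set(classes) - connected


# Toy data: the third class takes part in no edge, as a deprecated term would.
classes = ["GO_0000001", "GO_0000002", "GO_0005926"]
edges = [("GO_0000001", "subclassof", "GO_0000002")]
isolated = isolated_classes(classes, edges)
```

A knowledge graph embedding method trained only on triples has no training signal for the classes in `isolated`, which is the concern raised above.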
Thanks.
Describe the bug
When I ran the example TransE code on a custom go.owl file, some terms appeared to be lost.
How to reproduce
The mOWL-wrapped code is as follows.

```python
import mowl
mowl.init_jvm("20g")  # start the JVM before importing other mowl modules

from mowl.projection.edge import Edge
from mowl.projection import TaxonomyProjector
from mowl.datasets.base import PathDataset

dataset = PathDataset("go_cafa3.owl")

from mowl.models import GraphPlusPyKEENModel
from mowl.projection import DL2VecProjector
from pykeen.models import TransE
import torch as th

model = GraphPlusPyKEENModel(dataset)
model.set_projector(DL2VecProjector())
model.set_kge_method(TransE, random_seed=42)
model.optimizer = th.optim.Adam
model.lr = 0.001
model.batch_size = 32
model.train(epochs=1)

class_embs = model.class_embeddings
role_embs = model.object_property_embeddings
ind_embs = model.individual_embeddings

# Keep only GO classes, keyed by the last IRI segment (e.g. "GO_0005926")
terms = []
vectors = []
for word in class_embs:
    vector = class_embs[word]
    items = word.split('/')
    if len(items) > 1:
        word = items[-1]
    if word.startswith('GO') and not word.endswith('>'):
        terms.append(items[-1])
        vectors.append(vector)

# IRI segments use an underscore, so look up "GO_0005926" rather than "GO:0005926"
'GO_0005926' in terms
```
However, GO_0005926 is present in the OWL file, where it is marked as deprecated: "... true</owl:deprecated> </owl:Class> ...".
The same happens with the PyKEEN version of the code:

```python
import mowl
mowl.init_jvm("20g")  # start the JVM before importing other mowl modules

from mowl.projection.edge import Edge
from mowl.datasets.builtin import PPIYeastSlimDataset
from mowl.projection import TaxonomyProjector
from mowl.datasets.base import PathDataset

dataset = PathDataset("go.owl")

proj = TaxonomyProjector(True)
edges = proj.project(dataset.ontology)
# Leftover documentation example; if executed, it would overwrite the projected edges:
# edges = [Edge("node1", "rel1", "node3"), Edge("node5", "rel2", "node1"), Edge("node2", "rel1", "node1")]
triples_factory = Edge.as_pykeen(edges, create_inverse_triples=True)

from pykeen.models import TransE
pk_model = TransE(triples_factory=triples_factory, embedding_dim=50, random_seed=42)

from mowl.kge import KGEModel

model = KGEModel(triples_factory, pk_model, epochs=1, batch_size=32)
model.train()
ent_embs = model.class_embeddings_dict
rel_embs = model.object_property_embeddings_dict

# Keep only GO classes, keyed by the last IRI segment (e.g. "GO_0005926")
terms = []
vectors = []
for word in ent_embs:
    vector = ent_embs[word]
    items = word.split('/')
    if len(items) > 1:
        word = items[-1]
    if word.startswith('GO') and not word.endswith('>'):
        terms.append(items[-1])
        vectors.append(vector)

'GO_0005926' in terms
```
When running the projection step of the code,

```python
proj = TaxonomyProjector(True)
edges = proj.project(dataset.ontology)
# Leftover documentation example; if executed, it would overwrite the projected edges:
# edges = [Edge("node1", "rel1", "node3"), Edge("node5", "rel2", "node1"), Edge("node2", "rel1", "node1")]
triples_factory = Edge.as_pykeen(edges, create_inverse_triples=True)
```

the log shows "INFO: Number of ontology classes: 50119", but the final `len(terms)` is only 42819. Is it because the outdated terms were discarded?
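One way to test this hypothesis is to count the deprecated classes in the ontology and compare that number with the gap of 50119 - 42819 = 7300. The following is a minimal sketch using only the Python standard library. It assumes an RDF/XML serialization in which deprecated classes carry an `owl:deprecated` child with the literal `true`; the function name is hypothetical, and whether the count actually matches the gap is not something this thread confirms.

```python
import xml.etree.ElementTree as ET
from typing import List

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def deprecated_class_iris(rdf_xml: str) -> List[str]:
    """Return the IRIs of named owl:Class elements flagged owl:deprecated true."""
    root = ET.fromstring(rdf_xml)
    iris = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")  # anonymous classes have no rdf:about
        flag = cls.find(f"{{{OWL}}}deprecated")
        if iri and flag is not None and (flag.text or "").strip() == "true":
            iris.append(iri)
    return iris
```

For a file on disk, something like `len(deprecated_class_iris(open("go.owl", encoding="utf-8").read()))` gives the count to compare against the gap.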
Environment
OS information
NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/" CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"
Python version
Python=3.8.13
mOWL version
mowl-borg==0.2.0
JDK version
openjdk 17.0.3-internal 2022-04-19
Additional information
If I need to use embeddings for outdated terms, how should I proceed?