Closed CNwangbin closed 2 years ago
Hi, you can bring your own .owl files and turn them into mOWL datasets using:
from mowl.datasets.base import PathDataset
ds = PathDataset("training_ontology.owl", "validation_ontology.owl", "testing_ontology.owl")
The validation and testing owl files are optional. For more details on how to add information to an ontology please refer to this example.
Yes, thanks. I found that.
(Sent by email in reply to Fernando Zhapa's message of Saturday, September 17, 2022, 7:28 PM. Subject: Re: [bio-ontology-research-group/mowl] Is there any way to run generate embeddings from my corpus, not the built-in dataset? (Issue #27))
I have a new question. After training, I get an embeddings generator in which each embedding corresponds to a certain class. How do I get the class order that matches the order of the embeddings in the generator?
The embeddings are usually stored as a dictionary of the form class name -> embedding vector. This applies to methods such as Word2Vec. Are you using that one or a different one?
Yes, I found that in the tutorial examples. The new question is how to find the class order that corresponds to the order of the embeddings generator.
You can access the classes in the dataset like this:
# dataset is assumed to be an existing mOWL dataset
classes = dataset.classes.as_str
That is a list of classes from which you can generate a dictionary:
class_to_id = {v: k for k, v in enumerate(classes)}
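As a small sketch of how such a mapping can be used, the following builds the index in both directions. The list of IRIs is a hypothetical stand-in for dataset.classes.as_str, and the name id_to_class is an assumption introduced here for illustration:

```python
# Stand-in for dataset.classes.as_str: a plain list of class IRIs (illustrative values).
classes = [
    "http://purl.obolibrary.org/obo/GO_0000001",
    "http://purl.obolibrary.org/obo/GO_0000002",
    "http://purl.obolibrary.org/obo/GO_0000003",
]

# Map each class name to its position in the list, and back again.
class_to_id = {name: idx for idx, name in enumerate(classes)}
id_to_class = {idx: name for name, idx in class_to_id.items()}

print(class_to_id["http://purl.obolibrary.org/obo/GO_0000002"])
print(id_to_class[0])
```

This ordering comes from the dataset, so it need not match the row order of a separately trained embedding model.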
Please let me know if this helps.
Yes, thanks. But one question remains. I expected the number of classes to equal the number of vectors, but they differ. Is there a mistake somewhere?
This is my code:
import mowl
mowl.init_jvm("4g")
from mowl.datasets.base import PathDataset
ds = PathDataset("go.owl")
from mowl.projection.dl2vec.model import DL2VecProjector
projector = DL2VecProjector(bidirectional_taxonomy = True)
edges = projector.project(ds.ontology)
from mowl.walking.factory import walker_factory
walker = walker_factory("deepwalk", alpha = 0.1, walk_length = 10, num_walks = 10, outfile = "data/walks/walk.txt")
walker.walk(edges)
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence
corpus = LineSentence(walker.outfile)
w2v_model = Word2Vec(corpus, sg=1, min_count=1, vector_size=10, window = 10, epochs = 10, workers = 16)
There is no mistake in your code, but some classes in the ontology are not captured by the projection method (DL2VecProjector) because they are obsolete, deprecated, or part of axioms that the projection method cannot process.
As an example, check http://purl.obolibrary.org/obo/GO_1901916. It is part of ds.classes.as_str but not part of w2v_model.wv (I tried with the latest version of GO at this time), and it appears as OBSOLETE in the go.owl file.
Including obsolete classes in dataset.classes.as_str is considered a bug and will be fixed in future versions.
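Until then, one way to see which classes were dropped is to compare the two collections directly. The sketch below uses small stand-in data; the helper name missing_from_embeddings is hypothetical, and with a real run the inputs would presumably be ds.classes.as_str and w2v_model.wv.key_to_index:

```python
def missing_from_embeddings(dataset_classes, embedded_keys):
    """Return the dataset classes that have no embedding, in dataset order."""
    embedded = set(embedded_keys)
    return [c for c in dataset_classes if c not in embedded]


# Illustrative stand-ins for ds.classes.as_str and w2v_model.wv.key_to_index.
dataset_classes = [
    "http://purl.obolibrary.org/obo/GO_0008150",
    "http://purl.obolibrary.org/obo/GO_0003674",
    "http://purl.obolibrary.org/obo/GO_1901916",  # the obsolete class mentioned above
]
embedded_keys = {
    "http://purl.obolibrary.org/obo/GO_0003674": 0,
    "http://purl.obolibrary.org/obo/GO_0008150": 1,
}

missing = missing_from_embeddings(dataset_classes, embedded_keys)
print(len(dataset_classes), len(embedded_keys), missing)
```

The difference between the two counts should equal the length of the returned list, provided every embedded key is also a dataset class.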
If you need further help, please let us know. Thanks.
Thanks, that's clear to me. But I want to get the order of the classes in w2v_model.wv, not only the numerical vectors. Is there a way to do that now?
Since you are working with Gensim's Word2Vec model, would w2v_model.wv.key_to_index work?
Thank you, ferzcam. It works well. Is that the right way to get (class, vector) pairs?
I think that is right. There is also this way:
# "class" is a reserved word in Python, so the loop variable is named cls
for cls in w2v_model.wv.index_to_key:
    vector = w2v_model.wv[cls]
    print(cls)
    print(vector)
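To collect the pairs rather than print them, a small helper along these lines may be enough. It relies only on index_to_key, key_to_index, and key lookup, which Gensim 4's KeyedVectors provides; the TinyVectors class is a hypothetical stand-in so the sketch runs without a trained model, and the keys and vectors are made up:

```python
def class_vector_pairs(keyed_vectors):
    """Yield (key, vector) pairs in the model's own index order."""
    for key in keyed_vectors.index_to_key:
        yield key, keyed_vectors[key]


class TinyVectors:
    """Minimal stand-in exposing the parts of KeyedVectors used above."""

    def __init__(self, mapping):
        self.index_to_key = list(mapping)
        self.key_to_index = {k: i for i, k in enumerate(self.index_to_key)}
        self._vectors = dict(mapping)

    def __getitem__(self, key):
        return self._vectors[key]


# Made-up keys and vectors; with a real model, pass w2v_model.wv instead.
wv = TinyVectors({"GO_A": [0.1, 0.2], "GO_B": [0.3, 0.4]})
pairs = list(class_vector_pairs(wv))
for position, (key, vector) in enumerate(pairs):
    # key_to_index and index_to_key describe the same ordering.
    assert wv.key_to_index[key] == position
    print(position, key, vector)
```

Iterating over index_to_key keeps each key aligned with its row position, which is the ordering asked about earlier in the thread.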
OK, that is good.
I want to use my data as corpus.