bio-ontology-research-group / mowl

mOWL: Machine Learning library with Ontologies
BSD 3-Clause "New" or "Revised" License

An error when running mOWL for the opa2vec demo #56

Closed (CNwangbin closed this issue 1 year ago)

CNwangbin commented 1 year ago

When I ran the opa2vec example code from https://mowl.readthedocs.io/en/latest/examples/syntactic/plot_2_opa2vec.html#sphx-glr-examples-syntactic-plot-2-opa2vec-py, an error occurred. How can I fix it?

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    /home/wangbin/contrast_exp/opa2vec/opa2vec_cafa3.ipynb Cell 4 in 1
    ----> 1 subclass_axioms = mowl_reasoner.infer_subclass_axioms(classes)
          2 equivalent_class_axioms = mowl_reasoner.infer_equivalent_class_axioms(classes)

    File /home/software/anaconda3/envs/mowl/lib/python3.8/site-packages/mowl/reasoning/base.py:15, in count_added_axioms.<locals>.wrapper(self, ontology)
         13 @wraps(func)
         14 def wrapper(self, ontology):
    ---> 15     initial_number = ontology.getAxiomCount()
         16     func(self, ontology)
         17     final_number = ontology.getAxiomCount()

    AttributeError: 'org.semanticweb.owlapi.util.CollectionFactory.Cond' object has no attribute 'getAxiomCount'

ferzcam commented 1 year ago

Hi. Could you please post the exact code of your script, even if it is the same as in the demo? Also, please specify the versions of mOWL and the JDK that you are using. I just reproduced the demo and it worked for me, so it would help if you provided your setup. Thanks.

CNwangbin commented 1 year ago

Thanks for your quick reply. I checked my software versions and fixed this bug by updating mOWL from 0.1.0 to 0.1.1.
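For readers hitting the same traceback, a quick way to confirm which mOWL version is installed is sketched below. The distribution name `mowl-borg` is an assumption about how the package is published on PyPI, and the helper names `installed_version` and `is_at_least` are hypothetical. The only fact taken from this thread is that 0.1.1 resolved the error.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.1.1' into (0, 1, 1); non-numeric parts are skipped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def is_at_least(version, minimum):
    """Compare two dotted version strings numerically."""
    return version_tuple(version) >= version_tuple(minimum)


# Assumption: the PyPI distribution is named "mowl-borg"; adjust if yours differs.
found = installed_version("mowl-borg")
if found is None:
    print("mOWL does not appear to be installed in this environment")
elif is_at_least(found, "0.1.1"):
    print(f"mOWL {found} is at or above 0.1.1")
else:
    print(f"mOWL {found} is older than 0.1.1; upgrading may fix the error above")
```

If the check reports an older version, upgrading the package with your usual installer (pip or conda) is the step that worked in this thread.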

CNwangbin commented 1 year ago

I have a new question regarding opa2vec. How can I create a GO annotation file in .owl format, like the one at SLIM_DATA_URL = 'https://bio2vec.cbrc.kaust.edu.sa/data/mowl/ppi_yeast_slim.tar.gz'? This compressed package contains three files: ontology.owl, test.owl, and valid.owl. As far as I know, the ontology.owl file defines the relationships between GO terms. Could you please explain what the test.owl and valid.owl files are used for, and how to construct them?

ferzcam commented 1 year ago

Hi. The creation of valid.owl and test.owl will depend on the task. For example, for PPI Yeast Slim, we used the yeast subset of GO (http://geneontology.org/docs/download-ontology/). Then we took PPIs from STRING and split them into training, validation and testing sets. The training PPIs were added to the GO subset, and the validation and testing PPIs were used to create the validation and testing ontologies. You can also find this information in the supplementary material of the mOWL paper.

However, if your task does not use any validation or testing samples, you can use just the training ontology.
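The splitting step described above can be sketched as follows. This is a minimal illustration, not the procedure used to build PPI Yeast Slim: the 80/10/10 ratio, the fixed seed, the helper names and the file names are all assumptions.

```python
import random


def split_pairs(pairs, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle interaction pairs and split them into train/valid/test lists.

    The fractions and the seed are illustrative assumptions.
    """
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_valid = int(len(pairs) * valid_frac)
    n_test = int(len(pairs) * test_frac)
    valid = pairs[:n_valid]
    test = pairs[n_valid:n_valid + n_test]
    train = pairs[n_valid + n_test:]
    return train, valid, test


def write_tsv(pairs, path):
    """Write one tab-separated (head, tail) pair per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for head, tail in pairs:
            handle.write(f"{head}\t{tail}\n")


# Toy protein identifiers standing in for real STRING interactions.
toy_pairs = [(f"P{i}", f"P{i + 1}") for i in range(100)]
train, valid, test = split_pairs(toy_pairs)
print(len(train), len(valid), len(test))
```

Each split can then be saved with `write_tsv` (for example to hypothetical files `train.tsv`, `valid.tsv` and `test.tsv`) before the training pairs are merged into the GO subset and the other two sets are turned into their own ontologies.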

CNwangbin commented 1 year ago

Hello, thank you for your patient explanation. I have read the mOWL paper you pointed to and its supplementary material. I have a few questions:

  1. Suppose I want to generate embeddings for GO terms using opa2vec. In the original opa2vec paper, I need to provide two files: one containing the relationships between GO terms (classes), and another containing the annotation relationships between protein entities and GO terms (classes). In mOWL, as I understand it, the ontology.owl file contains three types of information: 1) relationships between GO terms (classes), 2) annotation relationships between protein entities and GO terms (classes), and 3) interactions between protein entities. Additionally, the valid and test files both contain held-out protein-protein interaction data, which is used solely for validating model performance. My question is: what is the difference between providing only the class relationships and protein annotations (the original OPA2Vec setup) and also providing the extra PPI information? Is it just a difference in evaluation methodology, or would it affect the learning of the class embeddings as well?

  2. Assuming I want to work with a customized dataset (a specific version of the GO relationships plus other protein data), how would I construct the OWL-formatted files? Are there any tutorials available for this?

ferzcam commented 1 year ago

  1. I do not know if it would impact the performance, but they will certainly be different experiments. In one case, the model will learn embeddings based only on functional annotations of proteins and background information from GO. In the other case (where you add PPI information), the model will also take the interactions into account during learning. Unfortunately, I have not evaluated the difference, so I cannot tell what the impact would be.

  2. You can take an ontology (like GO) and add more information to it following https://mowl.readthedocs.io/en/latest/ontology/index.html. There is also an explanation about creating ontologies from tsv files, which can be used to generate validation and testing ontologies. If you want to add more complex axioms to the ontology, then the way to go is to use the OWLAPI directly. One example of that is https://github.com/bio-ontology-research-group/mowl/blob/main/extra/create_dataset.py.

I hope this helps and if you have more questions please let me know.
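As a rough illustration of what such a validation or testing ontology might contain, the sketch below writes protein pairs as axioms in OWL functional-style syntax using plain string formatting. Several things here are assumptions rather than facts from this thread: that each interaction is encoded as `P1 SubClassOf interacts_with some P2`, the `http://example.org/` base IRI, the relation name, and the helper names. It does not use mOWL or the OWLAPI, so treat it as a way to see the axiom shape, not as the library's method.

```python
def ppi_axiom(p1, p2, relation, base):
    """Render 'p1 SubClassOf relation some p2' in OWL functional-style syntax."""
    return (
        f"SubClassOf(<{base}{p1}> "
        f"ObjectSomeValuesFrom(<{base}{relation}> <{base}{p2}>))"
    )


def build_ontology(pairs, ontology_iri,
                   relation="interacts_with", base="http://example.org/"):
    """Build a small ontology document (as text) from (head, tail) pairs.

    The relation name and base IRI are hypothetical placeholders.
    """
    entities = sorted({name for pair in pairs for name in pair})
    lines = [f"Ontology(<{ontology_iri}>"]
    lines.append(f"Declaration(ObjectProperty(<{base}{relation}>))")
    lines += [f"Declaration(Class(<{base}{name}>))" for name in entities]
    lines += [ppi_axiom(head, tail, relation, base) for head, tail in pairs]
    lines.append(")")
    return "\n".join(lines)


document = build_ontology([("P1", "P2"), ("P2", "P3")],
                          "http://example.org/valid")
print(document)
```

The printed text can be saved to a file and opened in an ontology editor to inspect the axioms. For real datasets, the TSV-based creation in the mOWL documentation or the OWLAPI script linked above is the more reliable route.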

CNwangbin commented 1 year ago

Thanks, it helps me a lot. I have resolved the problem.