bio-ontology-research-group / mowl

mOWL: Machine Learning library with Ontologies
BSD 3-Clause "New" or "Revised" License

Run the Pretrained Model without a Validation Set or Test Set #79

Closed 20Bolin closed 1 month ago

20Bolin commented 1 month ago

Describe the bug

Hi, I tried to use mOWL to generate embeddings for my ontology. According to the documentation, the validation and testing ontologies are optional.

I don't have a validation set or a test set at the moment, so I tried to run the model with only my own training dataset.

However, it raised the error "AttributeError: Validation dataset is None."

Is it impossible to run the model without a validation or test ontology? If so, how can I create validation and test sets from the ontology I have at the moment? My ontology is in OWL format, with entries similar to this:

<!-- http://www.opengis.net/ont/geosparql#Feature -->

<owl:Class rdf:about="http://www.opengis.net/ont/geosparql#Feature">
    <rdfs:subClassOf rdf:resource="http://www.opengis.net/ont/geosparql#SpatialObject"/>
    <owl:disjointWith rdf:resource="http://www.opengis.net/ont/geosparql#Geometry"/>
    <rdfs:isDefinedBy rdf:resource="http://www.opengis.net/ont/geosparql#"/>
    <rdfs:isDefinedBy rdf:resource="http://www.opengis.net/spec/geosparql/1.0/req/core/feature-class"/>
    <rdfs:isDefinedBy rdf:resource="http://www.opengis.net/spec/geosparql/1.1/req/core/feature-class"/>
    <skos:definition xml:lang="en">A discrete spatial phenomenon in a universe of discourse.</skos:definition>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.1"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.2"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.3"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.4"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.5"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.6"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.7"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.8"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.9"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.3.2"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.3.3"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.2.2"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.2.3"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.2.4"/>
    <skos:note xml:lang="en">A Feature represents a uniquely identifiable phenomenon, for example a river or an apple. While such phenomena (and therefore the Features used to represent them) are bounded, their boundaries may be crisp (e.g., the declared boundaries of a state), vague (e.g., the delineation of a valley versus its neighboring mountains), and change with time (e.g., a storm front). While discrete in nature, Features may be created from continuous observations, such as an isochrone that determines the region that can be reached by ambulance within 5 minutes.</skos:note>
    <skos:prefLabel xml:lang="en">Feature</skos:prefLabel>
</owl:Class>

How to reproduce

import torch.nn as nn
from mowl.datasets.base import PathDataset, OWLClasses
from mowl.models.elembeddings.examples.model_ppi import ELEmPPI

# Subclass PathDataset to provide a custom evaluation_classes property
class CustomDataset(PathDataset):
    @property
    def evaluation_classes(self):
        """Classes that are used in evaluation
        """

        if self._evaluation_classes is None:
            gis = set()
            for owl_name, owl_cls in self.classes.as_dict.items():
                if "http://www.opengis" in owl_name:
                    gis.add(owl_cls)
            self._evaluation_classes = OWLClasses(gis), OWLClasses(gis)

        return self._evaluation_classes

ds = CustomDataset(ontology_path="opengis.owl")
dataset = ds

model = ELEmPPI(dataset,
                embed_dim=30,
                margin=0.1,
                reg_norm=1,
                learning_rate=0.001,
                epochs=20,
                batch_size=4096,
                model_filepath=r"C:\Users\Steven\mOWL",
                device='cpu')

# Set the number of individuals
model.module.ind_embed = nn.Embedding(num_embeddings=len(dataset.classes), embedding_dim=30)

# Training
model.train()

Environment

Windows 10, Python version 3.11.9, JDK version 11.0.22

ferzcam commented 1 month ago

Hi. Since you are using mowl.models.elembeddings.examples.model_ppi.ELEmPPI directly, the validation loss is computed as part of the training loop. I would suggest modifying the ELEmPPI training loop yourself and removing or adapting the validation loss part. That means removing or adapting the code starting at:

https://github.com/bio-ontology-research-group/mowl/blob/466494d9c1d6db664a358452f6508772f8fbc172/mowl/models/elembeddings/examples/model_ppi.py#L54

until:

https://github.com/bio-ontology-research-group/mowl/blob/466494d9c1d6db664a358452f6508772f8fbc172/mowl/models/elembeddings/examples/model_ppi.py#L64

Let me know if this helps.
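As a rough illustration of what adapting the loop could look like, here is a minimal, framework-agnostic sketch in which validation is optional. It is not mOWL's code: `train`, `train_step`, `valid_step` and `save` are hypothetical names, and falling back to the training loss when no validation step is supplied is an assumption made for this example.

```python
def train(train_step, epochs, save, valid_step=None):
    """Run `epochs` training steps; validate only if `valid_step` is given."""
    best_loss = float("inf")
    for epoch in range(epochs):
        train_loss = train_step(epoch)
        # Without validation data, monitor the training loss instead.
        monitored = train_loss if valid_step is None else valid_step(epoch)
        if monitored < best_loss:
            best_loss = monitored
            save(epoch)  # checkpoint whenever the monitored loss improves
    return best_loss


# Toy stand-ins: the loss shrinks each epoch, "saving" records the epoch.
saved_epochs = []
best = train(lambda epoch: 1.0 / (epoch + 1), epochs=5, save=saved_epochs.append)
print(best, saved_epochs)
```

In a real model, `train_step` and `valid_step` would wrap the forward and backward passes, and `save` would write the model weights to disk.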

20Bolin commented 1 month ago

Hi! At first, the error was raised at line 26 and said that the dataset does not have evaluation_classes, which must be implemented in a child class. So I removed everything related to the variable "prots", and that caused the AttributeError I showed above.

https://github.com/bio-ontology-research-group/mowl/blob/466494d9c1d6db664a358452f6508772f8fbc172/mowl/models/elembeddings/examples/model_ppi.py#L25

https://github.com/bio-ontology-research-group/mowl/blob/466494d9c1d6db664a358452f6508772f8fbc172/mowl/models/elembeddings/examples/model_ppi.py#L26

This time I simply removed lines 54 to 64, and the training seems to work without any error messages. Even if I keep prots, no errors about dataset.evaluation_classes are raised, which is good, but I don't quite understand why. The remaining problem is that the training runs, but no saved model appears at the model path I specified, which is quite strange. How can I fix this?
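Two things may be worth checking here; both are assumptions, not confirmed causes. First, whether the removed lines also contained the call that saves the model. Second, whether model_filepath names a file rather than a folder, since the value in the reproduction script has no file extension. A small sketch of the second check, using only the standard library; `resolve_checkpoint_path` and the `model.pt` default are hypothetical names, not part of mOWL:

```python
import os
import tempfile


def resolve_checkpoint_path(path, default_name="model.pt"):
    """Return a file path, appending `default_name` if `path` is a folder."""
    if os.path.isdir(path):
        return os.path.join(path, default_name)
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # make sure the target folder exists
    return path


folder = tempfile.mkdtemp()  # stands in for the folder passed as model_filepath
file_path = resolve_checkpoint_path(folder)
nested_path = resolve_checkpoint_path(os.path.join(folder, "run1", "model.pt"))
print(file_path)
print(nested_path)
```

The resolved path could then be passed as the model file path, so that the save target is always a file inside an existing folder.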

That said, I don't think it makes much sense to train without validation and test sets, so I also tried to build them from the ontology I have right now. I tried to use Owlready2, but I don't know how to do it properly. The code I used is shown below:

from owlready2 import get_ontology, Thing
import random

ontology = get_ontology("opengis.owl").load()
classes = list(ontology.classes())

# Shuffle the classes
random.shuffle(classes)

# Define the split ratios
train_ratio = 0.6
val_ratio = 0.2
test_ratio = 0.2

# Calculate split indices
train_end = int(train_ratio * len(classes))
val_end = train_end + int(val_ratio * len(classes))

# Split the classes
train_classes = classes[:train_end]
val_classes = classes[train_end:val_end]
test_classes = classes[val_end:]

# Initialize the split ontology with a unique IRI
train_ontology = get_ontology("urn:example:train_ontology")
val_ontology = get_ontology("urn:example:val_ontology")
test_ontology = get_ontology("urn:example:test_ontology")

# Function to add classes to a new ontology
def add_classes_to_ontology(classes, new_ontology):
    with new_ontology:
        for cls in classes:
            new_cls = type(cls.name, (Thing,), {})
            new_cls.is_a = cls.is_a

# Add classes to the respective ontologies
add_classes_to_ontology(train_classes, train_ontology)
add_classes_to_ontology(val_classes, val_ontology)
add_classes_to_ontology(test_classes, test_ontology)

# Save the ontologies to OWL files
train_ontology.save(file="opengis_train.owl", format="rdfxml")
val_ontology.save(file="opengis_val.owl", format="rdfxml")
test_ontology.save(file="opengis_test.owl", format="rdfxml")

But I don't think this is the proper way to do it: the example valid.owl and test.owl have a structured schema with object properties and classes, while the output of my method does not. Could you give me some advice on how to do a train/validation/test split on ontology data? I asked about it here, but maybe it would be better to raise a separate issue, since it is a bit different from the original question.
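One possible direction, offered as an assumption rather than as mOWL's recommended procedure, is to split axioms instead of classes, so that each file holds complete statements about the classes it mentions. A minimal sketch, assuming the axioms of interest have already been extracted as (subclass, property, filler) tuples; `split_axioms` and the example.org IRIs are hypothetical:

```python
import random


def split_axioms(axioms, train_ratio=0.6, valid_ratio=0.2, seed=0):
    """Shuffle axioms reproducibly and cut them into train/valid/test lists."""
    shuffled = list(axioms)
    random.Random(seed).shuffle(shuffled)
    train_end = int(train_ratio * len(shuffled))
    valid_end = train_end + int(valid_ratio * len(shuffled))
    return shuffled[:train_end], shuffled[train_end:valid_end], shuffled[valid_end:]


# Hypothetical axioms of the form "sub SubClassOf (prop some filler)".
axioms = [(f"http://example.org/C{i}", "http://example.org/relatedTo",
           f"http://example.org/C{i + 1}") for i in range(10)]

train, valid, test = split_axioms(axioms)
print(len(train), len(valid), len(test))
```

Each list would then be written to its own OWL file. Whether the training file should also carry the rest of the ontology depends on the task and is not settled by this sketch.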

20Bolin commented 1 month ago

My ontology is also in OWL format, and I want the validation and test sets to be in OWL as well, with a structured schema like:

    <!-- 
    ///////////////////////////////////////////////////////////////////////////////////////
    //
    // Object Properties
    //
    ///////////////////////////////////////////////////////////////////////////////////////
     -->

    <!-- http://has_function -->

    <owl:ObjectProperty rdf:about="http://has_function"/>

    <!-- http://has_label -->

    <owl:ObjectProperty rdf:about="http://has_label"/>

    <!-- http://interacts_with -->

    <owl:ObjectProperty rdf:about="http://interacts_with"/>

    <!-- http://purl.obolibrary.org/obo/BFO_0000050 -->

    <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/BFO_0000050">
        <owl:inverseOf rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
        <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#TransitiveProperty"/>
        <oboInOwl:hasDbXref>BFO:0000050</oboInOwl:hasDbXref>
        <oboInOwl:hasOBONamespace>external</oboInOwl:hasOBONamespace>
        <oboInOwl:id>part_of</oboInOwl:id>
        <oboInOwl:shorthand>part_of</oboInOwl:shorthand>
        <rdfs:label>part of</rdfs:label>
    </owl:ObjectProperty>

    <!-- 
    ///////////////////////////////////////////////////////////////////////////////////////
    //
    // Classes
    //
    ///////////////////////////////////////////////////////////////////////////////////////
     -->

    <!-- http://4932.Q0010 -->

    <owl:Class rdf:about="http://4932.Q0010">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0017"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0032"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0092"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0142"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0182"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0297"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.YDL114W"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>

    <!-- http://4932.Q0017 -->

    <owl:Class rdf:about="http://4932.Q0017">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0010"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0032"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0092"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0142"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0143"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0182"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.Q0297"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://interacts_with"/>
                <owl:someValuesFrom rdf:resource="http://4932.YDL114W"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>