dice-group / Ontolearn

Ontolearn is an open-source software library for explainable structured machine learning in Python. It learns OWL class expressions from positive and negative examples.
https://ontolearn-docs-dice-group.netlify.app/index.html
MIT License

Missed training leads to error when the next request is received #400

Open MichaelRoeder opened 6 months ago

MichaelRoeder commented 6 months ago

Error description

If a training attempt fails, subsequent requests with the same pre-trained model path lead to an exception. Steps to reproduce:

  1. Send a (slightly) faulty request that points to a pre-trained model path that does not exist yet and that contains a small error (e.g., a wrong path to the embeddings):
    curl -X 'GET' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      --data '{"pos": ["http://www.wikidata.org/entity/Q3895", "http://www.wikidata.org/entity/Q180855"],
               "neg": ["http://www.wikidata.org/entity/Q483915", "http://www.wikidata.org/entity/Q1359568", "http://www.wikidata.org/entity/Q169167", "http://www.wikidata.org/entity/Q192334", "http://www.wikidata.org/entity/Q695087", "http://www.wikidata.org/entity/Q20165"],
               "model": "Drill",
               "path_embeddings": "/data/output/Keci_entity_embeddings.csv",
               "path_embeddings": "mutagenesis_embeddings/Keci_entity_embeddings.csv",
               "path_to_pretrained_drill": "pretrained_drill",
               "num_of_training_learning_problems": 10,
               "num_of_target_concepts": 3,
               "max_runtime": 60000,
               "iter_bound": 100}' \
      http://localhost:8000/cel
  2. The server then skips the training but nevertheless creates the directory for the pre-trained model. From the server log:
    
    ######### CEL Arguments ###############
    Knowledgebase/Triplestore:<ontolearn.triple_store.TripleStore object at 0x78ef01cd7af0>
    Input data: {'pos': ['http://www.wikidata.org/entity/Q3895', 'http://www.wikidata.org/entity/Q180855'], 'neg': ['http://www.wikidata.org/entity/Q483915', 'http://www.wikidata.org/entity/Q1359568', 'http://www.wikidata.org/entity/Q169167', 'http://www.wikidata.org/entity/Q192334', 'http://www.wikidata.org/entity/Q695087', 'http://www.wikidata.org/entity/Q20165'], 'model': 'Drill', 'path_embeddings': 'mutagenesis_embeddings/Keci_entity_embeddings.csv', 'path_to_pretrained_drill': 'pretrained_drill', 'num_of_training_learning_problems': 10, 'num_of_target_concepts': 3, 'max_runtime': 60000, 'iter_bound': 100}
    ######### CEL Arguments ###############
    No pre-trained model...
    No loading because embeddings not provided
    Learning OWL Class Expression at most 100 iteration:   0%|          | 0/100 [00:00<?, ?it/s]
    ######## Current Search Tree 11 ###########

... answering the request continues as usual from here on ...

  3. As a user, you may spot the mistake and fix it, so you send the same request again, this time with the corrected `path_embeddings` value.
  4. The server now tries to load a pre-trained model because the directory exists:

    root@96b21be83ded:/# cd pretrained_drill/
    root@96b21be83ded:/pretrained_drill# ls
    seen_examples.json

However, it contains no `drill.pth` file, so loading the model fails:

    ######### CEL Arguments ###############
    Knowledgebase/Triplestore:<ontolearn.triple_store.TripleStore object at 0x78ef01cd7af0>
    Input data: {'pos': ['http://www.wikidata.org/entity/Q3895', 'http://www.wikidata.org/entity/Q180855'], 'neg': ['http://www.wikidata.org/entity/Q483915', 'http://www.wikidata.org/entity/Q1359568', 'http://www.wikidata.org/entity/Q169167', 'http://www.wikidata.org/entity/Q192334', 'http://www.wikidata.org/entity/Q695087', 'http://www.wikidata.org/entity/Q20165'], 'model': 'Drill', 'path_embeddings': '/data/output/Keci_entity_embeddings.csv', 'path_to_pretrained_drill': 'pretrained_drill', 'num_of_training_learning_problems': 10, 'num_of_target_concepts': 3, 'max_runtime': 60000, 'iter_bound': 1}
    ######### CEL Arguments ###############
    INFO: 127.0.0.1:52386 - "GET /cel HTTP/1.1" 500 Internal Server Error
    ERROR: Exception in ASGI application
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
        result = await app(  # type: ignore[func-returns-value]
      File "/usr/local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
        return await self.app(scope, receive, send)
      File "/usr/local/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
        await super().__call__(scope, receive, send)
      File "/usr/local/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
        await self.middleware_stack(scope, receive, send)
      File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
        raise exc
      File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
        await self.app(scope, receive, _send)
      File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
        await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
      File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
        raise exc
      File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
        await app(scope, receive, sender)
      File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
        await self.middleware_stack(scope, receive, send)
      File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
        await route.handle(scope, receive, send)
      File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
        await self.app(scope, receive, send)
      File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
        await wrap_app_handling_exceptions(app, request)(scope, receive, send)
      File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
        raise exc
      File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
        await app(scope, receive, sender)
      File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
        response = await func(request)
      File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
        raw_response = await run_endpoint_function(
      File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
        return await dependant.call(**values)
      File "/Ontolearn/ontolearn/scripts/run.py", line 91, in cel
        owl_learner = get_learner(data)
      File "/Ontolearn/ontolearn/scripts/run.py", line 74, in get_learner
        return get_drill(data)
      File "/Ontolearn/ontolearn/scripts/run.py", line 58, in get_drill
        drill.load(directory=data["path_to_pretrained_drill"])
      File "/Ontolearn/ontolearn/learners/drill.py", line 252, in load
        self.heuristic_func.net.load_state_dict(torch.load(directory + "/drill.pth", torch.device('cpu')))
      File "/usr/local/lib/python3.10/site-packages/torch/serialization.py", line 791, in load
        with _open_file_like(f, 'rb') as opened_file:
      File "/usr/local/lib/python3.10/site-packages/torch/serialization.py", line 271, in _open_file_like
        return _open_file(name_or_buffer, mode)
      File "/usr/local/lib/python3.10/site-packages/torch/serialization.py", line 252, in __init__
        super().__init__(open(name, mode))
    FileNotFoundError: [Errno 2] No such file or directory: 'pretrained_drill/drill.pth'
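Side note on step 1: the faulty request sets `path_embeddings` twice, and the logged input data shows only the second value. The snippet below is a minimal illustration, assuming the server decodes the body with Python's standard `json` module (the logged input is consistent with that): duplicate keys are accepted silently and the last one wins. The `reject_duplicate_keys` hook is a hypothetical helper, not part of Ontolearn; it only shows how `object_pairs_hook` could make such a mistake visible.

```python
import json

# Body of the faulty request from step 1, reduced to the relevant keys.
# Note that "path_embeddings" occurs twice.
body = (
    '{"model": "Drill",'
    ' "path_embeddings": "/data/output/Keci_entity_embeddings.csv",'
    ' "path_embeddings": "mutagenesis_embeddings/Keci_entity_embeddings.csv",'
    ' "path_to_pretrained_drill": "pretrained_drill"}'
)

# The json module does not reject duplicate names; the last occurrence wins.
data = json.loads(body)
print(data["path_embeddings"])  # mutagenesis_embeddings/Keci_entity_embeddings.csv


def reject_duplicate_keys(pairs):
    """Hypothetical object_pairs_hook that raises instead of keeping the last value."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in request body: {key!r}")
        seen[key] = value
    return seen


try:
    json.loads(body, object_pairs_hook=reject_duplicate_keys)
    error = None
except ValueError as err:
    error = str(err)
print(error)  # duplicate key in request body: 'path_embeddings'
```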

Demirrr commented 5 months ago

Dear @MichaelRoeder

Thank you for opening an issue with thorough details :+1:

Because `path_embeddings` does not point to a CSV file containing the entity embeddings, training is skipped and the folder created at `path_to_pretrained_drill` contains only `seen_examples.json`. On the next request, `pretrained_drill` exists but `pretrained_drill/drill.pth` does not, so a `FileNotFoundError` is raised.
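The failure state can be reproduced without the server. This is a minimal sketch under stated assumptions: a temporary directory stands in for `pretrained_drill`, the `seen_examples.json` content is a placeholder, and plain `open()` stands in for `torch.load`, which (per the traceback) ends up calling `open()` on `pretrained_drill/drill.pth`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Stand-in for path_to_pretrained_drill after the skipped training run:
    # the folder exists but holds only seen_examples.json (placeholder content).
    pretrained = os.path.join(root, "pretrained_drill")
    os.makedirs(pretrained)
    with open(os.path.join(pretrained, "seen_examples.json"), "w") as f:
        f.write("{}")

    # A directory-existence check alone suggests a pre-trained model is available.
    dir_exists = os.path.isdir(pretrained)
    print(dir_exists)  # True

    # Opening the weights file fails, mirroring the last frames of the traceback.
    try:
        with open(os.path.join(pretrained, "drill.pth"), "rb"):
            pass
        error = None
    except FileNotFoundError as err:
        error = err
    print(type(error).__name__)  # FileNotFoundError
```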

My question is: how would you like the system to behave in the aforementioned scenario?

MichaelRoeder commented 5 months ago

The main issue is that the service gets stuck in a state in which I can no longer use it. (Well, I could use a different pre-trained path, but it may take a while until a user like me figures that out... :sweat_smile:)

If I understand the workflow correctly, the program decides whether to train based on whether the given `pretrained_drill` path exists. An easy solution would be to extend this decision: start training if either the directory does not exist OR the directory does not contain a model file.
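The proposed decision can be sketched as a small helper. `needs_training` is a hypothetical name, not an existing Ontolearn function, and where it would be called inside `get_drill` is an assumption; the model file name `drill.pth` is taken from the traceback. Since a missing directory implies a missing file, a single `os.path.isfile` check covers both branches of the proposed OR.

```python
import os
import tempfile

MODEL_FILE = "drill.pth"  # file name that drill.load() expects, per the traceback


def needs_training(path_to_pretrained_drill: str) -> bool:
    """Hypothetical helper: True if no loadable DRILL model is present.

    Covers both proposed cases: the directory does not exist, or it exists
    but contains no model file (e.g., after a skipped training run).
    """
    return not os.path.isfile(os.path.join(path_to_pretrained_drill, MODEL_FILE))


results = []
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "pretrained_drill")
    results.append(needs_training(path))  # directory missing
    os.makedirs(path)
    results.append(needs_training(path))  # directory exists, no drill.pth (this issue)
    open(os.path.join(path, MODEL_FILE), "wb").close()
    results.append(needs_training(path))  # model file present
print(results)  # [True, True, False]
```

Checking for the model file instead of the directory means a folder left behind by a failed or skipped run no longer blocks later requests: the next request with valid settings would simply train again.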

However, other solutions are possible (e.g., getting better users :wink:).