KRR-Oxford / DeepOnto

A package for ontology engineering with deep learning and language models.
https://krr-oxford.github.io/DeepOnto/
Apache License 2.0

Tokenizer error "list index out of range" during mapping extension #10

Closed Danysan1 closed 1 year ago

Danysan1 commented 1 year ago

Describe the bug
Under some circumstances, the tokenizer throws IndexError: list index out of range during the mapping extension stage. The error originates at bert_classifier.py line 185. It is the same error, at the same location inside the tokenizer, as https://github.com/huggingface/tokenizers/issues/993, which was caused by the data passed to the tokenizer.
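For reference, the same opaque failure can be reproduced in isolation by passing an empty batch to a fast Hugging Face tokenizer (a minimal sketch assuming an empty class-name list is indeed the trigger; bert-base-uncased merely stands in for whatever checkpoint the pipeline loads):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# An empty batch makes the fast tokenizer fail deep inside _batch_encode_plus
# with "IndexError: list index out of range" instead of a descriptive message.
tokenizer([], padding=True, truncation=True, max_length=256)
```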

To Reproduce
I have reproduced this error with these settings:

| Logs & stack trace | max_length_for_input | batch_size_for_training | Source ontology | Target ontology |
|---|---|---|---|---|
| link | 256 | 16 | music-representation.owl | musicClasses.owl @ 2ebb641 |
| link | 128 | 8 | core.owl | musicClasses.owl @ ebc2d09 |

Expected behavior
The stage and the pipeline should complete successfully.

Platform:

Lawhy commented 1 year ago

Hi @Danysan1, thanks for reporting this. Before I conduct a thorough check, may I ask whether every class in your ontologies has at least one label available? I suspect this issue is caused by an empty list of class names.
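One quick way to check is to list the classes that have no rdfs:label, e.g. with rdflib (a rough sketch that assumes labels are plain rdfs:label annotations and does not go through DeepOnto's own ontology loading):

```python
from rdflib import Graph, URIRef, RDF, RDFS, OWL

g = Graph().parse("musicClasses.owl")  # path from your reproduction settings

# Named classes that have no rdfs:label at all.
unlabelled = [
    cls
    for cls in g.subjects(RDF.type, OWL.Class)
    if isinstance(cls, URIRef) and (cls, RDFS.label, None) not in g
]
print(f"{len(unlabelled)} classes without an rdfs:label")
for cls in unlabelled:
    print(" ", cls)
```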

Also, the default BERTMap configuration might not include all of the annotation properties used in your ontologies, so some labels could be missed even if they exist.

Danysan1 commented 1 year ago

Yes, there were some classes without a label. After I fixed them, the pipeline completed successfully.

It would be ideal to check for this condition (an empty list of class labels) and raise an explicit error before calling the tokenizer.
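Something along these lines would already make the failure much easier to diagnose (a hypothetical guard; the function and argument names are illustrative, not DeepOnto's actual API):

```python
def encode_class_labels(tokenizer, labels, max_length):
    """Tokenize a batch of class labels, failing early with a clear message."""
    if not labels or any(not label for label in labels):
        raise ValueError(
            "Encountered an empty class label list (or an empty label); make sure "
            "every class has at least one annotation covered by the configured "
            "annotation properties."
        )
    return tokenizer(labels, padding=True, truncation=True, max_length=max_length)
```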

Similarly, an unclear error is thrown at text_semantics.py line 232 if the given ontology has no subClassOf relationships. An explicit error would be ideal there as well.
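The same pattern would help there, e.g. checking up front that the ontology declares at least one subclass axiom (again a rough sketch using rdflib rather than DeepOnto's internal representation):

```python
from rdflib import Graph, RDFS

g = Graph().parse("core.owl")  # ontology from the second reproduction row

if next(g.triples((None, RDFS.subClassOf, None)), None) is None:
    raise ValueError(
        "Ontology declares no rdfs:subClassOf axioms; subsumption-based "
        "text semantics cannot be built from it."
    )
```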

Lawhy commented 1 year ago

Sure, I will update this in the next release. Thanks for your feedback.