dice-group / Ontolearn

Ontolearn is an open-source software library for explainable structured machine learning in Python. It learns OWL class expressions from positive and negative examples.
https://ontolearn-docs-dice-group.netlify.app/index.html
MIT License

Enriched KB invalid RDF/XML #27

Closed bigerl closed 4 years ago

bigerl commented 4 years ago

Enriching a KB currently produces invalid RDF/XML, e.g.:

<owl:Class rdf:about="(Block1  ⊓  Punching)">
  <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://siemens.com/knowledge_graph/cyber_physical_systems/sma/product#Block1"/>
        <rdf:Description rdf:about="#Punching"/>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</owl:Class>

Looking at what the first line means:

<owl:Class rdf:about="#(Block1  ⊓  Punching)">

By the RDF/XML definition, the attribute value must be a URI reference, not a string label:

7.2.24 Production aboutAttr

attribute(URI == rdf:about, string-value == URI-reference)

The semantics are explained here: https://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-typed-nodes

All in all, that means it parses to

<http://siemens.com/knowledge_graph/cyber_physical_systems/sma/processI-00076-ex#(Block1  ⊓  Punching)> rdf:type owl:Class

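And http://siemens.com/knowledge_graph/cyber_physical_systems/sma/processI-00076-ex#(Block1 ⊓ Punching) is not a valid URI (it contains spaces and the non-ASCII character ⊓, neither of which RFC 3986 permits unencoded), so the result is invalid RDF.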
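
To make the problem concrete, here is a minimal Python sketch that lists the characters in that string which RFC 3986 does not allow to appear literally in a URI. It is only a character-level check, not a full grammar validation, and the helper name `illegal_uri_chars` is made up for this illustration; it is not part of Ontolearn.

```python
import string

# Characters RFC 3986 allows to appear literally somewhere in a URI:
# unreserved, gen-delims, sub-delims, plus "%" for percent-encoded octets.
_ALLOWED = set(
    string.ascii_letters + string.digits + "-._~"  # unreserved
    + ":/?#[]@"                                     # gen-delims
    + "!$&'()*+,;="                                 # sub-delims
    + "%"
)


def illegal_uri_chars(uri):
    """Return the sorted list of distinct characters not permitted in a URI."""
    return sorted({ch for ch in uri if ch not in _ALLOWED})


uri = (
    "http://siemens.com/knowledge_graph/cyber_physical_systems/sma/"
    "processI-00076-ex#(Block1  ⊓  Punching)"
)
print(illegal_uri_chars(uri))  # → [' ', '⊓']
```

Note that the parentheses themselves are fine; only the spaces and ⊓ are flagged. RDF 1.1 works with IRIs, which admit many non-ASCII characters, but a literal space is not allowed in an IRI either.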

bigerl commented 4 years ago

suggested solutions:

unique ID for class + Class expression as Label

<owl:Class rdf:about="#uniqueID12345">
  <rdfs:label>(Block1  ⊓  Punching)</rdfs:label>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://siemens.com/knowledge_graph/cyber_physical_systems/sma/product#Block1"/>
        <rdf:Description rdf:about="#Punching"/>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</owl:Class>

URL/percent-encode the class expression and otherwise keep the URI as it is

see https://tools.ietf.org/html/rfc3986#section-2.1
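
A minimal sketch of this second option, using `urllib.parse.quote` from the Python standard library. The base URI is copied from the example above; the variable names are only illustrative, and whether Ontolearn should encode the expression this way is left open here.

```python
from urllib.parse import quote

base = "http://siemens.com/knowledge_graph/cyber_physical_systems/sma/processI-00076-ex#"
label = "(Block1  ⊓  Punching)"

# safe="" percent-encodes everything except letters, digits and "_.-~";
# non-ASCII characters are encoded as their UTF-8 bytes.
encoded = quote(label, safe="")
print(encoded)  # → %28Block1%20%20%E2%8A%93%20%20Punching%29

class_uri = base + encoded
print(class_uri)
```

The encoded form is a valid URI but hard to read, so the original expression would probably still be worth keeping in a label.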

Demirrr commented 4 years ago

Please be aware of this, most specifically section 8. That being said, if the suggestion satisfies both of you, @bigerl and @renespeck, I could make the necessary modification in less than an hour.

However, would you please @bigerl elaborate on how to create a unique sequence of characters to create "#uniqueID12345 "?

  1. Is it a valid and unique URI ?

bigerl commented 4 years ago

However, would you please @bigerl elaborate on how to create a unique sequence of characters to create "#uniqueID12345 "?

1. Is it a valid and unique URI ?

Let's say you have the string

x: str = "(Block1  ⊓  Punching)"

then a unique sequence of characters would be

url_suffix: str = "#{}".format(hash(x))

that should do it.
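
One caveat with the built-in `hash()`: for strings it is salted per interpreter process (unless `PYTHONHASHSEED` is fixed) and can be negative, so the same class expression would get a different suffix on each run. A minimal sketch of a reproducible alternative using `hashlib` follows; the function name `class_expression_id`, the `#ce_` prefix, and the truncation to 16 hex characters are assumptions made for this illustration, not what Ontolearn actually does.

```python
import hashlib


def class_expression_id(expression, prefix="#ce_"):
    """Derive a stable, URI-safe fragment from a class expression string.

    Hypothetical helper: SHA-256 over the UTF-8 bytes, truncated to 16 hex
    characters. Truncation keeps the ID short; collisions are unlikely but
    not impossible.
    """
    digest = hashlib.sha256(expression.encode("utf-8")).hexdigest()
    return prefix + digest[:16]


x = "(Block1  ⊓  Punching)"
url_suffix = class_expression_id(x)
print(url_suffix)
```

Because the digest consists only of hexadecimal digits, the resulting suffix needs no further percent-encoding.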

What I am not completely sure about is whether we need the "#" in the string or whether it is already in the prefix. That is easily found out by running the output once through Jena: if it results in "##" in the URI, we don't need it. Either way, changing it would literally be a one-character commit.

Demirrr commented 4 years ago

This issue has been solved as described in https://github.com/dice-group/OntoPy/issues/38. If the invalid serialization seems to occur again, please reopen this issue.