JPL-IMCE / gov.nasa.jpl.imce.oml

Ontology Modeling Language (OML) Workbench

Design review about the OML limitations on annotation/data property values #162

Closed: NicolasRouquette closed this issue 6 years ago.

NicolasRouquette commented 7 years ago

OML is a language of patterns of OWL2-DL + SWRL constructs that supports the integration of systems engineering vocabularies, of models conforming to such vocabularies, and of data about such models.

In practice, this kind of integration must support versioning (vocabularies, models, and data grow and evolve over time) and decentralized collaboration. Although Git is an excellent version control system (VCS) for storing this kind of information, it was unclear which language was suitable for representing it.

For several years, OWL has been plagued by serialization variability and non-determinism. Despite recent improvements in the OWL API (see https://github.com/owlcs/owlapi/issues/702), serialization variability problems remain, particularly with respect to the order of axiom annotations in RDF/XML. Serialization formats that explicitly represent so-called "blank nodes" pose even greater challenges, because the only constraint on blank node identifiers is that they be unique within a given serialized document; the same graph can therefore be serialized with different identifiers each time.

Designing OML as a fourth-normal-form (4NF) relational database schema provided a framework for ensuring deterministic and reproducible identification and serialization.

Since every OML schema table has a single, globally unique, Version 5 UUID primary key, deterministic serialization is straightforward: the rows of each table can be written in a canonical order, for example sorted by primary key.
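A minimal sketch of why a unique primary key makes serialization deterministic, assuming a hypothetical row format (a dict with a `"uuid"` entry) and JSON Lines output; neither is necessarily what OML actually uses.

```python
import json
import uuid


def serialize_table(rows):
    """Serialize one table deterministically, one JSON object per line.

    Hypothetical row format: each row is a dict whose "uuid" entry holds the
    row's Version 5 UUID primary key as a canonical lowercase string.
    """
    # Canonical UUID strings have fixed-width lowercase hex fields, so string
    # order matches numeric order; sorting removes any dependence on input order.
    ordered = sorted(rows, key=lambda row: row["uuid"])
    # sort_keys fixes the column order; compact separators fix the whitespace.
    return "\n".join(
        json.dumps(row, sort_keys=True, separators=(",", ":")) for row in ordered
    )


# Two hypothetical module rows whose keys are derived from their IRIs.
a = {"uuid": str(uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/A")), "iri": "http://example.org/A"}
b = {"uuid": str(uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/B")), "iri": "http://example.org/B"}

# Same rows, different insertion order, identical output.
assert serialize_table([a, b]) == serialize_table([b, a])
```

Because the output depends only on the set of rows, two collaborators who make the same change produce byte-identical files, which is what a line-based VCS like Git needs.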

To achieve deterministic identification, the Version 5 UUID primary key of each row of an OML table is computed from a tuple of identification criteria; this tuple must itself be a semantically global and unique identifier.
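The following is a sketch of computing a Version 5 UUID from a tuple of identification criteria with Python's standard `uuid` module. The namespace UUID and the JSON-array encoding of the tuple are assumptions for illustration, not OML's actual choices.

```python
import json
import uuid

# Hypothetical root namespace; the namespace OML actually uses is not stated here.
OML_NAMESPACE = uuid.NAMESPACE_URL


def identity_uuid(kind, *criteria):
    """Version 5 UUID for a row of table `kind`, from its identification criteria.

    The criteria are encoded as a JSON array (an assumed encoding): JSON escaping
    keeps the encoding unambiguous even if a criterion contains separator characters.
    """
    name = json.dumps([kind, *(str(c) for c in criteria)])
    return uuid.uuid5(OML_NAMESPACE, name)


u1 = identity_uuid("Module", "http://example.org/A")
u2 = identity_uuid("Module", "http://example.org/A")
assert u1 == u2 and u1.version == 5  # same criteria, same key, on any machine
```

Any fixed, unambiguous encoding would work; what matters is that every tool computes the key from the same criteria in the same way.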

What, then, is a semantically global and unique identifier for the rows of each OML table?

This is where design decisions were made in OML.

1) Identification of OML Modules

An OML Module is a graph of logical assertions that maps to an OWL2-DL Ontology. Just as an OWL2-DL Ontology is identified by its IRI, the identification criterion of an OML Module is its IRI.
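As a sketch, a module key then depends on nothing but the IRI. The namespace and the `["Module", iri]` encoding below are hypothetical.

```python
import json
import uuid

# Hypothetical namespace and encoding, chosen for illustration only.
NAMESPACE = uuid.NAMESPACE_URL


def module_uuid(iri):
    """The IRI is the sole identification criterion of an OML Module."""
    return uuid.uuid5(NAMESPACE, json.dumps(["Module", iri]))


# Identical IRIs always yield the same key; distinct IRIs yield distinct keys
# (barring a SHA-1 collision).
assert module_uuid("http://example.org/A") == module_uuid("http://example.org/A")
assert module_uuid("http://example.org/A") != module_uuid("http://example.org/B")
```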

2) Identification of OML Terms

An OML Term corresponds to a pattern anchored by a kind of OWL2-DL Entity (Class, Object Property, Data Property, Datatype). OML uses a convention adopted in JPL's IMCE project: an OWL2-DL Entity e declared in an OWL2-DL Ontology with IRI i has the IRI i#n, where n is a fragment name that uniquely identifies the entity within that ontology.

OML adopts this convention for identifying OML Terms: the identification criteria of an OML Term are the tuple of its defining OML Module and its name.
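A minimal sketch of the (module, name) criteria alongside the i#n IRI convention. The namespace, the `"Module"`/`"Term"` tags and the JSON encoding are hypothetical.

```python
import json
import uuid

NAMESPACE = uuid.NAMESPACE_URL  # hypothetical namespace


def module_uuid(module_iri):
    return uuid.uuid5(NAMESPACE, json.dumps(["Module", module_iri]))


def term_uuid(module_iri, name):
    """Identity of an OML Term: (defining module, term name)."""
    return uuid.uuid5(NAMESPACE, json.dumps(["Term", str(module_uuid(module_iri)), name]))


def term_iri(module_iri, name):
    """IMCE convention: entity n declared in ontology i has IRI i#n."""
    return f"{module_iri}#{name}"


A = "http://example.org/A"
B = "http://example.org/B"
print(term_iri(A, "Foo"))  # http://example.org/A#Foo
# The same name in two modules denotes two different terms.
assert term_uuid(A, "Foo") != term_uuid(B, "Foo")
```

The key and the IRI carry the same information: both are fixed by the defining module and the fragment name.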

3) Identification of logical and non-logical axioms about entities

Identifying axioms requires further design choices about which of their constituents belong to the identification criteria.

Examples:

3a) Specialization axioms are identified by a unique triple of UUID cross-references:
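The components of the triple are not spelled out here. The sketch below assumes they are the owning terminology, the sub-term and the super-term; that assumption, the namespace and the `key` helper are hypothetical.

```python
import json
import uuid

NAMESPACE = uuid.NAMESPACE_URL  # hypothetical namespace


def key(*parts):
    """Hypothetical helper: Version 5 UUID from a JSON-encoded criteria tuple."""
    return uuid.uuid5(NAMESPACE, json.dumps([str(p) for p in parts]))


def specialization_uuid(terminology, sub, sup):
    # Assumed triple of UUID cross-references: (terminology, sub-term, super-term).
    return key("SpecializationAxiom", terminology, sub, sup)


tbox = key("Module", "http://example.org/A")
foo = key("Term", tbox, "Foo")
bar = key("Term", tbox, "Bar")

# Asserting "Foo specializes Bar" twice yields the same key, i.e. one axiom ...
assert specialization_uuid(tbox, foo, bar) == specialization_uuid(tbox, foo, bar)
# ... while direction matters: "Bar specializes Foo" is a different axiom.
assert specialization_uuid(tbox, foo, bar) != specialization_uuid(tbox, bar, foo)
```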

3b) Annotation axioms are identified by a unique pair of UUID cross-references:

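If the annotation value had been included in the identity of the OML AnnotationPropertyValue axiom, change management would become non-trivial for a VCS like Git: editing the value would change the axiom's UUID, so the edit would appear as the deletion of one axiom and the addition of another rather than as a modification.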
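A small sketch contrasting the two options. For brevity it uses IRI strings in place of UUID cross-references; the namespace, encoding and `annotation_uuid` function are hypothetical.

```python
import json
import uuid

NAMESPACE = uuid.NAMESPACE_URL  # hypothetical namespace


def annotation_uuid(subject, prop, value, include_value):
    """Identity of an annotation property value, with or without its value.

    include_value=False mirrors OML's choice; True is the rejected alternative.
    Subjects and properties are plain IRI strings here for brevity, standing in
    for UUID cross-references.
    """
    criteria = ["AnnotationPropertyValue", subject, prop]
    if include_value:
        criteria.append(value)
    return uuid.uuid5(NAMESPACE, json.dumps(criteria))


subject = "http://example.org/A#Foo"
prop = "http://purl.org/dc/elements/1.1/description"
old, new = "This is a brief description....", "A revised description."

# OML's choice: editing the value keeps the key, so a diff shows a modification.
assert annotation_uuid(subject, prop, old, False) == annotation_uuid(subject, prop, new, False)
# Alternative: the edited axiom gets a new key, i.e. a deletion plus an addition.
assert annotation_uuid(subject, prop, old, True) != annotation_uuid(subject, prop, new, True)
```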

From a practical standpoint, this identification scheme for OML AnnotationPropertyValues has an important consequence: a given OML module cannot have multiple annotation property values for the same pair of subject and annotation property.

For example, the following is ill-formed:

```
open terminology <http://example.org/A> {

  extends <http://purl.org/dc/elements/1.1>

  dc:description="This is a brief description...."
  dc:description="This is a somewhat more elaborate description..."
  concept Foo
}
```

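This example illustrates the problem: both dc:description values annotate the same subject, concept Foo, within the same module, so the two annotation axioms have the same identification criteria and would therefore receive the same UUID.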
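A sketch of the kind of well-formedness check this implies: within one module, flag every (subject, annotation property) pair that occurs more than once. The tuple representation of annotations is hypothetical.

```python
from collections import Counter


def duplicate_annotation_keys(annotations):
    """Return the (subject, property) pairs that occur more than once.

    `annotations` is a hypothetical list of (subject, property, value) triples
    taken from a single OML module.
    """
    counts = Counter((subject, prop) for subject, prop, _value in annotations)
    return sorted(pair for pair, n in counts.items() if n > 1)


module_a = [
    ("Foo", "dc:description", "This is a brief description...."),
    ("Foo", "dc:description", "This is a somewhat more elaborate description..."),
]
print(duplicate_annotation_keys(module_a))  # [('Foo', 'dc:description')]
```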

One possible way to support multiple annotation property values for the same subject and annotation property within the same OML module would be to extend the syntax of an annotation property value with an index that imposes an a-priori syntactic order on such axioms within the module. The index would then distinguish axioms that share the same subject and annotation property.
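A minimal sketch of that proposal, assuming the index simply joins the identification criteria. The function name, namespace and encoding are hypothetical; this is not an existing OML feature.

```python
import json
import uuid

NAMESPACE = uuid.NAMESPACE_URL  # hypothetical namespace


def indexed_annotation_uuid(subject, prop, index):
    """Hypothetical extension: the syntactic index joins the identification criteria."""
    return uuid.uuid5(NAMESPACE, json.dumps(["AnnotationPropertyValue", subject, prop, index]))


values = [
    (0, "This is a brief description...."),
    (1, "This is a somewhat more elaborate description..."),
]
keys = [indexed_annotation_uuid("Foo", "dc:description", i) for i, _ in values]
assert keys[0] != keys[1]  # both values can now coexist in one module
# The value is still excluded from the key, so editing a value keeps its key.
```

Keeping the value out of the criteria preserves the change-management property discussed for 3b, while the index fixes a stable order for the values.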

3c) Values of scalar data properties are identified by a unique triple of UUID cross-references:

Such an axiom translates in OWL to a positive data property assertion axiom. Since the semantics of multiple values for a non-functional OWL data property are unclear in this context, it is doubtful that supporting multiple such axioms in the same OML DescriptionBox, for the same pair of OML ConceptualEntitySingletonInstance and OML EntityScalarDataProperty, would have any practical utility.
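A sketch of the resulting constraint, keying each value row by the (DescriptionBox, instance, property) triple. The namespace, encoding, `add_value` helper and the `pump1`/`massInKg` names are hypothetical.

```python
import json
import uuid

NAMESPACE = uuid.NAMESPACE_URL  # hypothetical namespace


def data_value_uuid(dbox, instance, prop):
    """Identity triple: (DescriptionBox, singleton instance, scalar data property)."""
    return uuid.uuid5(NAMESPACE, json.dumps(["ScalarDataPropertyValue", dbox, instance, prop]))


def add_value(table, dbox, instance, prop, value):
    """Add a value row to `table` (a dict keyed by UUID), rejecting a second
    value for the same (instance, property) pair in the same DescriptionBox."""
    key = data_value_uuid(dbox, instance, prop)
    if key in table:
        raise ValueError(f"{instance} already has a value for {prop} in {dbox}")
    table[key] = value


table = {}
add_value(table, "http://example.org/D", "pump1", "massInKg", "12.5")
try:
    add_value(table, "http://example.org/D", "pump1", "massInKg", "13.0")
except ValueError as err:
    print(err)  # the second value collides with the first
```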

However, if the intent is to represent a collection of values, a more effective approach is to model the collection explicitly, for example via an OML Structure.

NicolasRouquette commented 6 years ago

Partially resolved in #166. The remaining part pertains to #174.