Design review about the OML limitations on annotation/data property values

OML is a language of patterns of OWL2-DL + SWRL constructs that supports the integration of systems engineering vocabularies, of models conforming to such vocabularies, and of data about such models.

In practice, this kind of integration needs to support versioning (vocabularies, models and data grow and evolve over time) and decentralized collaboration. Although GIT is an excellent choice of Version Control System (VCS) for storing this kind of information, it was unclear which language would be suitable for representing it.

For several years, OWL has been plagued by problems of serialization variability and non-determinism. Despite recent improvements in the OWL2 API (see https://github.com/owlcs/owlapi/issues/702), serialization variability problems remain, particularly with respect to the order of axiom annotations in RDF/XML. Serialization formats that explicitly represent so-called "blank nodes" pose even more challenges, because blank node identifiers are unconstrained except that they must be unique within a given document serialization.

The design of OML as a fourth normal form (4NF) relational database schema provided a framework for ensuring deterministic and reproducible identification and serialization.

Since every OML schema table has a single primary key whose values are globally unique Version 5 UUIDs, deterministic serialization is straightforward:

- Sort the serialization of OML tables by table name
- Within each table, sort the serialization of table rows by their UUID primary key
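A minimal Python sketch of these two sorting rules, assuming a hypothetical in-memory model in which each table is a list of rows and each row carries its UUID key under a `uuid` field. The table names (`Concepts`, `Aspects`) and the JSON-lines output are illustrative only, not OML's actual serialization format.

```python
import json
import uuid

# Hypothetical in-memory model: table name -> rows; each row is a dict whose
# "uuid" entry holds its Version 5 UUID primary key as a string.
Tables = dict[str, list[dict]]

def serialize(tables: Tables) -> str:
    """Deterministic text: tables sorted by name, rows sorted by UUID key."""
    lines = []
    for table_name in sorted(tables):
        for row in sorted(tables[table_name], key=lambda r: r["uuid"]):
            # sort_keys fixes the column order inside each row as well
            lines.append(json.dumps({"table": table_name, **row}, sort_keys=True))
    return "\n".join(lines)

ns = uuid.NAMESPACE_URL
a = {"uuid": str(uuid.uuid5(ns, "http://example.org/A#Foo")), "name": "Foo"}
b = {"uuid": str(uuid.uuid5(ns, "http://example.org/A#Bar")), "name": "Bar"}
c = {"uuid": str(uuid.uuid5(ns, "http://example.org/A#Baz")), "name": "Baz"}

# Same content, different insertion order: identical serialization.
t1 = {"Concepts": [a, b], "Aspects": [c]}
t2 = {"Aspects": [c], "Concepts": [b, a]}
assert serialize(t1) == serialize(t2)
```

Because the output depends only on table names and keys, two tools that hold the same rows in memory in different orders still write byte-identical files.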

To achieve the deterministic identification objective, the Version 5 UUID primary key of each row of an OML table is computed from a tuple of identification criteria, and that tuple must semantically constitute a globally unique identifier.
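As an illustration, a Version 5 UUID can be derived from a tuple of criteria with Python's standard `uuid.uuid5`. This is a sketch, not OML's implementation: the namespace constant `OML_NAMESPACE` and the `repr`-based encoding of the tuple are assumptions chosen for the example.

```python
import uuid

# Hypothetical namespace UUID: the namespace OML actually uses is not stated
# in this review, so this constant is only for illustration.
OML_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/oml-namespace")

def identity_uuid(*criteria: object) -> uuid.UUID:
    """Derive a Version 5 UUID from an ordered tuple of identification criteria."""
    # repr() of a tuple of strings keeps component boundaries, so
    # ("a", "bc") and ("ab", "c") produce different names.
    name = repr(tuple(str(c) for c in criteria))
    return uuid.uuid5(OML_NAMESPACE, name)

# The same criteria always yield the same key, in any process or on any machine.
assert identity_uuid("http://example.org/A", "Foo") == identity_uuid("http://example.org/A", "Foo")
```

Since UUIDv5 is a SHA-1-based hash of a namespace and a name, the key is reproducible from the criteria alone; no counter or database sequence is involved.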

What is a semantically global and unique identifier for the rows of each OML table?

This is where design decisions were made in OML.

1) Identification of OML Modules

An OML Module is a graph of logical assertions that maps to an OWL2-DL Ontology. Just as an OWL2-DL Ontology is identified by its IRI, the identification criterion for an OML Module is its IRI.
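A minimal Python sketch of this criterion, assuming a hypothetical namespace UUID (the constant OML actually uses is not stated here): the module UUID depends only on the IRI, so it does not change when the module is re-serialized or stored in a different file.

```python
import uuid

# Hypothetical namespace UUID; OML's actual constant is not given in this review.
OML_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/oml-namespace")

def module_uuid(iri: str) -> uuid.UUID:
    """Version 5 UUID of an OML Module, derived solely from its IRI."""
    return uuid.uuid5(OML_NAMESPACE, iri)

a = module_uuid("http://example.org/A")
print(a.version)  # 5
```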

2) Identification of OML Terms

An OML Term corresponds to a pattern anchored by a kind of OWL2-DL Entity (Class, Object Property, Data Property, Datatype). OML uses a convention adopted in JPL's IMCE project whereby an OWL2-DL Entity e declared in an OWL2-DL Ontology with IRI i has the IRI i#n, where n is a fragment name that uniquely identifies the entity within that ontology.

OML adopts this convention: the identification criteria of an OML Term are the tuple of its defining OML Module and its name.
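The convention and the (module, name) criteria can be sketched in Python as follows. The namespace constant is hypothetical, and encoding the tuple by using the module UUID as the namespace for the term name is one possible choice assumed here, not necessarily the one OML uses.

```python
import uuid

# Hypothetical namespace UUID; OML's actual constant is not given in this review.
OML_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/oml-namespace")

def module_uuid(module_iri: str) -> uuid.UUID:
    return uuid.uuid5(OML_NAMESPACE, module_iri)

def term_iri(module_iri: str, name: str) -> str:
    # IMCE convention: entity n declared in ontology i has IRI i#n.
    return f"{module_iri}#{name}"

def term_uuid(module_iri: str, name: str) -> uuid.UUID:
    # Identification criteria (defining module, term name), encoded here by
    # using the module UUID as the namespace for the term name.
    return uuid.uuid5(module_uuid(module_iri), name)

print(term_iri("http://example.org/A", "Foo"))  # http://example.org/A#Foo
```

With this encoding, two modules may each declare a term named `Foo` without their keys colliding.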

3) Identification of logical and non-logical axioms about entities.

This kind of identification requires making design choices...

Examples:

3a) Specialization axioms are identified by a unique triple of UUID cross-references:

- The OML TerminologyBox where the axiom is asserted
- The sub-entity asserted to be a specialization child of the super-entity
- The super-entity asserted to be a specialization parent of the sub-entity.

This identification is consistent with the semantics of specialization axioms in OWL.
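A Python sketch of this triple-based identity, under assumptions: the namespace constant is hypothetical, and the `"Specialization"` tag is an added discriminator so that different axiom kinds over the same references cannot share a key. The order of sub and super matters, mirroring the direction of the OWL subclass relation.

```python
import uuid

# Hypothetical namespace UUID (not OML's actual constant).
OML_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/oml-namespace")

def specialization_uuid(tbox: uuid.UUID, sub: uuid.UUID, sup: uuid.UUID) -> uuid.UUID:
    """Identity of a specialization axiom from its (TerminologyBox, sub, super) triple."""
    # The "Specialization" tag is an assumed discriminator between axiom kinds.
    # Order matters: (sub, sup) and (sup, sub) are different axioms.
    return uuid.uuid5(OML_NAMESPACE, repr(("Specialization", str(tbox), str(sub), str(sup))))

# Hypothetical identifiers for a TerminologyBox and two concepts.
tbox = uuid.uuid5(OML_NAMESPACE, "http://example.org/A")
vehicle = uuid.uuid5(OML_NAMESPACE, "http://example.org/A#Vehicle")
car = uuid.uuid5(OML_NAMESPACE, "http://example.org/A#Car")

car_is_vehicle = specialization_uuid(tbox, car, vehicle)
```

Asserting the same specialization twice in the same TerminologyBox yields the same key, i.e. a single row, which matches OWL, where a repeated axiom adds nothing.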

3b) Annotation axioms are identified by a unique pair of UUID cross-references:

- The annotated subject
- The annotation property.

This identification is problematic; it should instead be a triple of UUID cross-references:
- The OML Module where the axiom is asserted
- The annotated subject
- The annotation property

The pair-based identification does not allow different OML Modules to assert different annotation property values for the same subject and annotation property.

The triple-based identification allows the same subject to have different annotation property values in different OML modules. However, within a given OML module, a given subject can only have one annotation property value for a given annotation property.
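The difference between the two schemes can be checked with a small Python sketch. Annotation assertions are modeled as hypothetical `(module, subject, property, value)` string tuples, and the namespace constant is hypothetical; `clashes` counts assertions whose key is already taken by a different assertion.

```python
import uuid

# Hypothetical namespace UUID (not OML's actual constant).
NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/oml-namespace")

def pair_key(module: str, subject: str, prop: str) -> uuid.UUID:
    # Current scheme: the asserting module is ignored.
    return uuid.uuid5(NS, repr((subject, prop)))

def triple_key(module: str, subject: str, prop: str) -> uuid.UUID:
    # Proposed scheme: the asserting module is part of the identity.
    return uuid.uuid5(NS, repr((module, subject, prop)))

def clashes(assertions, key_fn) -> int:
    """Count assertions whose key is already taken by a different assertion."""
    rows = {}
    count = 0
    for module, subject, prop, value in assertions:
        k = key_fn(module, subject, prop)
        if k in rows and rows[k] != (module, subject, prop, value):
            count += 1
        rows.setdefault(k, (module, subject, prop, value))
    return count

cross_module = [("A", "Foo", "dc:description", "text from A"),
                ("B", "Foo", "dc:description", "text from B")]
same_module = [("A", "Foo", "dc:description", "brief"),
               ("A", "Foo", "dc:description", "elaborate")]

print(clashes(cross_module, pair_key), clashes(cross_module, triple_key))  # 1 0
print(clashes(same_module, triple_key))  # 1
```

The pair scheme rejects the cross-module case; the triple scheme accepts it but still rejects two values for the same subject and property within one module.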

Note that it is very important not to include the annotation value in the identification of the annotation property value axiom. Excluding it simplifies change management in a way that a VCS like GIT can handle trivially:
- Creating/deleting an annotation property value axiom results in a corresponding addition/deletion of a row in the OML table of AnnotationPropertyValues.
- Changing the value of an OML AnnotationPropertyValue axiom does not change its identity; therefore, the change appears as deleting the current row and creating a new row where both rows have the same identity.
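The following Python sketch, built on a hypothetical row layout and namespace UUID, serializes annotation rows sorted by a key that excludes the value, then diffs two versions with the standard-library `difflib.unified_diff`. Changing one value yields exactly one `-` line and one `+` line carrying the same UUID at the same position in the file, the kind of line-level change that GIT diffs handle well.

```python
import difflib
import json
import uuid

# Hypothetical namespace UUID (not OML's actual constant).
NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/oml-namespace")

def annotation_row(module, subject, prop, value):
    # Identity from (module, subject, property); the value is NOT part of the key.
    k = uuid.uuid5(NS, repr((module, subject, prop)))
    return {"uuid": str(k), "module": module, "subject": subject,
            "property": prop, "value": value}

def serialize(rows):
    # One line per row, rows sorted by their UUID key.
    return [json.dumps(r, sort_keys=True) for r in sorted(rows, key=lambda r: r["uuid"])]

before = [annotation_row("A", "Foo", "dc:description", "old text"),
          annotation_row("A", "Bar", "dc:description", "unchanged")]
after = [annotation_row("A", "Foo", "dc:description", "new text"),
         annotation_row("A", "Bar", "dc:description", "unchanged")]

# Keep only changed lines, dropping the "---"/"+++" file headers.
diff = [line for line in difflib.unified_diff(serialize(before), serialize(after), lineterm="")
        if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))]
for line in diff:
    print(line)
```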

If the annotation value had been included in the identity of the OML AnnotationPropertyValue axiom, change management would become a non-trivial issue for a VCS like GIT.

From a practical standpoint, the triple-based identification criteria for OML AnnotationPropertyValues have an important consequence: a given OML module cannot have multiple annotation property values for the same pair of subject and annotation property.

For example, the following is ill-formed:

```
open terminology <http://example.org/A> {

  extends <http://purl.org/dc/elements/1.1>

  dc:description="This is a brief description...."
  dc:description="This is a somewhat more elaborate description..."
  concept Foo
}
```
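A minimal well-formedness check, sketched in Python under stated assumptions: annotation assertions are modeled as hypothetical `(module, subject, property, value)` tuples, and both `dc:description` assertions in the example are assumed, for illustration, to annotate `concept Foo`. Any (module, subject, property) key that occurs more than once marks the module as ill-formed.

```python
from collections import Counter

def ill_formed_keys(annotations):
    """Return the (module, subject, property) keys asserted more than once."""
    counts = Counter((m, s, p) for m, s, p, _value in annotations)
    return sorted(k for k, n in counts.items() if n > 1)

# Assumption for illustration: both annotations apply to concept Foo.
example = [
    ("http://example.org/A", "Foo", "dc:description", "This is a brief description...."),
    ("http://example.org/A", "Foo", "dc:description",
     "This is a somewhat more elaborate description..."),
]
print(ill_formed_keys(example))  # [('http://example.org/A', 'Foo', 'dc:description')]
```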

This example illustrates the following problem:

- In OWL, annotations are semantically unordered, but their serialization is syntactically ordered. Is this ordering deterministic and reproducible?
- If the annotation value were included in the identity, then annotation property values could be ordered deterministically and reproducibly; however, this order would be fundamentally sensitive to the actual annotation values. A simple change to one annotation property value axiom could therefore have undesirable consequences on the ordering of other annotation property value axioms. It would also become more difficult to recognize simple changes, because there would no longer be a means of identifying the same annotation property value axiom before and after a change to its value.
- If the annotation value is not included in the identity, then annotation property values within the same OML module cannot be ordered deterministically when they share the same pair of subject and annotation property.
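A small Python check of these two alternatives, using hypothetical string identifiers and a hypothetical namespace UUID. With the value in the key, two descriptions get distinct (hence sortable) keys, but an edit produces a brand-new key; without it, an edit keeps the key, but the two descriptions collide.

```python
import uuid

# Hypothetical namespace UUID (not OML's actual constant).
NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/oml-namespace")

def key_with_value(module, subject, prop, value) -> uuid.UUID:
    return uuid.uuid5(NS, repr((module, subject, prop, value)))

def key_without_value(module, subject, prop, value) -> uuid.UUID:
    # The value argument is deliberately ignored.
    return uuid.uuid5(NS, repr((module, subject, prop)))

brief = ("A", "Foo", "dc:description", "brief")
elaborate = ("A", "Foo", "dc:description", "more elaborate")
brief_edited = ("A", "Foo", "dc:description", "brief, edited")

for key in (key_with_value, key_without_value):
    print(key.__name__,
          "distinct keys for two values:", key(*brief) != key(*elaborate),
          "| same key after edit:", key(*brief) == key(*brief_edited))
# key_with_value distinct keys for two values: True | same key after edit: False
# key_without_value distinct keys for two values: False | same key after edit: True
```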

One possible way to support multiple annotation property values for the same pair of subject and annotation property within the same OML module would be to extend the syntax of an annotation property value with an index that imposes an a-priori syntactic order on such axioms within the module.
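A Python sketch of what such an index could look like. The index field, the `indexed_key` and `ordered_values` helpers, and the tuple layout are all hypothetical; the point is that the index joins the identity while the value still does not, so values become orderable and an edit to a value keeps its key.

```python
import uuid

# Hypothetical namespace UUID (not OML's actual constant).
NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/oml-namespace")

def indexed_key(module: str, subject: str, prop: str, index: int) -> uuid.UUID:
    # Hypothetical extension: the index joins the identity; the value still does not.
    return uuid.uuid5(NS, repr((module, subject, prop, index)))

def ordered_values(annotations) -> list[str]:
    """Values for one (module, subject, property), in their a-priori index order.

    Each annotation is a hypothetical (module, subject, property, index, value) tuple.
    """
    indexes = [a[3] for a in annotations]
    if len(set(indexes)) != len(indexes):
        raise ValueError("duplicate index for the same subject and annotation property")
    return [a[4] for a in sorted(annotations, key=lambda a: a[3])]

example = [
    ("A", "Foo", "dc:description", 1, "This is a somewhat more elaborate description..."),
    ("A", "Foo", "dc:description", 0, "This is a brief description...."),
]
print(ordered_values(example)[0])  # This is a brief description....
```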

3c) Values of scalar data properties are identified by a unique triple of UUID cross-references:

- The OML DescriptionBox where the axiom is asserted
- The OML ConceptualEntitySingletonInstance
- The OML EntityScalarDataProperty

Such an axiom translates in OWL to a positive data property assertion axiom. Given that the semantics of a non-functional OWL data property are unclear, it is not obvious what practical utility there would be in supporting multiple such axioms in the same OML DescriptionBox for the same pair of OML ConceptualEntitySingletonInstance and OML EntityScalarDataProperty.
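A Python sketch of both sides of such an axiom: its triple-based key and its rendering as an OWL 2 functional-syntax `DataPropertyAssertion`. The namespace constant, the `"ScalarDataPropertyValue"` tag, the helper names and the example IRIs are hypothetical; the rendering assumes the lexical form needs no escaping (no quotes or backslashes).

```python
import uuid

# Hypothetical namespace UUID (not OML's actual constant).
NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/oml-namespace")

def scalar_value_uuid(dbox: uuid.UUID, instance: uuid.UUID, prop: uuid.UUID) -> uuid.UUID:
    # Identity: (DescriptionBox, ConceptualEntitySingletonInstance, EntityScalarDataProperty).
    # The value itself is not part of the key.
    return uuid.uuid5(NS, repr(("ScalarDataPropertyValue", str(dbox), str(instance), str(prop))))

def to_owl_functional(prop_iri: str, instance_iri: str, lexical: str, datatype_iri: str) -> str:
    # OWL 2 functional-syntax positive data property assertion.
    # Assumes `lexical` contains no characters that need escaping.
    return f'DataPropertyAssertion( <{prop_iri}> <{instance_iri}> "{lexical}"^^<{datatype_iri}> )'

dbox = uuid.uuid5(NS, "http://example.org/D")
car1 = uuid.uuid5(NS, "http://example.org/D#car1")
mass = uuid.uuid5(NS, "http://example.org/A#mass")

print(to_owl_functional("http://example.org/A#mass", "http://example.org/D#car1",
                        "1200", "http://www.w3.org/2001/XMLSchema#decimal"))
```

A second value for the same instance and property in the same DescriptionBox would map to the same key, which is exactly the restriction described above.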

However, if the intent is to represent some kind of collection of values, then a more effective approach is to model the collection explicitly via an OML Structure.

JPL-IMCE / gov.nasa.jpl.imce.oml, issue #162