fieldenms / tg

Trident Genesis
MIT License
14 stars 7 forks source link

Graph model for domain metadata #2251

Open homedirectory opened 6 months ago

homedirectory commented 6 months ago

Description

In TG there is a subsystem responsible for generation of domain metadata. It was created specifically for EQL.

EQL3 metadata has the following structure:

eql3-metadata-full

This structure resembles a graph, although it is more akin to a forest.

There are several places where it is replicated:

Each implementation of the structure is different and, as a result, supports only specialised operations. One consequence is that more effort is required to understand, use and maintain them.

It is proposed to introduce a single abstraction for modelling entities and their properties -- a graph. Graphs are well-established data structures, and there are Java libraries that provide them:

The previous example could be modelled with a graph as follows:

graph-metadata-full

This structure has the following differences:

Note that such graphs could easily accomodate loops and directed cycles.

The advantages of this abstraction is the ability to reduce all operations to graph traversal, which should lend itself nicely to reuse and provide clarity of expression.

Expected outcome

A graph model for domain metadata is evaluated and adopted if deemed to be fitting and:

homedirectory commented 6 months ago

This is an attempt to answer the following questions:

  1. How to structure top-level metadata?
    • Graph
    • Forest
  2. How to structure property metadata?
    • Should it include the enclosing type?
    • Should it include the property type?

The following criteria should be used to compare alternatives:

  1. Space complexity of metadata structures.
  2. Time complexity of metadata operations.
  3. Metadata interface -- simplicity is desirable.

Metadata Overview

In general, metadata describes:

Metadata Structure Options

Type metadata can be expressed with the following ADT:

TypeMetadata = EntityMetadata 
             | CompositeTypeMetadata 
             | PrimitiveTypeMetadata 
             | CollectionalTypeMetadata
             | OtherTypeMetadata

Let Tn be the total number of types and Pn the average number of properties in a type.

Options for TypeMetadata structure (referred to as TMn):

  1. TM1 Stores a set of PropertyMetadata (applicable only to [Entity,CompositeType]Metadata).

    This may not necessarily be a set but any data structure.

    Let TM1_S = O(Tn * Pn) be the space required to store properties.

    Let TM1_Pt be the time required to retrieve a property given this structure. Most likely Map String PropertyMetadata will be used, thus TM1_Pt = O(1).

Options for PropertyMetadata structure (referred to as PMn):

  1. PM1 Stores enclosing EntityMetadata in a field.

    Increases space complexity by PM1_S = Tn * Pn.

  2. PM2 Stores property type TypeMetadata in a field.

    Increases space complexity by PM2_S = Tn * Pn.

Metadata Operations

Assume that DomainMetadata is the entry point for accessing metadata.

Operations on metadata (referred to as OPn):

  1. OP1 Retrieve type metadata by class.

    DomainMetadata.typeMetadata :: Class -> TypeMetadata
  2. OP2 Retrieve property metadata given its enclosing type and name.

    If TM1, then:

    EntityMetadata.propertyMetadata :: String -> PropertyMetadata
    CompositeTypeMetadata.propertyMetadata :: String -> PropertyMetadata

    Otherwise:

    DomainMetadata.propertyMetadata :: EntityMetadata -> String -> PropertyMetadata
    DomainMetadata.propertyMetadata :: CompositeTypeMetadata -> String -> PropertyMetadata
  3. OP3 Access enclosing type of a property.

    If PM1, then:

    PropertyMetadata.enclosingType :: EntityMetadata | CompositeTypeMetadata

    Otherwise:

    DomainMetadata.enclosingType :: PropertyMetadata -> EntityMetadata | CompositeTypeMetadata
  4. OP4 Access the type of a property.

    If PM2, then:

    PropertyMetadata.type :: TypeMetadata

    Otherwise:

    DomainMetadata.propertyType :: PropertyMetadata -> TypeMetadata

TM1, PM1 and PM2 share the following:

Top-level metadata structure

In describing how each structure performs operations outlined above it is assumed that neither of PM1, PM2, TM1 are present, as their presence would remove the need for the top-level metadata structure to support these operations.

Graph

Internal representation:

Operations:

  1. OP1 Achievable by additionally introducing Map Class TypeMetadata.
  2. OP2 Achievable by: a. Retrieve [Entity,CompositeType]Metadata through OP1 if not given it directly. b. Find PropertyMetadata with the given name among outgoing edges.
  3. OP3 Find source of the arc.
  4. OP4 Find target of the arc.

Criteria:

  1. Space complexity: A * (Tn * Pn) + Tn + M.

    A -- space required for a single arc:

    • source :: TypeMetadata
    • target :: TypeMetadata
    • value :: PropertyMetadata

    M = Tn * 2 -- space required for efficient OP1.

    Lower bound: O(A * (Tn * Pn) + Tn). Upper bound: O(A * (Tn * Pn) + Tn + PM1_S + PM2_S + TM1_S).

    Leaves choices of TM1, PM1, PM2 open.

  2. Time complexity. Let Nt and Et be the time required to find a node and an arc in a graph respectively.
    • OP1 -- O(1)
    • OP2
    • With TM1: O(1) + TM1_Pt
    • Without TM1: O(1) + Nt + O(Pn / 2)
    • OP3
    • With PM1: O(1)
    • Without PM1: O(1) + Et + O(1)
    • OP4
    • With PM2: O(1)
    • Without PM2: O(1) + Et + O(1)

Forest

Internal representation: Set TypeMetadata. This representation requires TM1 to be able to access properties from types.

Operations:

  1. OP1 Achievable by changing the internal representation to Map Class TypeMetadata.
  2. OP2 Achievable by: a. Retrieve [Entity,CompositeType]Metadata through OP1 if not given it directly. b. Find PropertyMetadata with the given name in the set of properties.
  3. OP3 Search through all TypeMetadata for a type that encloses the given property.

    Inefficient without PM1.

  4. OP4 Impossible without PM2.

Criteria:

  1. Space complexity: D + K + L.

    D -- space required for the top-level structure:

    • Set TypeMetadata: Tn.
    • Map Class TypeMetadata: Tn * 2.

    K -- optional overhead that depends on the structure of property metadata. Given that we want both effiency of OP3 and possibility of OP4, K = PM1_S + PM2_S.

    L -- space required by TM1 to make this data structure work at all. L = TM1_S

    Lower bound: O(Tn + PM1_S + PM2_S + TM1_S). Upper bound: O(Tn + PM1_S + PM2_S + TM1_S).

    Strictly requires TM1, PM2. Requires PM1 for efficiency of OP3.

  2. Time complexity:

Conclusion