etherlabsio / ai-engine

Core AI services and functions powering the ETHER Platform
MIT License

Generate Schema and indices for Dgraph from python classes #191

Closed shashankpr closed 4 years ago

shashankpr commented 4 years ago

This PR keeps the Dgraph schema spec uniform by generating Dgraph-compliant schema and indices from Python classes. This work is required because:

  1. Dgraph schema types change with new version bumps. Maintaining the schema by hand risks problems with backups and backward compatibility.
  2. When the schema is generated from Python classes, a new predicate only has to be added to the class definition, so multiple definitions do not have to be maintained.
  3. It is easier to manage as the schema scales up.
  4. It is simpler to version the schema file.
  5. The schema can be quickly updated or downgraded to match new Dgraph schema rules.

Updated Changes:

Pending Changes:

Design thoughts

  1. Index rules need to be defined along with the class definitions so that there is no ambiguity or misrepresentation of predicates.
  2. Some predicates/relations are reused across several nodes. For example, the Keyphrase node has a predicate called type, which indicates the type of keyphrase. The Mind node also has a type predicate that holds different values but has the same data type (string). Hence, when defining index rules, it is important that we do not end up with different index types for the same predicate.
  3. TypeDef generation and Index generation are kept separate, since the two do not always need to be updated together. Also, TypeDefs are more prone to changes from the Dgraph team than the schema. Keeping them separate makes the code easier to maintain.
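
To make design thought 2 concrete, here is a minimal sketch of a check that flags a predicate declared with different index rules on different nodes. The function name `find_index_conflicts`, the tuple layout, and the sample data (including the `hash` index on `Mind.xid`) are assumptions made for illustration; they are not part of this PR.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input shape: (node_name, predicate, dg_field, index_types)
PredicateDef = Tuple[str, str, str, Tuple[str, ...]]


def find_index_conflicts(defs: Iterable[PredicateDef]) -> Dict[str, List[PredicateDef]]:
    """Return predicates that are declared with more than one (type, index set)."""
    by_predicate: Dict[str, List[PredicateDef]] = defaultdict(list)
    for definition in defs:
        by_predicate[definition[1]].append(definition)

    conflicts: Dict[str, List[PredicateDef]] = {}
    for predicate, entries in by_predicate.items():
        # Two declarations agree when both the Dgraph type and the index set match.
        signatures = {
            (dg_field, frozenset(index_types))
            for _node, _pred, dg_field, index_types in entries
        }
        if len(signatures) > 1:
            conflicts[predicate] = entries
    return conflicts


# Illustrative data only: "type" is declared consistently, "xid" is not.
defs = [
    ("Keyphrase", "type", "string", ("exact",)),
    ("Mind", "type", "string", ("exact",)),
    ("Keyphrase", "xid", "string", ("term", "exact")),
    ("Mind", "xid", "string", ("hash",)),
]
conflicts = find_index_conflicts(defs)
print(sorted(conflicts))
```

A check like this could run before the index file is written, so that a conflicting declaration fails early instead of producing two index lines for one predicate.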

Implementation detail

dgconfig (DgMeta struct)

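from typing import Dict, List, Optional

def dgconfig(
    ignore_schema: bool = False,        # skip this field during schema generation
    index: bool = True,                 # whether the predicate is indexed
    index_type: Optional[List[str]] = None,
    dg_field: Optional[str] = None,     # Dgraph type, e.g. "string"
    directive: Optional[List[str]] = None,
    field_name: Optional[str] = None,
) -> Dict[str, dict]:
    ...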

Example dgconfig representation

{
    "dgraph": {
        "field_name": "xid",
        "ignore_schema": False,
        "index": True,
        "index_type": ["term", "exact", "month"],
        "dg_field": "string",  # or "dateTime"
        "directive": ["@reverse", "@upsert"],
    },
    # Other top-level keys belong to third-party libraries, e.g. "dataclasses_json"
}
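
For reference, a minimal sketch of how `dgconfig` could build the representation above. The signature follows the one given in this description; the body is an assumption, in particular the choice to turn a missing `index_type` or `directive` into an empty list.

```python
from typing import Dict, List, Optional


def dgconfig(
    ignore_schema: bool = False,
    index: bool = True,
    index_type: Optional[List[str]] = None,
    dg_field: Optional[str] = None,
    directive: Optional[List[str]] = None,
    field_name: Optional[str] = None,
) -> Dict[str, dict]:
    """Wrap the Dgraph settings under a single "dgraph" key.

    Namespacing under "dgraph" lets the result sit next to metadata from
    other libraries (such as "dataclasses_json") in the same field.
    """
    return {
        "dgraph": {
            "field_name": field_name,
            "ignore_schema": ignore_schema,
            "index": index,
            # Assumption: normalize None to an empty list and copy the input
            # so callers cannot mutate the stored config by accident.
            "index_type": list(index_type) if index_type else [],
            "dg_field": dg_field,
            "directive": list(directive) if directive else [],
        }
    }


meta = dgconfig(index_type=["term", "exact"], dg_field="string", field_name="xid")
print(meta["dgraph"]["index_type"])
```

Because `dataclasses.field(metadata=...)` accepts any mapping, the returned dict can be passed to it directly.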

Output from TypeDef and Index Generator

# Type Def
type Keyphrase {
    xid: string
}

# Index
xid: string @index(term, exact, month) @reverse @upsert .
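
The two outputs above can be produced from the config dict with plain string formatting. The sketch below keeps TypeDef rendering and Index rendering as separate functions, in line with design thought 3. The names `render_type_def` and `render_index_line` are hypothetical, and the code only formats whatever it is given; it does not validate the result against Dgraph's own rules.

```python
from typing import List


def render_index_line(cfg: dict) -> str:
    """Render one predicate line, e.g. 'xid: string @index(term, exact) .'"""
    parts = [f'{cfg["field_name"]}: {cfg["dg_field"]}']
    if cfg.get("index") and cfg.get("index_type"):
        parts.append("@index(" + ", ".join(cfg["index_type"]) + ")")
    parts.extend(cfg.get("directive") or [])
    return " ".join(parts) + " ."


def render_type_def(type_name: str, cfgs: List[dict]) -> str:
    """Render a type block, skipping fields marked with ignore_schema."""
    body = "\n".join(
        f'    {c["field_name"]}: {c["dg_field"]}'
        for c in cfgs
        if not c.get("ignore_schema")
    )
    return f"type {type_name} {{\n{body}\n}}"


# The inner "dgraph" dict from the example representation.
xid = {
    "field_name": "xid",
    "ignore_schema": False,
    "index": True,
    "index_type": ["term", "exact", "month"],
    "dg_field": "string",
    "directive": ["@reverse", "@upsert"],
}

print(render_type_def("Keyphrase", [xid]))
print(render_index_line(xid))
```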

Defining DgMeta in dataclasses

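from dataclasses import dataclass, field

from dataclasses_json import dataclass_json

# DgraphAttributes and dgconfig are defined in this project.


@dataclass_json
@dataclass(order=True)
class Keyphrase(DgraphAttributes):
    # Indexed predicate with term and exact indices.
    xid: str = field(
        default="",
        metadata=dgconfig(index=True, index_type=["term", "exact"]),
    )
    # Not indexed, and not accepted as a constructor argument.
    related_to_keyphrase: bool = field(
        init=False,
        default=False,
        metadata=dgconfig(index=False),
    )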
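
A generator can read this metadata back with `dataclasses.fields`. The sketch below is self-contained, so it uses a reduced stand-in for `dgconfig` and a plain dataclass without `DgraphAttributes` or `dataclass_json`. The `PY_TO_DGRAPH` mapping and the `collect_predicates` helper are assumptions for illustration, not the PR's actual implementation.

```python
from dataclasses import dataclass, field, fields
from typing import List

# Assumed mapping from Python annotation names to Dgraph scalar types.
PY_TO_DGRAPH = {"str": "string", "bool": "bool", "int": "int", "float": "float"}


def dgconfig(index: bool = True, index_type=None) -> dict:
    """Reduced stand-in for the project's dgconfig, enough for this sketch."""
    return {"dgraph": {"index": index, "index_type": index_type or []}}


@dataclass
class Keyphrase:
    xid: str = field(default="", metadata=dgconfig(index_type=["term", "exact"]))
    related_to_keyphrase: bool = field(
        init=False, default=False, metadata=dgconfig(index=False)
    )


def collect_predicates(cls) -> List[dict]:
    """Build one config dict per dataclass field that carries "dgraph" metadata."""
    predicates = []
    for f in fields(cls):
        cfg = f.metadata.get("dgraph")
        if cfg is None:
            continue  # field was not declared with dgconfig
        # f.type is a class here, or a string under postponed annotations.
        annotation = getattr(f.type, "__name__", str(f.type))
        predicates.append(
            {"field_name": f.name, "dg_field": PY_TO_DGRAPH[annotation], **cfg}
        )
    return predicates


preds = collect_predicates(Keyphrase)
for p in preds:
    print(p["field_name"], p["dg_field"], p["index"], p["index_type"])
```

Deriving `field_name` and `dg_field` from the dataclass itself means they only need to be passed to `dgconfig` when the Dgraph name or type should differ from the Python one.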