Approach to auto-generate the GraphQL API schema from the DB schema

In the Athens meeting we agreed that

1. we will use a GraphQL API for the SPIRIT components to access the content database (DB), and
2. we will use the GraphQL schema language for defining the schema of the content DB.

For point 1 (the GraphQL API) we need a GraphQL schema. Notice that, conceptually, this schema is different from the schema we are creating for the content DB as per point 2. More precisely, while the schema for the content DB defines what exactly the objects look like that we store in the content DB, the schema for the API defines how these objects can be queried via the GraphQL API (i.e., what data can be requested for these objects) and how these objects can be inserted and modified via the GraphQL API. In essence, the schema for the GraphQL API needs to contain more than the schema for the DB. A natural question at this point is: How do we get to this schema for the API?

The answer is to generate this schema automatically from the DB schema! The advantage of this approach is that we do not need to do any manual work for creating the API schema. Instead, that schema is simply generated by pushing a button (or, to be more precise, by calling a command-line tool ;). Moreover, if we later extend or modify the DB schema, we can easily generate an extended API schema that reflects the changes of the DB schema. Now, the question is: How is the API schema generated from the DB schema?

We (LiU) will define an approach to generate the API schema from the DB schema, and we will develop the tool that implements this approach. In the following, I provide an overview of the approach.

The idea of the approach to generate the API schema from the DB schema is to copy the DB schema into a new file and, then, extend the schema in this new file with all the additional things needed for the API schema. These additional things needed are:

1. an ID field in every object type that enables the GraphQL queries to access the system-generated identifier of each object,
2. a query type that specifies the starting points of queries sent to the GraphQL API,
3. additional fields in the object types that enable the GraphQL queries to traverse relationships between the objects in the reverse direction,
4. additional fields in the object types that enable the GraphQL queries to access the data associated with these relationships, and
5. a mutation type and corresponding input types that specify how data can be inserted and modified via the GraphQL API.

1. ID Fields

When inserting a data object into the database, the database management system (ArangoDB in our case) generates an identifier for it. While these identifiers do not need to be part of the DB schema, they should be contained in the schema for the GraphQL API so that they can be requested in GraphQL queries (and, then, used later in subsequent queries). Therefore, when extending the DB schema into the schema for the GraphQL API, each object type is augmented with a field named ID whose value type is ID!.
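
The step described above can be sketched in a few lines of Python. This is only an illustration and not the actual tool: the dictionary-based representation of a schema and the helper names add_id_fields and render_type are assumptions made for this example.

```python
# Hypothetical in-memory form of a schema (an assumption for this sketch):
# type name -> {field name -> value type}.
db_schema = {
    "Investigation": {"Title": "String!", "Authorization": "Authorization"},
    "Authorization": {"SearchPurpose": "String!"},
}


def add_id_fields(schema):
    """Return a copy of the schema in which every object type starts with ID: ID!."""
    return {name: {"ID": "ID!", **fields} for name, fields in schema.items()}


def render_type(name, fields):
    """Render one object type in the GraphQL schema language."""
    lines = [f"    {field}: {value_type}" for field, value_type in fields.items()]
    return "type " + name + " {\n" + "\n".join(lines) + "\n}"


api_schema = add_id_fields(db_schema)
print(render_type("Authorization", api_schema["Authorization"]))
```

The DB schema itself is left untouched; only the copy that becomes the API schema receives the ID fields.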

2. Query Type

Every GraphQL schema for a GraphQL API must have one special type called the query type. The schema for the DB does not need such a query type and, in fact, it should not contain one. The purpose of the query type is to specify the possible starting points of any kind of query that can be sent to the API. For instance, consider the following snippet of a GraphQL API schema which defines the query type of the corresponding API.

type Query {
    Investigation(ID:ID!): Investigation
}

Based on this query type, it is (only) possible to write queries (API requests) that start from an Investigation object specified by a given ID. For instance, it is possible to write the following query.

query {
    Investigation(ID:371) {
        Title
        Description
        Authorization {
            SearchPurpose
            Necessity
        }
    }
}

However, with a query type like the one above, it would not be possible to query directly for, say, an Authorization object specified by its ID.

Now, when extending the DB schema into the API schema, the plan is to generate a query type that contains two fields (i.e., starting points for queries) for every object type in the DB schema: one of these fields can be used to query for one object of the type based on the ID of that object, and the second field can be used to access a paginated list of all objects of the corresponding type. The list is paginated, which means that it can be accessed in chunks.

For example, for the Investigation type that we have in our DB schema, the generated query type would contain the following two fields.

    Investigation(ID:ID!): Investigation
    ListOfInvestigations(first:Int after:ID): ListOfInvestigations!

The additional type called ListOfInvestigations that is used here will be defined as follows.

type ListOfInvestigations {
    totalCount: Int
    isEndOfWholeList: Boolean
    content: [Investigation]
}

Then, it will be possible to write queries such as the following.

query {
    ListOfInvestigations(first:10 after:371) {
        totalCount
        isEndOfWholeList
        content {
            Title
            Description
            Authorization {
                SearchPurpose
            }
        }
    }
}
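
The generation of the query type and of the list types may be sketched as follows. The function names are hypothetical, and forming the list name by simply appending an "s" is an assumption based on the single example ListOfInvestigations; the actual tool may form these names differently.

```python
def query_fields_for(type_name):
    """Two starting points per object type: lookup by ID and a paginated list."""
    return [
        f"{type_name}(ID:ID!): {type_name}",
        f"ListOf{type_name}s(first:Int after:ID): ListOf{type_name}s!",
    ]


def list_type_for(type_name):
    """The wrapper type that carries one chunk of the paginated list."""
    return (
        f"type ListOf{type_name}s {{\n"
        "    totalCount: Int\n"
        "    isEndOfWholeList: Boolean\n"
        f"    content: [{type_name}]\n"
        "}"
    )


def query_type_for(type_names):
    """Assemble the query type from the fields of all object types."""
    body = "\n".join("    " + f for n in type_names for f in query_fields_for(n))
    return "type Query {\n" + body + "\n}"


print(query_type_for(["Investigation", "Authorization"]))
print(list_type_for("Investigation"))
```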

3. Additional Fields For Traversal

In the DB schema, each type of relationship (edge) between objects of particular types is defined in only one of the two related object types. For instance, consider the two object types Investigation and Authorization whose definition in the DB schema looks as follows.

type Investigation {
    UserID: ID!
    Title: String!
    Description: String!
    CaseNumber: String!
    Authorization: Authorization
    Searches: [Search]
    Created: Date!
    Hide: Boolean!
}

type Authorization {
    SearchPurpose: String!
    Necessity: [Necessity]
    Proportionality: [Proportionality]
    TrainedAndAuthorized: Boolean  
    DarkWeb: Boolean!
    AuthBy: Authority!
    Created: Date!
}

Notice that the relationship (i.e., the possible edges) between Investigation objects and Authorization objects is defined only in the definition of the type Investigation (see the field named Authorization) but not in the type Authorization. Specifying every edge type only once is sufficient for the purpose of defining the schema of a (graph) database. However, it is not sufficient for supporting bidirectional traversal of these edges in GraphQL queries. Hence, the schema for the API needs to mention possible edges twice; that is, in both of the corresponding object types. For the aforementioned example of the relationships between Investigation objects and Authorization objects, the API schema thus needs to contain an additional field in the type Authorization such that this field can be used to query from an Authorization object to the Investigation objects that point to it via their Authorization fields. Hence, when extending the aforementioned part of the DB schema into the schema for the GraphQL API, the definition of the Authorization type will be extended as follows.

type Authorization {
    ID: ID!
    SearchPurpose: String!
    Necessity: [Necessity]
    Proportionality: [Proportionality]
    TrainedAndAuthorized: Boolean  
    DarkWeb: Boolean!
    AuthBy: Authority!
    Created: Date!
    Investigation: [Investigation]
}

Observe that the value type of the added field named Investigation is a list of Investigation objects. This is because, according to the DB schema, multiple different Investigation objects may point to the same Authorization object; i.e., the relationship between Investigation objects and Authorization objects is a many-to-one relationship (N:1). Therefore, from an Authorization object, we may come to multiple Investigation objects.

Perhaps this was not the intention and, instead, the relationship between Investigation objects and Authorization objects was meant to be a one-to-one relationship. This could have been captured by adding the @uniqueForTarget directive to the field named Authorization in the DB schema (as described in the text before Example 7 of http://blog.liu.se/olafhartig/documents/graphql-schemas-for-property-graphs/). Assuming that there would be such a @uniqueForTarget directive, then the new field named Investigation that is added when extending the DB schema into the API schema would be defined differently:

type Authorization {
    ID: ID!
    SearchPurpose: String!
    Necessity: [Necessity]
    Proportionality: [Proportionality]
    TrainedAndAuthorized: Boolean  
    DarkWeb: Boolean!
    AuthBy: Authority!
    Created: Date!
    Investigation: Investigation
}

This example demonstrates that the exact definition of the fields that are added when extending the DB schema into the API schema depends on the constraints that are captured by directives in the DB schema. To elaborate a bit further on this point, let us assume that the aforementioned field named Authorization in the DB schema is annotated with the @requiredForTarget directive in addition to the @uniqueForTarget directive. In this case, the extension of the type Authorization for the API schema would look as follows (notice the additional exclamation mark at the end of the value type for the new Investigation field).

type Authorization {
    ID: ID!
    SearchPurpose: String!
    Necessity: [Necessity]
    Proportionality: [Proportionality]
    TrainedAndAuthorized: Boolean  
    DarkWeb: Boolean!
    AuthBy: Authority!
    Created: Date!
    Investigation: Investigation!
}
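
The three cases discussed in this section can be summarized in a small sketch. The function name reverse_field_type is hypothetical. The case in which @requiredForTarget is given without @uniqueForTarget is not covered by the text above; the sketch simply falls back to a plain list for it, which is an assumption.

```python
def reverse_field_type(source_type, directives):
    """Value type of the field that is added to the target type of an edge definition.

    source_type is the type that defines the edge in the DB schema
    (Investigation in the running example); directives are the directive
    names found on that field definition, without the leading @.
    """
    if "uniqueForTarget" not in directives:
        # Many sources may point to the same target, hence a list.
        # Assumption: this also applies if only requiredForTarget is present.
        return f"[{source_type}]"
    if "requiredForTarget" in directives:
        # Exactly one source per target.
        return f"{source_type}!"
    # At most one source per target.
    return source_type


for directives in (set(), {"uniqueForTarget"}, {"uniqueForTarget", "requiredForTarget"}):
    print(sorted(directives), "->", reverse_field_type("Investigation", directives))
```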

4. Additional Fields and Types For Edges

Edges in a Property Graph database may have properties (key-value pairs) associated with them. When defining the DB schema, these properties can be defined as field arguments as demonstrated in the following snippet of a DB schema.

type Blogger {
    Name: String!
    Blogs(certainty:Int! comment:String): [Blog]  @uniqueForTarget @requiredForTarget
}

type Blog {
    Title: String!
    Text: String!
}

By this definition, every edge from a Blogger object to a Blog object has a certainty property and, optionally, it may have a comment property.

Field arguments such as certainty and comment would have a different meaning when used in a schema for the GraphQL API and, thus, they have to be removed from the field definitions when extending the DB schema into the API schema. Hence, after removing the field arguments (and adding the aforementioned ID fields and the fields for traversing edges in the opposite direction), the API schema for the aforementioned DB schema would look as follows.

type Blogger {
    ID: ID!
    Name: String!
    Blogs: [Blog]  @uniqueForTarget @requiredForTarget
}

type Blog {
    ID: ID!
    Title: String!
    Text: String!
    Blogger: Blogger!
}

Although we have to remove the field arguments from the fields that define edge types in the DB schema, we may want to enable GraphQL queries to access the values of the properties that edges of these types have. For instance, we may want to query for the certainty of edges between bloggers and blogs. To this end, the edges have to be represented as objects in the GraphQL API. Hence, it is necessary to generate an object type for each edge type and to integrate these object types into the schema for the API. For instance, for the edges between bloggers and blogs, an object type called BlogsEdgeFromBlogger will be generated, and access to objects of this new type will be integrated into the schema by adding a new field to the Blogger type and to the Blog type, respectively.

type Blogger {
    ID: ID!
    Name: String!
    Blogs: [Blog]  @uniqueForTarget @requiredForTarget
    OutgoingBlogsEdges: [BlogsEdgeFromBlogger]
}

type Blog {
    ID: ID!
    Title: String!
    Text: String!
    Blogger: Blogger!
    IncomingBlogsEdgeFromBlogger : BlogsEdgeFromBlogger!
}

type BlogsEdgeFromBlogger {
    ID: ID!
    source: Blogger!
    target: Blog!
    certainty:Int!
    comment:String
}

Given this extension, it is now possible to write GraphQL queries that access properties of the edges along which they are traversing. The following query demonstrates this option.

query {
    Blogger(ID:3991) {
        Name
        OutgoingBlogsEdges {
            certainty
            target {
                Title
                Text
            }
        }
    }
}
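
How an edge type such as BlogsEdgeFromBlogger may be derived from a field definition of the DB schema is sketched below. The helper names parse_field and edge_type_for are hypothetical, and the regular expression is a simplification that assumes arguments are written without spaces around the colon (as in the example above); a real implementation would use a proper GraphQL parser.

```python
import re

# Simplified pattern: field name, optional argument list, value type.
FIELD = re.compile(r"^(\w+)(?:\((.*)\))?\s*:\s*(\S+)")


def parse_field(definition):
    """Split a DB-schema field definition into name, edge properties and value type."""
    name, args, value_type = FIELD.match(definition.strip()).groups()
    properties = dict(arg.split(":") for arg in (args or "").split())
    return name, properties, value_type


def edge_type_for(source_type, field_name, target_type, edge_properties):
    """Return (type name, fields) of the object type that represents one edge type."""
    name = f"{field_name}EdgeFrom{source_type}"
    fields = {
        "ID": "ID!",
        "source": f"{source_type}!",
        "target": f"{target_type}!",
        **edge_properties,  # former field arguments become ordinary fields
    }
    return name, fields


field_name, properties, value_type = parse_field(
    "Blogs(certainty:Int! comment:String): [Blog]  @uniqueForTarget @requiredForTarget"
)
target_type = value_type.strip("[]!")
print(edge_type_for("Blogger", field_name, target_type, properties))
```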

5. Mutation Type and Corresponding Input Types

In addition to the aforementioned query type, another special type that a GraphQL API schema may contain is the mutation type. The fields of this type specify how data can be inserted and modified via the GraphQL API. For instance, the following snippet of a GraphQL API schema defines a mutation type.

type Mutation {
    setTitleOfInvestigation(ID:ID! Title:String!): Investigation
}

Given this mutation type, it is possible to modify the title of an Investigation object specified by a given ID; the result of this operation is defined to be an Investigation object (we may assume that this is the modified Investigation, which may then be retrieved as the response for the operation).

Now, when extending the DB schema into the API schema, the plan is to generate a mutation type that contains three operations for every object type in the DB schema and up to three operations for every edge type (the update operation is generated only for edge types that have edge properties; see Section 5.5). The three operations for an object type XYZ are called createXYZ, updateXYZ, and deleteXYZ; as their names suggest, these operations can be used to create, to update, and to delete an object of the corresponding type, respectively. In the following, we discuss the mutation operations in more detail.

5.1 Creating an Object

Consider the aforementioned object type Investigation of our DB schema. The create operation for Investigation objects will be defined as follows.

    createInvestigation(data:DataToCreateInvestigation!): Investigation

The value of the argument data is a complex input object that provides the data for the Investigation object that is to be created. This input object must be of the type DataToCreateInvestigation. This input type, which will be generated from the object type Investigation of the DB schema, will be defined as follows.

input DataToCreateInvestigation {
    UserID: ID!
    Title: String!
    Description: String!
    CaseNumber: String!
    Authorization: DataToConnectAuthorizationOfInvestigation
    Searches: [DataToConnectSearchesOfInvestigation]
    Created: Date!
    Hide: Boolean!
}

input DataToConnectAuthorizationOfInvestigation {
    connect: ID
    create: DataToCreateAuthorization
}

input DataToConnectSearchesOfInvestigation {
    connect: ID
    create: DataToCreateSearch
}

input DataToConnectScheduleSearchesOfInvestigation {
    connect: ID
    create: DataToCreateSearch
}

Notice that all fields that are mandatory in Investigation are also mandatory in DataToCreateInvestigation (and optional fields remain optional). Moreover, fields whose value type in Investigation is a scalar type (or a list thereof) have the same value type in DataToCreateInvestigation. In contrast, fields that represent outgoing edges in Investigation have a new input type that can be used to create the corresponding outgoing edge(s). This can be done in one of two ways: either by identifying the target node of the edge via the connect field or by creating a new target node via the create field.
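
The rules in the previous paragraph may be sketched as follows. The function name create_inputs_for and the set SCALARS are assumptions made for this illustration (Date is treated as a custom scalar, and every type outside the set is treated as an object type, which ignores enum types); the naming pattern DataToConnect&lt;Field&gt;Of&lt;Type&gt; is inferred from the examples above.

```python
import re

# Assumption: these are the scalar types in use; Date is a custom scalar.
SCALARS = {"ID", "String", "Int", "Float", "Boolean", "Date"}


def create_inputs_for(type_name, fields):
    """Return {input type name: {field name: value type}} needed by create<type_name>."""
    main = {}
    inputs = {f"DataToCreate{type_name}": main}
    for field, value_type in fields.items():
        base = re.sub(r"[\[\]!]", "", value_type)  # strip list and non-null wrappers
        if base in SCALARS:
            main[field] = value_type  # scalar fields keep their value type
        else:
            # Outgoing edge: keep the wrappers, swap in a connect/create input type.
            connect_name = f"DataToConnect{field}Of{type_name}"
            main[field] = value_type.replace(base, connect_name)
            inputs[connect_name] = {"connect": "ID", "create": f"DataToCreate{base}"}
    return inputs


inputs = create_inputs_for(
    "Investigation",
    {"Title": "String!", "Authorization": "Authorization", "Searches": "[Search]"},
)
for name, input_fields in inputs.items():
    print(name, input_fields)
```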

5.2 Updating an Object

The update operation for Investigation objects will be defined as follows.

    updateInvestigation(ID:ID! data:DataToUpdateInvestigation!): Investigation

The Investigation object to be updated can be identified by the argument ID. The value of the argument data in this case is another input type that provides values for the fields to be modified. This input type is defined as follows.

input DataToUpdateInvestigation {
    UserID: ID
    Title: String
    Description: String
    CaseNumber: String
    Authorization: DataToConnectAuthorizationOfInvestigation
    Searches: [DataToConnectSearchesOfInvestigation]
    Created: Date
    Hide: Boolean
}

Notice that all fields in this input type are optional to allow users to freely choose the fields of the Investigation object that have to be updated. For instance, to update the UserID and the Title of the Investigation object with the ID 371 we may write the following.

mutation {
    updateInvestigation(
        ID:371
        data: {
            UserID:87
            Title:"Five Orange Pips"
        }
    ) {
        ID
        UserID
        Title
    }
}

Updates like this override the previous values of the updated fields. In the case of outgoing edges this means that new edges replace all previously existing edges (without removing the target nodes of replaced edges). If you want to add edges instead, use the operations for creating edges described below (Section 5.4).
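
Deriving the update input type from the create input type amounts to making every field optional. A minimal sketch of this step is given below; the function name to_update_fields is hypothetical.

```python
def to_update_fields(create_fields):
    """Make every field optional by dropping the outermost non-null marker."""
    return {
        field: value_type[:-1] if value_type.endswith("!") else value_type
        for field, value_type in create_fields.items()
    }


update_fields = to_update_fields({
    "UserID": "ID!",
    "Title": "String!",
    "Authorization": "DataToConnectAuthorizationOfInvestigation",
    "Hide": "Boolean!",
})
print(update_fields)
```

Only the outermost exclamation mark is removed, so a value type such as [String!]! would become [String!].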

5.3 Deleting an Object

The delete operation for Investigation objects will be defined as follows.

    deleteInvestigation(ID:ID!): Investigation

The argument ID can be used to identify the Investigation object to be deleted.

Notice that deleting an object implicitly deletes all incoming and all outgoing edges of that object.
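
The intended behavior of a delete operation can be illustrated with a toy in-memory graph. This is not how the data is stored in ArangoDB; the data layout and the function name delete_object are assumptions made purely for illustration.

```python
def delete_object(nodes, edges, object_id):
    """Remove the object and every edge that starts or ends at it."""
    deleted = nodes.pop(object_id)
    remaining = [e for e in edges if object_id not in (e["source"], e["target"])]
    return deleted, remaining


nodes = {
    371: {"Title": "Five Orange Pips"},
    12: {"SearchPurpose": "demo"},
    99: {"Title": "Another investigation"},
}
edges = [
    {"source": 371, "target": 12},  # outgoing edge of 371, incoming edge of 12
    {"source": 99, "target": 12},   # incoming edge of 12
]

deleted, edges = delete_object(nodes, edges, 371)
print(deleted, edges)
```

The target object 12 is still present after the deletion; only the edge from 371 to it is gone.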

5.4 Creating an Edge

Consider the type of edges that represents the aforementioned relationship between bloggers and blogs. In the DB schema, this type of edges is defined implicitly by the field definition Blogs in object type Blogger. The create operation for these edges will be defined as follows.

    createBlogsEdgeFromBlogger(data:DataToCreateBlogsEdgeFromBlogger!): BlogsEdgeFromBlogger

The new input type for this operation, DataToCreateBlogsEdgeFromBlogger, will be generated as follows.

input DataToCreateBlogsEdgeFromBlogger {
    sourceID: ID!   # assumed to be the ID of a Blogger object
    targetID: ID!   # assumed to be the ID of a Blog object
    certainty:Int!
    comment:String
}

5.5 Updating an Edge

Update operations for edges will be generated only for the types of edges that have edge properties. The edges that represent the aforementioned relationship between bloggers and blogs are an example of such edges. The update operation that will be generated for these edges is:

    updateBlogsEdgeFromBlogger(ID:ID! data:DataToUpdateBlogsEdgeFromBlogger!): BlogsEdgeFromBlogger

The argument ID can be used to identify the BlogsEdgeFromBlogger object that represents the edge to be updated. The new input type for this operation, DataToUpdateBlogsEdgeFromBlogger, will be generated as follows.

input DataToUpdateBlogsEdgeFromBlogger {
    certainty:Int
    comment:String
}
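
The two input types for an edge type can be derived from the edge properties alone, as the following sketch shows. The function name edge_inputs_for is hypothetical.

```python
def edge_inputs_for(edge_type_name, edge_properties):
    """Return the input types for creating and (if applicable) updating edges of one type."""
    inputs = {
        f"DataToCreate{edge_type_name}": {
            "sourceID": "ID!",
            "targetID": "ID!",
            **edge_properties,
        }
    }
    if edge_properties:
        # An update input is only generated if there is something to update;
        # all of its fields are optional.
        inputs[f"DataToUpdate{edge_type_name}"] = {
            name: value_type.rstrip("!") for name, value_type in edge_properties.items()
        }
    return inputs


edge_inputs = edge_inputs_for("BlogsEdgeFromBlogger", {"certainty": "Int!", "comment": "String"})
for name, input_fields in edge_inputs.items():
    print(name, input_fields)
```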

5.6 Deleting an Edge

The delete operation for the edges between bloggers and blogs will be defined as follows.

    deleteBlogsEdgeFromBlogger(ID:ID!): BlogsEdgeFromBlogger

The argument ID can be used to identify the BlogsEdgeFromBlogger object that represents the edge to be deleted.
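
Putting Sections 5.1 to 5.6 together, the fields of the generated mutation type may be assembled as sketched below. The function names are hypothetical, and the sketch only produces the field signatures as strings.

```python
def object_operations(type_name):
    """The three mutation fields generated for an object type."""
    return [
        f"create{type_name}(data:DataToCreate{type_name}!): {type_name}",
        f"update{type_name}(ID:ID! data:DataToUpdate{type_name}!): {type_name}",
        f"delete{type_name}(ID:ID!): {type_name}",
    ]


def edge_operations(edge_type_name, has_properties):
    """The mutation fields for an edge type; update only if the edges have properties."""
    operations = [f"create{edge_type_name}(data:DataToCreate{edge_type_name}!): {edge_type_name}"]
    if has_properties:
        operations.append(
            f"update{edge_type_name}(ID:ID! data:DataToUpdate{edge_type_name}!): {edge_type_name}"
        )
    operations.append(f"delete{edge_type_name}(ID:ID!): {edge_type_name}")
    return operations


fields = object_operations("Investigation") + edge_operations("BlogsEdgeFromBlogger", True)
print("type Mutation {\n" + "\n".join("    " + f for f in fields) + "\n}")
```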

LiUGraphQL / woo.sh