edgedb / edgedb

A graph-relational database with declarative schema, built-in migration system, and a next-generation query language
https://edgedb.com
Apache License 2.0

GraphQL features and overview of existing technologies #187

Closed vpetrovykh closed 5 years ago

vpetrovykh commented 6 years ago

GraphQL is designed to be an abstraction for APIs, not specifically for DBs. There have been some attempts at adapting GraphQL to query DBs in place of their native query languages.

Prisma

Prisma is one of the latest GraphQL abstraction layers on top of a SQL DB.

The data model in Prisma allows adding information about specific GraphQL fields via directives: @unique, @relation, @default, and @rename.

From the defined data model Prisma automatically produces a GraphQL schema with all the various querying options. An example of such a schema can be found here: https://gist.github.com/gc-codesnippets/f302c104f2806f9e13f41d909e07d82d

Queries

For a type Foo defined in the data model, Prisma creates a set of top-level querying fields (compare posts and postsConnection in the examples below).

The where argument type contains all the complexity of the filtering expressions. It is composable via AND and OR, and it has many fields specialized for other operations on each scalar field of the base type. Here's an example of the where fields generated for a type that has a name field of type String:

    name: String
    name_not: String
    name_in: [String!]
    name_not_in: [String!]
    name_lt: String
    name_lte: String
    name_gt: String
    name_gte: String
    name_contains: String
    name_not_contains: String
    name_starts_with: String
    name_not_starts_with: String
    name_ends_with: String
    name_not_ends_with: String

The where argument also supports nesting based on relations:

query {
  posts(where: {
    author: {
      age_gt: 18
    }
  }) {
    id
    title
    author {
      name
      age
    }
  }
}
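
To make the suffix-based where semantics concrete, here is a minimal Python sketch that evaluates Prisma-style where inputs against plain dicts. It is not Prisma's implementation: the function names, the in-memory data, and the treatment of a bare field name as equality are assumptions for illustration. It covers the suffixes listed above, AND/OR lists, and nested relation filters such as author: { age_gt: 18 }.

```python
from typing import Any, Callable, Dict

# Suffix -> predicate(field_value, argument). A bare field name means equality.
SUFFIX_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "_not": lambda v, a: v != a,
    "_in": lambda v, a: v in a,
    "_not_in": lambda v, a: v not in a,
    "_lt": lambda v, a: v < a,
    "_lte": lambda v, a: v <= a,
    "_gt": lambda v, a: v > a,
    "_gte": lambda v, a: v >= a,
    "_contains": lambda v, a: a in v,
    "_not_contains": lambda v, a: a not in v,
    "_starts_with": lambda v, a: v.startswith(a),
    "_not_starts_with": lambda v, a: not v.startswith(a),
    "_ends_with": lambda v, a: v.endswith(a),
    "_not_ends_with": lambda v, a: not v.endswith(a),
}
# Try longer suffixes first so the most specific operator wins.
_SUFFIXES = sorted(SUFFIX_OPS, key=len, reverse=True)


def matches(obj: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if `obj` satisfies `where`; sibling conditions are ANDed."""
    for key, arg in where.items():
        if key == "AND":
            ok = all(matches(obj, w) for w in arg)
        elif key == "OR":
            ok = any(matches(obj, w) for w in arg)
        elif key in obj and isinstance(arg, dict):
            # Relation filter, e.g. {"author": {"age_gt": 18}}.
            ok = obj[key] is not None and matches(obj[key], arg)
        elif key in obj:
            ok = obj[key] == arg  # bare field name: equality
        else:
            for suffix in _SUFFIXES:
                field = key[: -len(suffix)]
                if key.endswith(suffix) and field in obj:
                    ok = SUFFIX_OPS[suffix](obj[field], arg)
                    break
            else:
                raise ValueError(f"unknown filter field: {key}")
        if not ok:
            return False
    return True


posts = [
    {"id": 1, "title": "GraphQL basics", "author": {"name": "Alice", "age": 30}},
    {"id": 2, "title": "SQL tips", "author": {"name": "Bob", "age": 17}},
    {"id": 3, "title": "GraphQL and SQL", "author": {"name": "Carol", "age": 19}},
]
adults = [p["id"] for p in posts if matches(p, {"author": {"age_gt": 18}})]  # → [1, 3]
```

Counting the matches of {"title_contains": "GraphQL"} in the same way gives the analogue of the aggregate count shown next.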

In addition to pagination/cursor support, various aggregation features are also exposed via Connection fields, e.g.:

# Count all posts with a title containing 'GraphQL'
query {
  postsConnection(where: {
    title_contains: "GraphQL"
  }) {
    aggregate {
      count
    }
  }
}

Ordering works based on one scalar field (expressed as an enum value, e.g. title_ASC). Ordering by multiple fields or by related fields is not currently possible. In principle, this could be extended to at least allow ordering by a list of such enum values. Generalizing the current specification to ordering by related fields would be harder, but a slightly different approach, with an orderBy input type that mirrors the query type, could work.
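
As a rough illustration of the "list of enums" extension, the following hypothetical Python helper sorts rows by a list such as ["title_ASC", "age_DESC"], where earlier entries take precedence. The helper name and the row representation are assumptions; it relies on Python's stable sort, applying keys from the least to the most significant.

```python
from operator import itemgetter
from typing import Any, Dict, List


def order_by_enums(rows: List[Dict[str, Any]], order: List[str]) -> List[Dict[str, Any]]:
    """Sort rows by enum values like ["title_ASC", "age_DESC"].

    The first enum is the primary sort key, the second breaks ties, etc.
    """
    result = list(rows)
    # list.sort is stable, so sorting by the least significant key first and
    # the most significant key last yields a multi-field ordering.
    for value in reversed(order):
        field, _, direction = value.rpartition("_")
        if not field or direction not in ("ASC", "DESC"):
            raise ValueError(f"not an ordering enum: {value}")
        result.sort(key=itemgetter(field), reverse=(direction == "DESC"))
    return result


rows = [{"title": "b", "age": 2}, {"title": "a", "age": 1}, {"title": "b", "age": 1}]
ordered = order_by_enums(rows, ["title_ASC", "age_DESC"])
```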

Here are some discussions about ordering:

Mutations

Fundamentally there are 2 kinds of mutations, much like queries: simple mutations that act on a single object, and batch mutations that act on many objects.

The simple mutations fall into one of 4 categories: create, update, upsert, and delete.

The above mutations can also take nested data in order to process related objects. Fields referring to nested relations have an intermediate wrapper object whose fields are the nested mutation arguments (such as create and connect).

Scalar lists have their own mutation specs.

Batch mutations only have updateMany and deleteMany varieties. They do not trigger subscription events.

Subscriptions

Subscriptions use a where argument to specify what changes are being subscribed to. It's possible to subscribe broadly to all objects of a specific Type or specifically to some subset that is expressible via the same type of filter as for queries.

Neo4j GraphQL Extension

https://neo4j.com/developer/graphql/

This extension provides a way to write a GraphQL schema that maps onto a Neo4j backend. With the aid of a number of special directives, the schema can be designed to reflect relations and to execute queries or mutations by providing Cypher code. This way any special functionality can be parametrized and mapped onto Cypher code in the schema.

By default, filtering capabilities are exposed by using the field names as arguments. There are also orderBy, first, and offset arguments for ordering and slicing.

Basic mutations for single objects and batches of them are also autogenerated based on the schema. For mutating nested objects additional separate mutations are created, e.g. addPersonMovies (add an existing Movie to an existing Person).

Postgraphile

https://www.graphile.org/postgraphile/ https://github.com/graphile/postgraphile

The GraphQL schema is built by introspecting the PostgreSQL DB schema. The resulting schema reflects all the detected relationships as fields with names like personByAuthorId. The expectation is that GraphQL field aliasing will be used to get nicer names, e.g.:

{
  allPosts {
    edges {
      node {
        headline
        body
        author: userByAuthorId {
          name
        }
      }
    }
  }
}
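
The relationship field names follow a predictable pattern. The sketch below approximates it for single-column foreign keys; the pattern is an assumption inferred from names like personByAuthorId and userByAuthorId, not PostGraphile's actual inflector, and it assumes singular table names (no pluralization handling).

```python
def _pascal(snake: str) -> str:
    """author_id -> AuthorId (keeps the rest of each word's casing)."""
    return "".join(part[:1].upper() + part[1:] for part in snake.split("_") if part)


def relation_field_name(target_table: str, fk_column: str) -> str:
    """Approximate the <table>By<Column> naming, e.g. userByAuthorId."""
    head = _pascal(target_table)
    return head[:1].lower() + head[1:] + "By" + _pascal(fk_column)


name = relation_field_name("user", "author_id")  # → "userByAuthorId"
```

Names like these are why the query above aliases userByAuthorId to author.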

Autogenerated GraphQL queries that return multiple results actually return Relay connections. Currently they support only rudimentary filtering using the condition argument.

For example, for a table superheroes the following can be autogenerated:

Computed columns are defined by PostgreSQL functions with special names: a function is matched to a type (table) based on the function and table names. Any additional function parameters (beyond the table row itself) are exposed to GraphQL as field arguments.

Other functions are exposed as Query or Mutation fields, also with parameters: STABLE (and IMMUTABLE) functions become queries and VOLATILE functions become mutations.

What gets exposed is controlled via comments in PostgreSQL. Adding "smart comment" tags such as @omit (to hide things) or @name newName (to rename them) changes what gets reflected.

Join Monster

https://github.com/stems/join-monster

The schema is defined in JS using join-monster and graphql-js. Basically, each GraphQLObjectType is augmented with additional info about the underlying SQL tables and with special Join Monster-oriented resolver specs.

Mapping of arguments onto SQL is done explicitly, with helpers and SQL snippets.

Join Monster supports several pagination implementations based on the Relay connection spec.

The goal is to have SQL resolvers coexist with other resolvers in the application.

Conclusion

Pretty much everyone implements Relay connections in some form. Extending these connections seems like a good way to implement aggregate functions generically. This could be done by exposing to-many links in 2 flavors, foo and foo_connections. That way it would be possible to use the extra features of connections (pagination and aggregation), but also to get simpler, plain nested objects when necessary.

There seems to be a strong tendency to differentiate single-root-object operations from multiple-root-object operations. This mainly affects the top-level Query/Mutation fields, and it is simple enough to implement these versions as Foo and AllFoo (and maybe AllFooConnections, per the previous point).

Filtering via a where argument is a nice generalization: it is much more flexible than per-field filtering. We could simply use the EdgeQL clause names as the argument names for filtering and ordering: filter and orderBy. There's not much agreement on how filtering and ordering are done, so this choice is not going to break much of anything. Slicing, on the other hand, should adhere to the Relay connections spec, with before, after, first, and last. We can offer this mechanism regardless of whether a particular field returns a list or a Relay connection (omitting it for to-one links).

There's not much agreement on how mutations should be structured. The more generalized approach requires some disambiguation as to how nested structures are treated. Basically, we need to map EdgeQL's :, := (and later += and -=) to an intermediate InputObject with fields like 'create', 'assign', 'add', 'remove'. Alternatively, this disambiguation can be folded into the InputObject as an extra (optional) field like __operation__. This way __operation__ can be omitted in the same cases where a plain : is used in EdgeQL, and the nesting will not be as deep.

Depending on how we approach the differences between sets and arrays, both of which map onto List in GraphQL, different options for mutations might be preferable.

Exposing functions as top-level Query/Mutation fields is pretty straightforward.

We could expose a special field __query__ on every Type that returns a Query (maybe omitting special __type and __schema fields). The point then is that it could be used as a kind of subquery/computable in a generic fashion:

query myQuery($name: String = "Alice") {
    User(filter: {name: $name}) {
        name
        age
        # attach a subquery
        related: __query__ {
            Stuff(filter: {
                something: {
                    owner: $name
                }
            }) {
                something {
                    foo
                    bar
                }
                other
            }
        }
    }
}

Once we have working triggers, subscriptions can be implemented. There's no specific agreement on the format here.

vpetrovykh commented 6 years ago

After some deliberation, here's the plan for filtering, ordering and pagination/slicing capabilities in GraphQL.

Filtering

We're discontinuing the simplistic approach of using arguments with the same names as properties for filtering, since it is neither general nor flexible enough. Instead, we'll adopt a single filter argument that takes a special input object.

Here's an example of what the input objects would look like:

# this is User-specific
input FilterUser {
    # basic boolean operators that combine conditions
    and: [FilterUser!]
    or: [FilterUser!]
    not: FilterUser

    # fields available for filtering (properties in EdgeQL)
    name: FilterString
    age: FilterInt
}

# these are generic
input FilterString {
    # equality
    eq: String
    neq: String

    # lexicographical comparison
    gt: String
    gte: String
    lt: String
    lte: String

    # other useful operations
    contains: String
    startswith: String
    endswith: String
    # potentially other things like case-insensitive matches, etc.
}

input FilterInt {
    # equality
    eq: Int
    neq: Int

    # comparison
    gt: Int
    gte: Int
    lt: Int
    lte: Int
}

It is legal to provide multiple input fields in the same input object. They are all implicitly combined using a logical conjunction. For example:

query FilterByNameAndAge {
    User (filter: {
        # all users older than 10 and who are not "Alice"
        age: {
            gt: 10
        }
        name: {
            neq: "Alice"
        }
    }) {
        name
        age
    }
}

query FilterByAgeRange {
    User (filter: {
        # all users with 10 <= age < 20
        age: {
            gte: 10,
            lt: 20
        }
    }) {
        name
        age
    }
}
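
The semantics of these input objects can be sketched in Python. This is a hypothetical evaluator, not EdgeDB's implementation (which would compile the filter into an EdgeQL FILTER clause); the names matches and filter_users are illustrative, and null handling is simplified.

```python
from typing import Any, Callable, Dict, List

# Operators of the generic FilterString / FilterInt input objects.
SCALAR_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda v, a: v == a,
    "neq": lambda v, a: v != a,
    "gt": lambda v, a: v > a,
    "gte": lambda v, a: v >= a,
    "lt": lambda v, a: v < a,
    "lte": lambda v, a: v <= a,
    "contains": lambda v, a: a in v,
    "startswith": lambda v, a: v.startswith(a),
    "endswith": lambda v, a: v.endswith(a),
}


def matches(obj: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate a FilterUser-style input object against one object (a dict).

    All input fields given in `flt` are implicitly combined with AND.
    """
    for key, arg in flt.items():
        if key == "and":
            ok = all(matches(obj, sub) for sub in arg)
        elif key == "or":
            ok = any(matches(obj, sub) for sub in arg)
        elif key == "not":
            ok = not matches(obj, arg)
        else:
            value = obj.get(key)
            # Simplification: a missing (null) property fails every scalar
            # condition instead of following EdgeQL's empty-set semantics.
            ok = value is not None and all(
                SCALAR_OPS[op](value, operand) for op, operand in arg.items()
            )
        if not ok:
            return False
    return True


def filter_users(users: List[Dict[str, Any]], flt: Dict[str, Any]) -> List[str]:
    return [u["name"] for u in users if matches(u, flt)]


users = [
    {"name": "Alice", "age": 12},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 9},
    {"name": "Dave", "age": None},
]
# Same condition as FilterByNameAndAge above.
older_not_alice = filter_users(users, {"age": {"gt": 10}, "name": {"neq": "Alice"}})  # → ["Bob"]
```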

Ordering

Ordering is defined in terms of 3 things:

  1. fields used in ordering
  2. direction
  3. ordering of null values

We propose to use an orderBy argument that takes a special input object:

# a relevant Query definition snippet
type Query {
    User(
        filter: FilterUser,
        orderBy: [OrderUser!],
        # ... other arguments
    ): [User!]
}

input OrderUser {
    name: Ordering 
    age: Ordering 
}

input Ordering {
    direction: directionEnum
    nulls: nullsOrderingEnum
    # we can also assume some specific default when `nulls` is missing
}

enum directionEnum {
    ASC
    DESC
}

enum nullsOrderingEnum {
    FIRST
    LAST
    SMALLEST    # null < any other value
    BIGGEST     # null > any other value
}

We can also use the lexical order in which the input fields are specified to determine the specific ordering (first by 'name', then by 'age' or vice versa).

Here are some usage examples:

query SimpleOrdering {
    User(orderBy: {name: {direction: ASC}}) {
        name
        age
    }
}

# Order users by 'age' in descending order and then by 'name'
# alphabetically, putting the users without 'age' last.
query MultipleOrdering1 {
    User(orderBy: {
        age: {direction: DESC, nulls: LAST},
        name: {direction: ASC}
    }) {
        name
        age
    }
}

# Order users by name alphabetically and then by age in descending order,  
# putting the users without 'age' last.
query MultipleOrdering2 {
    User(orderBy: {
        name: {direction: ASC},
        age: {direction: DESC, nulls: LAST}
    }) {
        name
        age
    }
}
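
To pin down how direction, nulls, and field precedence interact, here is a hypothetical Python sketch of the orderBy semantics. It assumes that input-field order (dict insertion order here) determines precedence, that SMALLEST is the default when nulls is omitted, and that a single OrderUser object may be passed where the schema declares a list (GraphQL coerces a single value into a one-element list).

```python
from operator import itemgetter
from typing import Any, Dict, List, Union

Spec = Dict[str, Dict[str, str]]  # field name -> {"direction": ..., "nulls": ...}


def _nulls_first(direction: str, nulls: str) -> bool:
    """Decide whether nulls go before all values for one ordering field."""
    if nulls == "FIRST":
        return True
    if nulls == "LAST":
        return False
    if nulls == "SMALLEST":  # null < any value: first when ascending
        return direction == "ASC"
    if nulls == "BIGGEST":  # null > any value: first when descending
        return direction == "DESC"
    raise ValueError(f"unknown nulls ordering: {nulls}")


def order_rows(rows: List[Dict[str, Any]], order_by: Union[Spec, List[Spec]]) -> List[Dict[str, Any]]:
    """Sort rows per OrderUser input object(s); earlier fields take precedence."""
    specs = [order_by] if isinstance(order_by, dict) else order_by
    fields = [(name, spec) for obj in specs for name, spec in obj.items()]
    result = list(rows)
    # Stable passes, from the least to the most significant field.
    for name, spec in reversed(fields):
        direction = spec.get("direction", "ASC")
        nulls = spec.get("nulls", "SMALLEST")  # assumed default
        missing = [r for r in result if r.get(name) is None]
        present = [r for r in result if r.get(name) is not None]
        present.sort(key=itemgetter(name), reverse=(direction == "DESC"))
        result = missing + present if _nulls_first(direction, nulls) else present + missing
    return result


users = [
    {"name": "Bob", "age": 30},
    {"name": "Alice", "age": None},
    {"name": "Carol", "age": 30},
    {"name": "Alice", "age": 25},
]
# Same input as MultipleOrdering1 above.
ordered = order_rows(users, {"age": {"direction": "DESC", "nulls": "LAST"},
                             "name": {"direction": "ASC"}})
```

With SMALLEST, flipping direction between ASC and DESC moves users without an age from the beginning to the end of the result, whereas FIRST and LAST pin them in place.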

Pagination/Slicing

Keeping in line with the popular approach, pagination is best implemented by mimicking Relay Connections. Even though we don't return edges, the basic pagination interface can be reused for plain list slices. In the future we can also reuse the same arguments for fields that return edges rather than lists.

We define 4 arguments: after, before, first, last.

# a relevant Query definition snippet
type Query {
    User(
        filter: FilterUser,
        orderBy: [OrderUser!],

        after: String,
        before: String,
        first: Int,
        last: Int,
    ): [User!]

    # ... other Query fields
}

The after and before strings are, in fact, string representations of numeric indices under the particular filter and ordering (starting at "0"). Both bounds are exclusive: after "9" starts at index 10, and before "30" stops at index 29. This keeps usage fairly intuitive even without Relay Connection edges and cursors, while leaving room to extend it to them in the future.

E.g.

query FirstPage {
    User(first: 10) {
        name
        age
    }
}

query ThirdPage {
    User(after: "19", first: 10) {
        name
        age
    }
}

query FancySlice {
    # this is akin to Python's `users[10:30]`
    User(after: "9", before: "30") {
        name
        age
    }
}
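
The slicing rules can be sketched as a small Python function operating on a list that has already been filtered and ordered. It treats after and before as exclusive bounds and applies first before last, as the Relay connection spec does; the function name and signature are assumptions.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def paginate(
    items: Sequence[T],
    after: Optional[str] = None,
    before: Optional[str] = None,
    first: Optional[int] = None,
    last: Optional[int] = None,
) -> List[T]:
    """Slice an already filtered and ordered list, Relay-style.

    `after` and `before` are exclusive bounds given as stringified indices.
    """
    start = int(after) + 1 if after is not None else 0
    end = int(before) if before is not None else len(items)
    window = list(items[start:end])
    if first is not None:
        if first < 0:
            raise ValueError("first must be non-negative")
        window = window[:first]
    if last is not None:
        if last < 0:
            raise ValueError("last must be non-negative")
        window = window[max(len(window) - last, 0):]
    return window


users = list(range(100))
third_page = paginate(users, after="19", first=10)  # → items 20..29
fancy = paginate(users, after="9", before="30")     # same as users[10:30]
```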

1st1 commented 6 years ago

Let's use dir instead of direction for ordering.

vpetrovykh commented 6 years ago

OK.

Any comments on the SMALLEST, BIGGEST for nulls ordering?

1st1 commented 6 years ago

My immediate reaction is that we don't need them in GraphQL, at least initially. They are needed when you have a complex multi-field ordering, and given that most GraphQL backends don't support that, there's no evidence they will be useful.

vpetrovykh commented 6 years ago

Another aspect of our GraphQL schema is how we reflect EdgeDB types into GraphQL types, interfaces and query fields.

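Basically, for a given object type Foo in EdgeDB we create the following in GraphQL: an interface Foo; for non-abstract types, a concrete type FooType that implements Foo and the interfaces of all its ancestors; and a top-level Query field Foo.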

Here's an example:

# eschema
abstract type NamedObject:
    required property name -> str

type User extending NamedObject

This gets converted into:

interface Object {
    id: ID!
}

interface NamedObject {
    id: ID!
    name: String!
}

interface User {
    id: ID!
    name: String!
}

type UserType implements User, NamedObject, Object {
    id: ID!
    name: String!
}

type Query {
    Object(...): [Object!]  # this will include Users
    NamedObject(...): [NamedObject!]  # this will include Users
    User(...): [User!]
}
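
The reflection rule can be sketched in Python: each object type becomes an interface carrying all inherited properties, each non-abstract type also gets a FooType implementing its whole ancestor chain, and every type gets a Query field. The ObjectType class and helper functions are hypothetical, Object is treated as abstract to match the example, and the (...) argument placeholder is kept verbatim, so the output mirrors the example rather than being strictly valid SDL.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectType:
    """Hypothetical, minimal stand-in for an EdgeDB object type."""
    name: str
    abstract: bool = False
    bases: List["ObjectType"] = field(default_factory=list)
    properties: Dict[str, str] = field(default_factory=dict)  # name -> GraphQL type


def ancestors(t: ObjectType) -> List[ObjectType]:
    """The type itself followed by its ancestors, most specific first.

    A plain depth-first walk; real multiple inheritance would need an MRO.
    """
    result: List[ObjectType] = []

    def visit(x: ObjectType) -> None:
        if x.name in {r.name for r in result}:
            return
        result.append(x)
        for base in x.bases:
            visit(base)

    visit(t)
    return result


def all_fields(t: ObjectType) -> Dict[str, str]:
    """Inherited properties first (e.g. id from Object), then the type's own."""
    fields: Dict[str, str] = {}
    for a in reversed(ancestors(t)):
        fields.update(a.properties)
    return fields


def reflect(types: List[ObjectType]) -> str:
    """Render GraphQL SDL following the Foo / FooType / Query.Foo pattern."""
    chunks = []
    for t in types:
        body = "".join(f"    {n}: {ty}\n" for n, ty in all_fields(t).items())
        chunks.append(f"interface {t.name} {{\n{body}}}")
        if not t.abstract:
            impl = ", ".join(a.name for a in ancestors(t))
            chunks.append(f"type {t.name}Type implements {impl} {{\n{body}}}")
    query = "".join(f"    {t.name}(...): [{t.name}!]\n" for t in types)
    chunks.append(f"type Query {{\n{query}}}")
    return "\n\n".join(chunks)


obj = ObjectType("Object", abstract=True, properties={"id": "ID!"})
named = ObjectType("NamedObject", abstract=True, bases=[obj], properties={"name": "String!"})
user = ObjectType("User", bases=[named])
print(reflect([obj, named, user]))
```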

vpetrovykh commented 6 years ago

My thought is that SMALLEST or BIGGEST are more intuitive defaults for null ordering. This is because when you toggle ascending/descending in a table GUI, you'd expect null results to swap from the beginning to the end rather than always being LAST or FIRST.

1st1 commented 6 years ago

Yeah, we can keep SMALLEST and BIGGEST and consider adding FIRST and LAST later.

ansarizafar commented 5 years ago

Another relevant project https://github.com/genie-team/graphql-genie to learn from, especially role-based access control https://github.com/genie-team/graphql-genie/tree/master/plugins/authentication

ansarizafar commented 5 years ago

Another relevant project https://hasura.io/.

Are we on track for a beta release with Node.js bindings, as mentioned by @1st1 in https://github.com/edgedb/edgedb/issues/120#issuecomment-394697066? I would love to use the amazing and revolutionary EdgeDB with Node.js bindings for a new project early next year.

1st1 commented 5 years ago

The GraphQL support is now relatively refined and stable. I'm closing this one; we'll open new issues if necessary.