jasonrhodes commented 5 years ago

We've discussed whether having GraphQL as part of the infrastructure and logs UI data layer is a good thing or whether we should remove it in favor of a more simple REST (or other?) interface. We should outline the basic pros and cons. I think it would be good to have the folks who have been working with this code (@simianhacker, @weltenwort, @skh, and @Kerry350) document the following answers:

"Keep GraphQL" side:

Where and how do we currently use GraphQL in the infra and logs UI?
What advantages do we get out of using GraphQL?
What might we lose by switching to REST?

"Remove GraphQL" side:

What are the downsides of continuing to use and support GraphQL in the infra and logs UI?
What are the best options to replace GraphQL if we remove it?
How complicated/risky will it be to remove?

Time-box: 2-3 hours

elasticmachine commented 5 years ago

Pinging @elastic/infrastructure-ui

weltenwort commented 5 years ago

Thanks for remembering to bring this up.

Current Usage

We're using GraphQL for all http API endpoints in the infra plugin except the rather new metrics explorer. The usage looks like this:

There is a single api at ${BASE_PATH}/api/infra/graphql
The server-side Query and Mutation schemata of the API are composed from schema.gql.ts files, which are co-located with the respective resolver code.
The client-side query schemata are derived from *.gql_query.ts, which are co-located with the react containers that use them in their apollo client calls or Query/Mutation usages.
The common, server-side and client-side TypeScript types are generated by the x-pack/plugins/infra/scripts/generate_types_from_graphql.js script, which uses https://github.com/dotansimha/graphql-code-generator to turn the aforementioned schema files into TypeScript interfaces.

(Warning, opinions and speculation from here on)

Advantages

End-to-end type safety is provided by using the graphql schema as the single source of truth in combination with the type generation script
Request and response payload validation is implicitly performed by the apollo client and server.

Disadvantages

The more advanced features of apollo-client (like caching and local state) don't work very reliably, which is why we chose not to use them.
The type generation script requires some additional effort to maintain.
Resolvers are harder to type and read than normal HTTP request handlers.

Alternative

Considering the current usage outlined above, I think we could preserve the advantages and reduce disadvantages by using basic HTTP endpoints with JSON serialization in combination with https://github.com/gcanti/io-ts (already in Kibana's dependency list) for validation and type safety. I don't think the ReST paradigm matches our API very well (and I've rarely seen it implemented correctly). Using Joi for validation also is not ideal IMHO, because it (1) doesn't work in the browser and (2) only validates the request, but not the response and doesn't integrate with TypeScript. Instead, if we replaced GraphQL with HTTP routes, I would propose something like the following:

(common) For every route create 1-3 io-ts runtime types and their derived TypeScript types in a common location for every route:
- optionally one type for the request url parameters
- optionally one type for the request body
- one type for the response body
(server-side) Register the HTTP routes with the Kibana HTTP server as usual.
(server-side) At the beginning of every request handler, decode the request url params and body using the corresponding runtime types. This makes them available as correctly typed values in downstream server-side code. A helper function might be useful to encapsulate this.
(server-side) At the end of every request handler use the runtime type to validate response body.
(client-side) Perform the query using Kibana's kfetch.
(client-side) Before sending the requests, validate the request url params and body using the corresponding runtime types. Again, a helper function or wrapper of kfetch might be useful (possibly a hook that extracts kfetch from the context and uses the supplied runtime type defs?).
(client-side) After receiving a reponse, decode the response using the runtime types. This could also be done using the helper just mentioned.

This would preserve the following characteristics on both client and server

Single source of truth for API types
Full validation of the request params and payloads
Correctly typed values

while improving on problems we currently have with GraphQL

No additional type DSL and type generation script
More commonly used tooling (hapi HTTP/kfetch)
Limited complexity compared to GraphQL

Thanks for reading this wall of text. I'm looking forward to hearing opinions and alternative proposals.

Kerry350 commented 5 years ago

"Keep GraphQL" side:

Where and how do we currently use GraphQL in the infra and logs UI?

@weltenwort's answer to this was very exhaustive. All I'll add is there's now one additional non-GraphQL usage in the "IP to host" endpoint (this was only merged yesterday).

What advantages do we get out of using GraphQL?
- Type safety and the ability to generate types from the schema
- A "source of truth" with the schema
- Parity with the wider community. By this I mean the trend was, for a while, to move to GraphQL from things like REST. This means more libraries, articles, bug fixes etc become relevant to us. However, I've actually seen this trend reversing of recent.
- In an ideal world we'd have the ability to request "what we need" (GraphQL's biggest selling point), but we don't use this functionality.
- There is a theoretical advantage that GraphQL can provide better analytics.
What might we lose by switching to REST?
- We will lose the overarching schema, which is useful as a source of truth and for generating types. We will need to fill this gap ourself with io-ts, Joi et al.
- We'd lose the ability to "play" with / test the API with GraphiQL.

"Remove GraphQL" side:

What are the downsides of continuing to use and support GraphQL in the infra and logs UI?
- We're not using the features that make GraphQL powerful - things like requesting a subset of fields, requesting multiple resources, dealing with polymorphic data etc
- Everything is a POST, HTTP caching is lost (say we want to start caching things like the auto suggestion results)
- We have the overhead of Apollo, but don't use two of it's most powerful features: caching and subscriptions (for realtime data)
- Cognitive overhead: there is more to reason about in implementing something in GraphQL (schema, resolvers etc) than a simple endpoint function
- Testing is a bit more involved (than using something like Supertest)
What are the best options to replace GraphQL if we remove it?

I actually really like the method proposed by @weltenwort (for context we spoke a little about it at GAH too, and Felix explained the issues with Joi in the browser).

I don't think the ReST paradigm matches our API very well (and I've rarely seen it implemented correctly).

I do strongly agree with this. Now, I fall in to the "It's RESTful" camp when it's really not just as much as anyone else. But, if we're completely redefining how we do things here, we might as well be accurate 😄 I honestly can't see that for every resource we provide we'll do things like provide hypermedia links to transition that resource from one state to another (pagination), or to navigate to linked resources.

How complicated/risky will it be to remove?

If we go endpoint by endpoint I don't think there's too much risk, at least no more than the Redux -> Hooks refactors that have been taking place.

weltenwort commented 5 years ago

We'd lose the ability to "play" with / test the API with GraphiQL.

Good point! This really helped me during development and testing and it might be difficult to replicate that without GraphQL. There are similar tools for HTTP APIs, but they also require some kind of machine-readable API specification or documentation to provide a comparable level of convenience.

If we go endpoint by endpoint I don't think there's too much risk, at least no more than the Redux -> Hooks refactors that have been taking place.

Agreed. The complexity of the migration might be a bit higher if we want to slim down the abstraction layers at the same time. But nothing that can't be handled with a bit of API testing discipline.

simianhacker commented 5 years ago

We'd lose the ability to "play" with / test the API with GraphiQL.

Good riddance! The GraphiQL enables us to be lazy. We shouldn't be playing with the API for testing/development, we should be writing API integration tests to trigger the endpoints for development.

I don't think the ReST paradigm matches our API very well (and I've rarely seen it implemented correctly).

Let's agree to stop calling it REST and just call it an HTTP API. Getting bogged down in the details of REST will end up paralyzing us into an endless analysis of how REST like we are, total waste of time.

Cognitive overhead: there is more to reason about in implementing something in GraphQL (schema, resolvers etc) than a simple endpoint function

This is the primary reason we are making this change. Since we are barely using the "killer" features of GraphQL it's pretty hard to justify asking new team members to learn yet another library (along with all the idiosyncrasies of Kibana). Everyone should know how HTTP API's work without much explanation.

Another related issue, the types that GraphQL script generates are practically unreadable and increases the complexity.

import { SourceQuery } from '../../../graphql/types';
export const useMetricsExplorerState = (
  source: SourceQuery.Query['source']['configuration'], // <-- Yuk!
  derivedIndexPattern: StaticIndexPattern
) => {
  ...
}

Furthermore, the type generation is not "automatic", we have to remember to re-generate the types when the schema changes, this can be frustrating to debug when things get out of sync or the output from the underlying library changes. Yet another moving piece we have to manage and maintain.

skh commented 5 years ago

Pro GraphQL

I definitely do access the API directly during development and for manual testing. GraphiQL is an extremely nice tool for this, but I can do what I need with curl or httpie from the command line. This won't have the nice autocompletion and API documentation, but I haven't really relied on that.
I like that GraphQL, through its rigid approach of doing things, forced us to write our endpoints mostly the same way, type and document them. We will probably not keep up the same level of consistency without it.

Contra GraphQL

I agree with everything that's been said above about mental overhead, and us not taking advantage of the real strengths of GraphQL anyway.

Replacement

I like the approach @weltenwort outlines above, but would like to suggest we start with a rather basic (simple, stupid) implementation and only add abstractions / helpers when we're sure they simplify things.
We're mostly only reading data, so I wasn't worried about us getting lost in REST discussions, but by all means, let's call it just an HTTP API and not try to follow a standard where there is not a single one anyway.

jasonrhodes commented 5 years ago

This is fantastic feedback, @infra-logs-ui team! I like everything that's laid out here, and in the end agree with the approach spelled out by @skh that we should try to move to a very simple HTTP API approach and then improve it as we go, although if there are parts of the implementation that @weltenwort outlined above that we think are simple enough to do, we can consider them. I confess I don't completely follow it so we should zoom about it when everyone is back from holiday in a week and begin to make a more structured plan for this work. I'll put something on the calendar.

Thanks!

weltenwort commented 5 years ago

An example of what it would look like with client-to-server typing: weltenwort/kibana#2

roncohen commented 5 years ago

what's the the runtime overhead of io-ts validation? I understood it would run through every JSON response to check that everything is the right type. For large responses (1000+ numbers and dates, for example), I'd assume this could add significant overhead if we're not careful.

weltenwort commented 5 years ago

I'm not aware of any benchmarks, but we could perform some. But since not performing validation is not really an alternative, we should compare against other validation libraries.

roncohen commented 5 years ago

But since not performing validation is not really an alternative, we should compare against other validation libraries.

Is it common to do run time type checking of APIs in production? As an aside, React propTypes are only validated in development. If there's a significant overhead from io-ts, could we use it in development mode only?

weltenwort commented 5 years ago

I consider validating API payloads a security best practice in production. Not blindly trusting user input (which anything from the network essentially is) is a basic principle IMHO. The apollo server and client did that for us so far, but when we move to a custom HTTP API we will have to manually include that validation step. I proposed io-ts for that because it not only performs the validation but also ensures the compiler is aware of the value types, thereby providing a single source of truth (like the GraphQL schema was before).

The analogy to React prop types is not valid, I think. React component props don't really represent a boundary of the system to an untrusted environment in the same way that a network API does.

roncohen commented 5 years ago

to be clear, i think it's great if we can get type checked API responses, but we need to be aware of the overhead

jfsiii commented 5 years ago

An example of what it would look like with client-to-server typing: weltenwort#2

@weltenwort I just saw this and it reminds me of the explorations I did in https://github.com/elastic/kibana/pull/40041 around using JSON Schema to describe the types. As the io-ts author outlines here, depending on one's preferences for which format should be the "source of truth", it's possible to either

convert a runtime type to JSON Schema (either at runtime or compile time)
convert a JSON Schema to a runtime type (either at runtime or compile time)

I'd love to talk more about this when you get a moment. I think this could unlock some very useful features.

weltenwort commented 5 years ago

Absolutely, I'd be happy to discuss it when you have the time.

Thanks for linking the io-ts issue. From the experience we gathered with GraphQL I would prefer the variant where we define the io-ts type at runtime as the source of trutz. The separate compilation step is seen as one of the biggest pains in our current graphql setup, so a pure runtime setup would be preferable.

jfsiii commented 5 years ago

The separate compilation step is seen as one of the biggest pains in our current graphql setup, so a pure runtime setup would be preferable.

Totally agree about avoiding the compilation step. I was hoping for something like Enjoi which I used in that PR to convert from JSON Schema to Joi at runtime without a compilation step

https://github.com/elastic/kibana/blob/5301b2c66f066a5dbc6b7a445951702970e995c7/x-pack/legacy/plugins/integrations_manager/server/routes.ts#L18

https://github.com/elastic/kibana/blob/5301b2c66f066a5dbc6b7a445951702970e995c7/x-pack/legacy/plugins/integrations_manager/server/routes.ts#L34-L43

But I see that both io-ts-codegen and json-schema-to-typescript require writing out to disk. Which make sense I guess.

I was hoping to have a spec expressed in JSON Schema in implemented in other places/languages (Principle of Least Power, etc), but the problems caused by having them out of sync (loss of trust, false positives & negatives) and pain of keeping them in sync seem to shift to authoring in TS and exporting to JSON Schema as needed.

One item from that issue about starting with a runtime type:

However conversions might become tricky if there are recursive runtime types involved.

Can you think of any places where we have recursive runtime types?

weltenwort commented 5 years ago

Can you think of any places where we have recursive runtime types?

Not really. And we probably want to use a safe JSON serializer (with stable key order and cycle detection) in any case when doing this kind of conversion. Do you intend to serve the JSON schema on a well-know route or do you want to write it out to a file?

jfsiii commented 5 years ago

I don't even know if we'll need it now. My initial idea was have a spec in JSON Schema so it could be a) converted to Joi for validation, TS for types, etc and b) included in an OpenAPI definition

We've gone over how we'll try to address (a) in the last few comments and I just saw that there are a few references, especially https://github.com/elastic/kibana/issues/40623, for incorporating OpenAPI definitions.

I think I'll start with having Integrations Manager use io-ts and see where that leads.

weltenwort commented 5 years ago

Yes, that makes the most sense to me as well. We should be able to generate everything both at runtime and any other point in time from the runtime types.

elasticmachine commented 5 years ago

Pinging @elastic/infra-logs-ui

elastic / kibana

[Infra/Logs UI] [R&D] Should we replace GraphQL? #36674

"Keep GraphQL" side:

"Remove GraphQL" side:

Pro GraphQL

Contra GraphQL

Replacement