Mec-iS / mild-QL

An experiment for a self-deployable REST/HYDRA server from Swagger documentation
http://mild-ql.appspot.com/

Where does Hydra fit in? #3

Open ariutta opened 9 years ago

ariutta commented 9 years ago

From the gitter chat:

ariutta: 1) HYDRA and standard REST Request flow:

GraphQL client ->
GraphQL server/middleware ->
multiple HYDRA server endpoints ->
Relational datastore?

vs.

2) GraphQL (could alternatively use Falcor in a similar way) Request flow:

GraphQL client ->
GraphQL server/middleware ->
Triplestore or Neo4J

Mec-iS: The number 1 gives for sure more possibilities for horizontal integration with other technologies/standards, and gives also the possibility of serving a SPARQL endpoint beside the Hydra, to at least try some backward compatibility with other Linked Data. Number 2 relegates the project to a JS only stack, that is not the best choice to be "cross-platform" among technologies/standards.

What characteristics should the "multiple HYDRA endpoint server" have, besides parsing JSON, referencing, dereferencing, reading/writing the DB, etc.?

ariutta commented 9 years ago

The number 1 gives for sure more possibilities for horizontal integration with other technologies/standards, and gives also the possibility of serving a SPARQL endpoint beside the Hydra, to at least try some backward compatibility with other Linked Data. Number 2 relegates the project to a JS only stack, that is not the best choice to be "cross-platform" among technologies/standards.

Actually, with 2), we could serve a SPARQL endpoint, a GraphQL endpoint and a Falcor endpoint. And I know people are working on ports of GraphQL and Falcor for other languages. But it's true: GraphQL and Falcor are not W3C standards, and they don't have as much support/documentation as there is for something like SPARQL.

ariutta commented 9 years ago

Also from the gitter chat:

Mec-iS: A practical question: I have two endpoints to fetch data from:

a map of keywords and articles: http://hypermedia.projectchronos.eu/visualize/articles/?api=true
a list of keywords: http://taxonomy.projectchronos.eu/concepts/c/ (see only the entry with "group": "keywords")

Suppose I want to establish a graphQL server to base a data-visualization front-end in BackBoneJS to visualize these endpoints. What are the advantages? The frontend would deal only with graphQL? graphQL can be an interface to multiple REST endpoints?

ariutta commented 9 years ago

Suppose I want to establish a graphQL server to base a data-visualization front-end in BackBoneJS to visualize these endpoints. What are the advantages? The frontend would deal only with graphQL? graphQL can be an interface to multiple REST endpoints?

Backbone and React can work together: https://github.com/larsonjj/react-backbone-example

But I think Relay+GraphQL duplicates some of the functionality of Backbone, so I'm not sure whether Relay+GraphQL+React+Backbone would be a good combination.

The key idea of GraphQL and Falcor is this: let's say you wanted to show the label for each concept on the outside of a circle with linkages between labels when they are both found as keywords for an article, which could look something like this: http://image.slidesharecdn.com/chernyshortnetworksvis-150116085407-conversion-gate01/95/visualizing-networks-45-638.jpg?cb=1421420199

In that case, you'll just want to get the label for every concept as well as the id and keywords for every article. You don't want to pull down the additional article properties, such as title, abstract, stored and published. Especially if you're on a mobile connection, you want to get the smallest amount of data required to render the view. With GraphQL or Falcor, you can specify that you just want this limited set of data and get it in one query. How would we create the query using REST?

ariutta commented 9 years ago

The frontend could be responsible for creating the GraphQL query, and the GraphQL server/middleware could be responsible for either calling the appropriate REST endpoint(s) or else just directly making a query to a graph database/triplestore.

Mec-iS commented 9 years ago

OK, now I have a much clearer view of the usage scenarios.

Especially if you're on a mobile connection, you want to get the smallest amount of data required to render the view. With GraphQL or Falcor, you can specify that you just want this limited set of data and get it in one query. How would we create the query using REST?

We would probably need to separate the totality of the properties in the response in different views, one with 'full-outcome', another 'lite', another 'd3-prone'. Or just make the server respond with the 'lite' version when a mobile client hits it, using Content Negotiation. This is very interesting to me, as I am thinking about serving special endpoints for data-viz views.
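A minimal sketch of the Content Negotiation idea, assuming the client names the view it wants through a `profile` parameter on the `Accept` header. The parameter name, the view names and the `VIEWS` table are assumptions made for illustration; none of them exist on the endpoints discussed here.

```python
# Hypothetical view definitions: which properties each view keeps.
VIEWS = {
    "full": None,  # None means "keep every property"
    "lite": ["id", "keywords"],
}


def pick_view(accept_header, default="full"):
    """Return the view named by a `profile=` media-type parameter, if known."""
    for media_range in accept_header.split(","):
        # Everything after the first ';' is a list of parameters.
        for param in media_range.split(";")[1:]:
            name, _, value = param.strip().partition("=")
            if name == "profile" and value.strip('"') in VIEWS:
                return value.strip('"')
    return default


def render(article, accept_header):
    """Reduce one article to the properties of the negotiated view."""
    fields = VIEWS[pick_view(accept_header)]
    if fields is None:
        return dict(article)
    return {key: article[key] for key in fields if key in article}
```

A request sent with `Accept: application/json; profile=lite` would then get only `id` and `keywords`, while a plain `Accept: application/json` falls back to the full view.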

Back to the initial question: every class that is in the database can be mapped 1-to-1 to an API that implements Hydra framework, and this thing can be really useful both to publish for machine-readability and/or as a 'lower layer API' by the upper layer of our stack.

ariutta commented 9 years ago

We would probably need to separate the totality of the properties in the response in different views, one with 'full-outcome', another 'lite', another 'd3-prone'.

I did something like this using a separate JSON Schema for each different view. I used an Express.js server (Node.js), and before sending back the JSON response, I would pass it through this JSON Schema filter. Three questions on this:

1) the number of schemas can become very large (although I didn't try nested schemas)
2) how does a client specify which schema to use in a RESTful way?
3) Do we want to use JSON Schema?
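For reference, a JSON Schema filter of this kind can be sketched in a few lines of Python. This is not the Express.js code described above, only an illustration of the technique: it projects the data onto the schema's `properties` and `items` and does no validation. The `LITE_ARTICLES` schema is a hypothetical example.

```python
def filter_by_schema(data, schema):
    """Keep only the parts of `data` that the schema declares.

    Only `properties` (objects) and `items` (arrays) are honoured; this
    is a projection, not JSON Schema validation.
    """
    if isinstance(data, dict) and "properties" in schema:
        return {
            key: filter_by_schema(data[key], subschema)
            for key, subschema in schema["properties"].items()
            if key in data
        }
    if isinstance(data, list) and "items" in schema:
        return [filter_by_schema(item, schema["items"]) for item in data]
    return data


# Hypothetical 'lite' view of an article collection.
LITE_ARTICLES = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "keywords": {"type": "array", "items": {"type": "string"}},
        },
    },
}
```

Because the function recurses, nested schemas work without extra code, which may help with the first question: views could share sub-schemas instead of each view being written out in full.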

Also, many views required multiple requests per view, which can be slow.

The beauty of GraphQL and Falcor is that the client tells the server what the client needs, and the server gives back all the data required and nothing more, all in one request.

So if we want to use HYDRA/REST, we need to figure out how our proxy server will parse and respond to GraphQL requests. Using the example from above where we want to get the label for every concept as well as the id and keywords for every article, our GraphQL client might make these two requests (not sure about the array syntax for property keywords):

GET http://taxonomy.projectchronos.eu/ with request body:

{
  concept(startsWith: "c", group: "keywords") {
    label
  }
}

GET http://hypermedia.projectchronos.eu/ with request body:

{
  article {
    id,
    keywords
  }
}

It might even be possible to just make a single request (maybe by using an array syntax and/or by using GraphQL fragment?), but I don't know it well enough to be sure.

Now our proxy server needs to parse the GraphQL, make the appropriate calls to the REST API and return the filtered response. This could look like:

GET http://taxonomy.projectchronos.eu/concepts/c/?by=group&group=keywords

That will return an array of objects like this one:

{
  "label": "air pollution",
  "url": "http://taxonomy.projectchronos.eu/concepts/c/air+pollution#concept",
  "group": "keywords"
}

Then we need to filter to just return an array of objects like this one:

{
  "label": "air pollution"
}

How do we do this filtering?
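One possible answer, as a minimal sketch: once the proxy knows which fields the query selected, the filtering is a projection of each object onto those fields. The function name `project` is an assumption; the sample object is the one shown above.

```python
def project(records, fields):
    """Return copies of `records` reduced to the requested `fields`."""
    return [
        {field: record[field] for field in fields if field in record}
        for record in records
    ]


concepts = [
    {
        "label": "air pollution",
        "url": "http://taxonomy.projectchronos.eu/concepts/c/air+pollution#concept",
        "group": "keywords",
    }
]
print(project(concepts, ["label"]))  # prints [{'label': 'air pollution'}]
```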

Likewise, we need to get the articles:

GET http://hypermedia.projectchronos.eu/visualize/articles

Then filter the response to only include the id and keywords.

I'm not sure how the proxy server knows how to do the steps of 1) making the appropriate REST API call(s) 2) filtering the response

Mec-iS commented 9 years ago

If you need, you can find the source of the 'taxonomy' server here: https://github.com/SpaceAppsXploration/pramantha-nodejs-backend

1) the number of schemas can become very large (although I didn't try nested schemas)

I don't have any experience with filtering through JSON Schema, but using the schemas on their own doesn't sound like the right way.

2) how does a client specify which schema to use in a RESTful way?

Using Content Negotiation: in general, you specify in the request headers the content properties you want in the response. The server reads them and sets the content of the response accordingly.

3) Do we want to use JSON Schema?

I leave all the decision about anything-JS to you, I can support with some server-side reasoning. When you are doubtful, you can try to write down a pros/cons analysis and we try to gather information together. For this case, see my note about Swagger JSON in #4. Is there some way we can use JSON-schema, JSON-filtering and JSON-swagger in the same pipe (at proxy layer)? See below.

Define some starting concepts

The beauty of GraphQL and Falcor is that the client tells the server what the client needs, and the server gives back all the data required and nothing more, all in one request.

The Stack

Wait, what do you mean by 'client' and 'server'? I propose these names to avoid confusion. We have:

  1. a python server that queries directly the database (back-end layer)
  2. a JS proxy/middleware that works on top of 1. (proxy layer)
  3. a JS client-app (client)

From your example, I understood that:

A. 3 makes a request to 2 with a QL query in the request body
B. 2 receives and parses the QL query into different HTTP requests to 1
C. 1 serves the data requested

Can you explain the stack better using this pattern? (Different designs of the proxy can lead to a different distribution of tasks between 1 and 2.)

Procedures

I'm not sure how the proxy server knows how to do the steps of 1) making the appropriate REST API call(s)

In the case of HYDRA this would be done by a smart client that reads the vocabulary and knows which endpoint to get the information from. We can do something similar using Swagger again; it is quite a powerful framework for testing, documenting and describing REST APIs. We can surely leverage some of its features (like the Swagger editor).

2) filtering the response

As above, once we have a REST API description (in JSON or YAML or HYDRA vocabulary), we can use it to find and filter what you need, with our custom 'filtering engine' in the proxy or in the back-end.
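A rough sketch of what "using the description to find what you need" could look like, assuming a heavily trimmed Swagger 2.0-style document held as a Python dict. The `API_DESCRIPTION` content is hypothetical (only the two paths come from this thread), and `find_paths` is an illustrative name, not part of any Swagger tooling.

```python
# Hypothetical, heavily trimmed Swagger 2.0-style description.
API_DESCRIPTION = {
    "paths": {
        "/concepts/c/": {
            "get": {"responses": {"200": {"schema": {
                "type": "array",
                "items": {"properties": {"label": {}, "url": {}, "group": {}}},
            }}}}
        },
        "/visualize/articles/": {
            "get": {"responses": {"200": {"schema": {
                "type": "array",
                "items": {"properties": {"id": {}, "keywords": {}, "title": {}}},
            }}}}
        },
    }
}


def find_paths(description, wanted_fields):
    """Return the GET paths whose 200 response exposes every wanted field."""
    matches = []
    for path, operations in description["paths"].items():
        schema = (
            operations.get("get", {})
            .get("responses", {})
            .get("200", {})
            .get("schema", {})
        )
        # Look inside `items` for collections, else at the object itself.
        properties = schema.get("items", schema).get("properties", {})
        if set(wanted_fields) <= set(properties):
            matches.append(path)
    return matches
```

With this description, asking for `["id", "keywords"]` selects the articles path and asking for `["label"]` selects the concepts path; the same property lists could then drive the filtering step.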

The 200-OK API Design

The thing I am wondering instead: What is the best design for a REST API to facilitate these operations on the upper layers?

Probably the APIs in the example are not perfectly set up to respond to those needs, but we will find the right way. Usually it works like this: start with the lowest layer (the one closest to the DB), mapping the resources in the DB in the simplest way (1-to-1):

GET http://url.com/resource/c returns the (paginated) collection of resources

GET http://url.com/resource/c/?id=123 returns the resource in the collection with id 123

I set up some endpoints in a mock server. Then we go with an API description (YAML or JSON) and then we see how to go on.

ariutta commented 9 years ago

3) Do we want to use JSON Schema? I leave all the decision about anything-JS to you,

Actually, I was thinking about JSON Schema on the proxy layer. JSON Schema is not JS-specific, so it could be used with Python or any other language.

I can support with some server-side reasoning. When you are doubtful, you can try to write down a pros/cons analysis and we try to gather information together. For this case, see my note about Swagger JSON in #4. Is there some way we can use JSON-schema, JSON-filtering and JSON-swagger in the same pipe (at proxy layer)? See below.

A combination of JSON-schema, JSON-filtering and JSON-swagger could be interesting!

The Stack

Sounds good, except the proxy layer would also be in Python. By "server" I was referring to what the browser communicates with, which in this case is the proxy layer. Using that pattern:

A. 3 makes a request to 2 with a QL query in the request body
B. 2 receives and parses the QL query into different HTTP requests to 1
C. 2 filters and/or combines the response from 1
D. 2 serves the data requested
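Steps B to D can be sketched as follows. Several things are assumed here: the query arrives already parsed into a dict of root fields and selected properties (no GraphQL parser is shown), each back-end URL returns a JSON array of objects, and the `ROUTES` mapping is invented for illustration using the URLs quoted earlier in this thread.

```python
import json
from urllib.request import urlopen

# Root query field -> back-end URL. The mapping itself is an assumption.
ROUTES = {
    "concept": "http://taxonomy.projectchronos.eu/concepts/c/",
    "article": "http://hypermedia.projectchronos.eu/visualize/articles/?api=true",
}


def http_get_json(url):
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def handle(query, fetch=http_get_json):
    """Call layer 1 once per root field, then filter and combine.

    `query` is an already-parsed selection such as
    {"concept": ["label"], "article": ["id", "keywords"]}.
    """
    result = {}
    for root, fields in query.items():
        records = fetch(ROUTES[root])  # step B: one HTTP request to layer 1
        result[root] = [               # step C: keep only the selected fields
            {f: rec[f] for f in fields if f in rec} for rec in records
        ]
    return result                      # step D: one combined response
```

Passing `fetch` as a parameter keeps the proxy testable without a network, and would also be the place to plug in a caching or batching fetcher later.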

What is the best design for a REST API to facilitate these operations on the upper layers?

I need to clarify what the objective is. These are the possibilities I'm hearing:

  1. Create a new API server (both a new proxy layer and a new back-end layer) that supports all of Hydra/REST, SPARQL and GraphQL
  2. Create a GraphQL proxy layer that works with existing back-end layers
    A. just one, or multiple back-end layers?
    B. what kind of back-end layers would it support?
      i. all API(s)
      ii. RESTful API(s) only
      iii. RESTful Hydra API(s) only
  3. Create a new GraphQL server (back-end layer only; no proxy layer needed)

If the answer is 1), what is the benefit of supporting Hydra/REST? If we don't support Hydra/REST, it would be much easier to implement 3) by creating a GraphQL -> SPARQL translator and using an existing SPARQL endpoint server. PRO: between REST/Hydra, GraphQL and SPARQL, the most expressive is SPARQL, so it makes sense for it to be the bottom-most layer. CON: SPARQL endpoints aren't known for being fast.
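To give a feel for the size of such a translator, here is a deliberately tiny sketch that handles only one type with a flat list of fields. The `VOCAB` namespace and the convention that a GraphQL type maps to an RDF class and each field to a predicate of the same name are assumptions; a real mapping would need a vocabulary lookup, nesting, arguments and optional fields.

```python
VOCAB = "http://example.org/vocab#"  # hypothetical namespace


def to_sparql(type_name, fields):
    """Build a SELECT query for one type and a flat list of fields."""
    for name in [type_name] + list(fields):
        # Refuse anything that could inject SPARQL syntax.
        if not name.isidentifier():
            raise ValueError("unsupported name: %r" % name)
    variables = " ".join("?" + field for field in fields)
    patterns = ["?s a v:%s ." % type_name]
    patterns += ["?s v:%s ?%s ." % (field, field) for field in fields]
    return "PREFIX v: <%s>\nSELECT %s WHERE {\n  %s\n}" % (
        VOCAB, variables, "\n  ".join(patterns)
    )
```

Calling `to_sparql("Article", ["id", "keywords"])` yields a query with one triple pattern for the class and one per selected field, so the SPARQL endpoint returns only the requested properties.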

If we really need to support HYDRA/REST, it might be easier to make the back-end layer be a SPARQL endpoint and then build two proxy layers: one for GraphQL queries and another for HYDRA/REST queries. The HYDRA/REST proxy layer could possibly be generated by building a mapper from SPARQL to HYDRA and/or Swagger.

If the answer is 2), we should use one or more existing (in production) API(s) without modification.

Mec-iS commented 9 years ago

If the answer is 2), we should use one or more existing (in production) API(s) without modification.

Again, start with the simplest. So I think it's 2; then, once we have a prototype, we can think about how to enhance it vertically.

If we really need to support HYDRA/REST, it might be easier to make the back-end layer be a SPARQL endpoint and then build two proxy layers: one for GraphQL queries and another for HYDRA/REST queries. The HYDRA/REST proxy layer could possibly be generated by building a mapper from SPARQL to HYDRA and/or Swagger.

Hydra is not mandatory in my opinion, mostly because it's not a standard yet. It can be implemented later; we should keep the REST design straightforward enough that a Hydra implementation remains possible later on. Using Hydra natively would mean developing a reasoner on the RDF vocabulary (see the Hydra bundle on which the smart client works), but that would take too much time and too many resources; better to delegate all the RDF work to a SPARQL endpoint (which is a well-established standard).

The scenario

About what you suggested, mild-QL ver 0.5 is defined by:

A. database layer  
B. API layer 
C. Python-QL proxy layer

In general, we can define it as: a query engine or broker for graphQL that is based on JSON-Schema+JSON-Swagger, on a Python server

Solution I (most complete but very complex)

Multiple back-ends or OrientDB (graph plus document db) + SPARQL and REST + a proxy

A. triples store and relational
B. SPARQL (see also GREMLIN as alternative) and REST (HUG server) endpoints in parallel (both with access to the db, so the proxy/server can decide what to use depending on query's complexity)
C. Python-QL proxy

Pros:

Cons:

Solution II (good balance to start)

One back-end + a proxy

A. relational DB (PostgreSQL)
B. REST (HUG) only
C. Python-QL proxy for REST

Pros:

Cons:

The Big Picture: macro-procedure

mild-QL ver 0.5 as defined above in solutions I or II would carry out these tasks (from input to output):

  1. C takes in a query
  2. C matches the query against the given schemas and API descriptions
  3. C defines a procedure to fetch the needed resources (an algorithm to decide the 'fastest and fittest path' to answer the query)
  4. C fetches the needed resources (REST only in the beginning)
  5. A and B serve the data
  6. C filters and aggregates properties from different responses
  7. C serves the output as required in the graphQL query

Note: heavy caching (Redis? Memcached?) needed for steps 3, 4, 5 and 6
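As a sketch of where caching would sit, the wrapper below puts a time-limited cache in front of whatever function layer C uses to fetch from A and B. It uses an in-process dict as a stand-in for Redis or Memcached, and the class name, the `ttl_seconds` default and the injectable `clock` are all assumptions for illustration.

```python
import time


class CachedFetcher:
    """Wrap a fetch function with a small in-process TTL cache."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (expires_at, payload)

    def __call__(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]         # cache hit: no back-end call
        payload = self._fetch(url)  # cache miss: go to the back-end
        self._entries[url] = (now + self._ttl, payload)
        return payload
```

Since the cache is keyed by URL, it only covers step 4 and 5; caching the results of steps 3 and 6 would need a key derived from the query itself.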

Result

We should probably focus on defining:

From now on I suggest to call:

In my opinion, we should keep the three layers as independent as possible, communicating with each other through standard query languages or over HTTP. So a single monolithic back-end is not a good solution. If we build this, the proxy will probably be the most interesting layer in the future, and it will be very useful to make it pluggable into different back-ends. Simply put: mild-QL could be a middleware that plugs onto any REST server to make it a GraphQL server, or something that acts like one.