cayleygraph / cayley

An open-source graph database
https://cayley.io
Apache License 2.0

shape: Query serialization #668

Open dennwc opened 6 years ago

dennwc commented 6 years ago

Make Shapes serializable. This will allow passing them over the wire as continuation tokens, distributing queries, and even implementing virtual predicates.

dennwc commented 5 years ago

More context: #824

iddan commented 5 years ago

How do you like CBOR as a serialisation format? It is supported in JavaScript, Go, and Python, and its data model is a superset of JSON's.

iddan commented 5 years ago

Another option is to use Gremlin's Bytecode format. Here is the JavaScript implementation of Gremlin Bytecode: https://github.com/apache/tinkerpop/blob/75b190665c0689e95847b0f9def145da172e1f9d/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js Basically, it accumulates the steps into a structure and then uses GraphSON to submit it. GraphSON is very similar to JSON-LD, so we can do something similar.

iddan commented 5 years ago

Workplan:

  1. Extract all the structs implementing Shape into a separate module (shapes.go?) so it is clear which shapes we have.
  2. Add a Name() string method to the Shape interface to get the name of the shape explicitly.
  3. Implement JSON-LD serialization for the data held by shapes that is not yet serializable.
  4. Implement a parse(json_ld) Shape function that takes a JSON-LD structure and returns a Shape object.
  5. Wire it to the /query API as an accepted Content-Type: application/ld+json.
  6. Implement a JavaScript client that can generate such shapes.
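
Steps 2 to 4 could be sketched roughly as follows. This is a minimal illustration, not Cayley's actual code: the `Lookup` and `Limit` types, the `envelope` wrapper, the `registry` map, and the `Marshal`/`Parse` helpers are all hypothetical names, plain JSON stands in for JSON-LD, and nested shapes are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Shape is a stand-in for the real interface; only Name() comes from the workplan.
type Shape interface {
	Name() string
}

// Lookup and Limit are hypothetical, simplified shapes used only for illustration.
type Lookup struct {
	Values []string `json:"values"`
}

func (Lookup) Name() string { return "Lookup" }

type Limit struct {
	Limit int64 `json:"limit"`
}

func (Limit) Name() string { return "Limit" }

// envelope tags the serialized fields with the shape name so a parser can dispatch on it.
type envelope struct {
	Type string          `json:"@type"`
	Data json.RawMessage `json:"data"`
}

// registry maps a shape name to a constructor for an empty shape of that type.
var registry = map[string]func() Shape{
	"Lookup": func() Shape { return &Lookup{} },
	"Limit":  func() Shape { return &Limit{} },
}

// Marshal wraps the shape's own JSON encoding in a named envelope.
func Marshal(s Shape) ([]byte, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return nil, err
	}
	return json.Marshal(envelope{Type: s.Name(), Data: data})
}

// Parse reads an envelope and decodes its payload into the registered shape type.
func Parse(b []byte) (Shape, error) {
	var env envelope
	if err := json.Unmarshal(b, &env); err != nil {
		return nil, err
	}
	newShape, ok := registry[env.Type]
	if !ok {
		return nil, fmt.Errorf("unknown shape type %q", env.Type)
	}
	s := newShape()
	if err := json.Unmarshal(env.Data, s); err != nil {
		return nil, err
	}
	return s, nil
}

func main() {
	b, err := Marshal(Lookup{Values: []string{"alice", "bob"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))

	s, err := Parse(b)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %+v\n", s.Name(), s)
}
```

The explicit name from step 2 is what lets the parser in step 4 pick the right Go type without guessing from the payload.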

iddan commented 5 years ago

I know how to do 2, 3, 4, 5 and 6. 1 is a little more tricky without previous knowledge.

dennwc commented 5 years ago

How do you like CBOR as a serialisation format?

Yeah, I was thinking about using either CBOR or Protobuf. Both have upsides and downsides.

Another option is to use Gremlin's Bytecode format.

I think this will limit us in the long run. We can support that separately if we want to be compatible with Gremlin clients, though.

Workplan:

  1. This is already done; see the shape package. This was my goal for the last few cycles.

  2. That's the easy part :) However, the implementation you describe targets CBOR/JSON-LD specifically. Protobufs, on the other hand, may be more efficient when you have a limited set of allowed messages, which is the case with Shapes. MarshalProto/UnmarshalProto may be added to the interface to support it.

  3. This will work for simple cases, but not for advanced ones. Imagine a shape holding a pointer to a function, or a reference to an external data source. I would propose rejecting unknown shapes for serialization (return an error if attempted). Also, if we want to target JSON-LD specifically here, we should consider dumping our own schema first to see whether it works for us.

  4. Again, let's not jump to JSON-LD for queries just yet :) It's a great option in the long run, since we can store them in the DB that way (see #669), but the server needs to do a lot of work to interpret such a query if it comes from HTTP. The JSON-LD spec is pretty involved in terms of possible values, forms, etc. Again, I would propose accepting Protobufs with a strict schema first to see what works and what doesn't, and then designing a solution with JSON-LD in mind as the next step, or as a step toward #669.

iddan commented 5 years ago

Hm... Isn't strict unmarshalling of JSON a solved problem in Go? https://gobyexample.com/json (I really don't know, as I'm fairly new to the language.)

I don't feel very comfortable adding Proto messages, as our data types story is pretty complex already and we would need to add Protobuf as a dependency for clients, while JSON-LD will not require additional tooling. On second thought, we can wait with CBOR and just start with bare JSON, then add a CBOR option later, as they have compatible datatypes.

Compatibility with Gremlin shouldn't be a goal right now; I just think we can learn from their structures. In the trade-off between simplicity and performance for the messaging format, I personally prefer simplicity, as the messages are rather small and the performance differences are minor.
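
On the strict-unmarshalling question: Go's standard `encoding/json` package rejects type mismatches by default and can be told to reject unknown fields via `Decoder.DisallowUnknownFields`. A minimal sketch, where the `pageQuery` message and the `decodeStrict` helper are hypothetical names:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// pageQuery is a hypothetical query message used only for illustration.
type pageQuery struct {
	Skip  int64 `json:"skip"`
	Limit int64 `json:"limit"`
}

// decodeStrict decodes b into v and fails on fields that v does not declare.
func decodeStrict(b []byte, v interface{}) error {
	dec := json.NewDecoder(bytes.NewReader(b))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

func main() {
	var q pageQuery

	// Well-formed input decodes normally.
	fmt.Println(decodeStrict([]byte(`{"skip":1,"limit":10}`), &q), q)

	// A misspelled field name is reported instead of being ignored.
	fmt.Println(decodeStrict([]byte(`{"skip":1,"limt":10}`), &q))

	// A value of the wrong JSON type is rejected as well.
	fmt.Println(decodeStrict([]byte(`{"limit":"ten"}`), &q))
}
```

This does not check for missing required fields, so a full schema check would still need something on top of it.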

dennwc commented 5 years ago

JSON-LD will not require additional tooling

For the client side, yes, since it will just emit it. But for the server side it's a different story. Compared to regular JSON, the number of steps necessary to interpret it may make it sub-optimal as a primary format for queries.

we can wait with CBOR

Adding CBOR is easy if we already support JSON. It also gives a good performance boost in terms of decoding, so I would prefer it in the long run. But yes, we can start with JSON for now.

I don't feel very comfortable adding Proto messages, as our data types story is pretty complex already

I would say Protos usually simplify the datatypes story a lot, since you get everything auto-generated. Plus they are easier to interpret and decode because the strict schema is baked into the format itself, as opposed to JSON/CBOR, which are schema-less by design.

But yes, for JS it's needlessly painful for some reason, which is really a shame. I usually end up writing a proto decoder in JS by hand by translating the generated Go code. All of this is just to avoid dependencies :( So we will have to support JSON anyway, I guess. Hope the story with CBOR is better than with Protos at least.

I will look into server-side support for things like OpenAPI to see if it will be useful in this case.

iddan commented 5 years ago

By JSON-LD I was primarily talking about the representation of RDF terms (as we’ve done in Gizmo). Everything else is just regular JSON.

iddan commented 4 years ago

To generate JSON validation we can use something like https://github.com/alecthomas/jsonschema

dennwc commented 4 years ago

By JSON-LD I was primarily talking about RDF terms representation (as we’ve done in Gizmo). Everything else is just regular JSON

This certainly works for serialization, but not for deserialization, as mentioned above.

To generate JSON validation we can use something like https://github.com/alecthomas/jsonschema

Was thinking about using this exact library as well :)