bchavez / RethinkDb.Driver

:headphones: A NoSQL C#/.NET RethinkDB database driver with 100% ReQL API coverage.
http://rethinkdb.com/api/java

Make chunks of the AST serializable. #68

Closed bchavez closed 8 years ago

bchavez commented 8 years ago

Important chat from slack to avoid losing history on the subject.

@bchavez: Hello, I sort of have an odd question regarding Func and the scope of VarId. It refers to a question proposed to me last night: https://rethinkdb.slack.com/archives/general/p1466483600000840

The basic idea is taking a ReQL expression (like ReqlFunction1 filter = expr => expr["Bar"].Gt(2);), serializing it into a string, sending it over a network to a server, then deserializing the string back into something that can be used in the final query across an application boundary.

I was able to pull this off with some hackery, but then I remembered the C# and Java driver keep VarId state in the driver which is the crux of my questions.

A more concrete example:

ReqlFunction1 filter = expr => expr["Bar"].Gt(2);
var str = ReqlRaw.ToRawString(filter);
// str = "[69,[[2,[1]],[21,[[170,[[10,[1]],"Bar"]],2]]]]"
var filterTerm = ReqlRaw.FromRawString(str);
var result = table.Filter(filterTerm).RunResult<List<Foo>>(conn);

In somewhat human readable form, the serialized expression looks like:

var str =
[FUNC,[[MAKE_ARRAY,[1]],[GT,[[BRACKET,[[VAR,[1]],"Bar"]],2]]]]

filterTerm in the final query is of type ReqlRaw. ReqlRaw is a unique pseudo-term that injects the raw protocol string into the final query. So, the final query looks like this:

[FILTER,[[TABLE,[[DB,[\"query\"]],\"test\"]],<THAT_RAW_REQL_GETS_DROPPED_HERE>]],{}
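A minimal sketch of what that injection amounts to at the protocol level, assuming the serialized chunk is simply spliced into the outer query's JSON. RawSplice and FilterWithRaw are hypothetical names for illustration, not the driver's API, and the numeric term codes for FILTER (39), TABLE (15), and DB (14) are assumptions that do not appear anywhere in this thread.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for illustration; not part of RethinkDb.Driver.
public static class RawSplice
{
    // Assumed term codes (not confirmed in this thread): FILTER=39, TABLE=15, DB=14.
    public static string FilterWithRaw(string db, string table, string rawPredicate)
    {
        // Parse once so malformed input throws a JsonException here
        // instead of producing a broken query later.
        using (JsonDocument.Parse(rawPredicate)) { }

        string dbJson = JsonSerializer.Serialize(db);       // JSON-quoted database name
        string tableJson = JsonSerializer.Serialize(table); // JSON-quoted table name

        // The raw chunk is dropped in verbatim as the second FILTER argument.
        return $"[39,[[15,[[14,[{dbJson}]],{tableJson}]],{rawPredicate}]]";
    }
}
```

Called with "query", "test", and the serialized filter string from the example above, this should yield the FILTER term with the raw FUNC chunk in the predicate position.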

And it works yo. :scream_cat: And so, some questions about that VAR,[1]... because I can foresee some danger ahead...

  1. What is the scope of VAR,[1]? Is it scoped within the FUNC term only, or is VAR scoped over the entire query?
  2. Suppose I did two Filters:
var result = table
                  .Filter(filterTerm)
                  .Filter(otherFilterTerm)
                  .RunResult<List<Foo>>(conn)

otherFilterTerm could have VAR,[1] same as filterTerm also, but would that interfere with the first filterTerm?

  3. Is VAR only named by integer? or could it be a named variable? Like: [VAR,["someNameInsteadOfINT"]]?

I guess what I'm trying to understand is the danger: whether FUNC VAR ids can interfere with each other if two expressions are used within the same query (but in different AST terms). Eah. I don't know if I'm making any sense.

Cool driver feature nonetheless, if we can pull it off without danger.

@mlucy [12:51 PM]

@bchavez: Variables are only scoped inside the function that introduces them. Two variables with the same number in totally different parts of the AST shouldn't interfere with each other. We generally recommend that drivers generate unique numbers for every variable, though, because it makes variable shadowing problems totally impossible.

If you have a self-contained chunk of ReQL which doesn't reference any unbound variables, it should be safe to drop that into another query.
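One way to check that condition on a serialized chunk is to walk the JSON and confirm that every VAR id is introduced by an enclosing FUNC. The sketch below assumes the wire shapes shown earlier in this thread (FUNC = 69 with its parameter ids in a MAKE_ARRAY, VAR = 10) and assumes the input is well-formed protocol JSON. UnboundVars and IsSelfContained are hypothetical names, not driver API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical checker for illustration; not part of RethinkDb.Driver.
public static class UnboundVars
{
    const int VarTerm = 10, FuncTerm = 69; // codes as they appear in the serialized example

    public static bool IsSelfContained(string raw)
    {
        using var doc = JsonDocument.Parse(raw);
        return Check(doc.RootElement, new HashSet<int>());
    }

    static bool Check(JsonElement node, HashSet<int> bound)
    {
        if (node.ValueKind == JsonValueKind.Object)
        {
            // Object literals and optargs: every value may itself be a term.
            foreach (var property in node.EnumerateObject())
                if (!Check(property.Value, bound)) return false;
            return true;
        }
        // Scalars (numbers, strings, null) cannot reference variables.
        if (node.ValueKind != JsonValueKind.Array || node.GetArrayLength() < 2) return true;

        int type = node[0].GetInt32();
        JsonElement args = node[1];

        if (type == VarTerm) return bound.Contains(args[0].GetInt32());

        if (type == FuncTerm)
        {
            // args[0] is [2,[ids...]]; those ids are bound inside the body only.
            var inner = new HashSet<int>(bound);
            foreach (var id in args[0][1].EnumerateArray()) inner.Add(id.GetInt32());
            return Check(args[1], inner);
        }

        foreach (var arg in args.EnumerateArray())
            if (!Check(arg, bound)) return false;
        return node.GetArrayLength() < 3 || Check(node[2], bound);
    }
}
```

Under these assumptions the full serialized filter from the example is self-contained, while its body on its own (the GT term without the surrounding FUNC) is not.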

@bchavez [12:56 PM]

Heck yeah. Cool. Thanks. :+1: I'm not totally sure what variable shadowing is, but kinda get the feeling it would only happen in really complex lambda scenarios? I think it might be pretty cool to offer this with a big warning label? :slightly_smiling_face:

@mlucy [1:00 PM]: The basic problem is if someone does something like: (ruby code, x is ReqlExpr)

def is_forbidden(x)
  r.expr([1, 2, 3]).contains{|number| x.eq(number)}
end

And then they write:

sequence.filter{|n| is_forbidden(n).not()}

If filter and contains both generated the same number for their variable, this wouldn't do what you expect.
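To make the collision concrete, here is a small sketch that builds the serialized form of the contains lambda by hand. FUNC = 69, MAKE_ARRAY = 2, and VAR = 10 are taken from the serialized example earlier in the thread; EQ = 17 is an assumption. ShadowDemo and its members are hypothetical names.

```csharp
using System;

// Hypothetical string builders for illustration; not part of RethinkDb.Driver.
public static class ShadowDemo
{
    static string VarTerm(int id) => $"[10,[{id}]]";
    static string EqTerm(string left, string right) => $"[17,[{left},{right}]]"; // EQ=17 is assumed
    static string FuncTerm(int id, string body) => $"[69,[[2,[{id}]],{body}]]";

    // Serialized form of {|number| x.eq(number)}, where x is the outer
    // filter variable (outerId) and number is the lambda's own variable (innerId).
    public static string ContainsLambda(int outerId, int innerId) =>
        FuncTerm(innerId, EqTerm(VarTerm(outerId), VarTerm(innerId)));
}
```

With distinct ids, ContainsLambda(1, 2) keeps x and number apart as [10,[1]] and [10,[2]]. With the same id, ContainsLambda(1, 1) serializes both operands as [10,[1]], so nothing on the wire distinguishes x from number anymore.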

@bchavez [1:03 PM]

ahhhh. ahh. ok. I think I see what you're saying... dang.

@mlucy [1:03 PM]

But if you have a chunk of ReQL with no unbound variables, it isn't a problem. So, in this example x is a chunk of ReQL with an unbound variable (because it's just a variable with nothing above binding it). So serializing x over the network would be wildly unsafe. But serializing the whole sequence.filter would be. (Safe, that is.)

@bchavez [1:14 PM]

@mlucy: Alrighty. Maybe the safest way I can do this is to treat Funcs a tiny bit differently. So, when the user calls ReqlRaw.ToRawString(filter) to serialize the ReQL AST, with some pre-processing, I can translate VAR,[1] to something like VAR,["RAW_...UUID..."], a special format. So when the query is brought back to life from a string (var filterTerm = ReqlRaw.FromRawString(str);), VAR,["RAW_SomeUUID."] (and any of its occurrences) gets translated back into VAR,[N], where N would be native to that application boundary in the driver. I think that would make it safe again without the shadowing problem? Also, I think I would only need to do this for Func/Var, right? Almost seems like a named VAR with a UUID, but when it's time to get used, I zap them back into N (which auto-increments as it normally does in the Java/C# driver).

@mlucy [5:47 PM]

@bchavez: changing the vars to UUIDs and then back into globally unique numbers on the other end of the wire should work fine.
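The round trip described above could look roughly like the following. This is a sketch under assumptions, not the driver's actual implementation: VarRewriter, Export, and Import are hypothetical names, the "RAW_" placeholder format follows the proposal in the chat, and the nextVarId callback stands in for whatever counter the receiving driver uses. It relies on the wire shapes visible in the serialized example (FUNC = 69 with parameter ids in a MAKE_ARRAY, VAR = 10) and on the input being well-formed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical rewriter for illustration; not part of RethinkDb.Driver.
public static class VarRewriter
{
    const int VarTerm = 10, FuncTerm = 69;

    // Sending side: numeric variable ids become "RAW_<guid>" placeholders.
    public static string Export(string raw)
    {
        var names = new Dictionary<int, string>();
        JsonNode root = JsonNode.Parse(raw);
        Walk(root, idNode =>
        {
            int id = idNode.GetValue<int>();
            if (!names.TryGetValue(id, out string name))
                names[id] = name = "RAW_" + Guid.NewGuid().ToString("N");
            return JsonValue.Create(name);
        });
        return root.ToJsonString();
    }

    // Receiving side: each placeholder gets a fresh id from the local counter.
    public static string Import(string exported, Func<int> nextVarId)
    {
        var ids = new Dictionary<string, int>();
        JsonNode root = JsonNode.Parse(exported);
        Walk(root, idNode =>
        {
            string name = idNode.GetValue<string>();
            if (!ids.TryGetValue(name, out int id))
                ids[name] = id = nextVarId();
            return JsonValue.Create(id);
        });
        return root.ToJsonString();
    }

    // Visits every term and replaces FUNC parameter ids and VAR ids in place.
    static void Walk(JsonNode node, Func<JsonNode, JsonNode> rename)
    {
        if (node is JsonObject obj)
        {
            foreach (var pair in obj) Walk(pair.Value, rename);
            return;
        }
        if (!(node is JsonArray term) || term.Count < 2) return; // scalar or null
        if (!(term[1] is JsonArray args)) return;

        int type = term[0].GetValue<int>();
        if (type == VarTerm)
        {
            args[0] = rename(args[0]);
        }
        else if (type == FuncTerm)
        {
            var parameters = (JsonArray)((JsonArray)args[0])[1]; // the ids in [2,[ids...]]
            for (int i = 0; i < parameters.Count; i++)
                parameters[i] = rename(parameters[i]);
            Walk(args[1], rename); // function body
        }
        else
        {
            foreach (JsonNode arg in args) Walk(arg, rename);
        }
        if (term.Count > 2) Walk(term[2], rename); // optargs, if present
    }
}
```

For example, exporting the serialized filter and importing it with a counter that starts at 100 should rewrite both the parameter id and the VAR reference to 100, leaving the rest of the term untouched.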

ORIGINAL USE CASE:

@epsitec [9:33 PM]

@bchavez I am trying to figure out how to build a ReqlAst in one layer of my software and have it execute against RethinkDb.Driver on another layer. I need to be able to serialize the AST. There seems to be a Build() method (not exposed publicly) which produces JSON. But then, how would I deserialize that into a new instance of ReqlAst to feed it to a Filter() for instance?

Say I have this expression: x => x["Age"].Gt(18) and I want to send it over a serialization boundary. How should I proceed?

@bchavez [9:38 PM]

Hey @epsitec. Just out of curiosity, could you pass by reference the ReqlExpr object instance term without having to ser/dez the query? Seems like ser/dez is a lot of work just to pass from one layer to another.

@epsitec [9:42 PM]

No chance, alas. We need to go over process (and possibly machine) boundaries.

@bchavez [9:52 PM]

@epsitec: There's no intermediary you could use to postpone building the ReqlAst until it's actually needed to run? There is no easy way to serialize to a string... then build up the AST object model again from the serialized string query. The C#/Java drivers have a very specific one-way format when serialized over the wire. The fidelity of the AST when deserializing would not be complete..... The best the driver could do is allow you to serialize to a string so you could send the query as a string cross boundary... and if you wish to parameterize the query further at the other side of the boundary you'll have to use string.Replace("7777", "withsomething") if your expression was x => x["Age"].Gt(7777)....
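A rough sketch of that string.Replace idea, assuming the serialized form of x => x["Age"].Gt(7777) mirrors the "Bar" example from this thread. RawTemplate and Bind are hypothetical names. The helper only guards against a missing or repeated sentinel; it does not detect a sentinel embedded in a longer number or string, so the sentinel value has to be chosen with care.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for illustration; not part of RethinkDb.Driver.
public static class RawTemplate
{
    public static string Bind<T>(string template, string sentinel, T value)
    {
        int first = template.IndexOf(sentinel, StringComparison.Ordinal);
        bool repeated = first >= 0 &&
            template.IndexOf(sentinel, first + sentinel.Length, StringComparison.Ordinal) >= 0;
        if (first < 0 || repeated)
            throw new ArgumentException("sentinel must occur exactly once", nameof(sentinel));

        // JSON-encode the value so strings are quoted and escaped properly.
        return template.Replace(sentinel, JsonSerializer.Serialize(value));
    }
}
```

Binding 18 to the sentinel "7777" in the serialized template would then give the same string as serializing x => x["Age"].Gt(18) directly, under the assumptions above.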

@bchavez: would you think it possible to inject the string into the Filter() method? (edited)

@bchavez [10:00 PM]

@epsitec: as a token replacement for x => x["Age"].Gt(18)?

@epsitec [10:00 PM]

Yes

@epsitec [10:01 PM]

@bchavez: If the serialization would not work from AST ⇒ Filter() we'd have to write a parser for the various filtering scenarios we'll meet, and build the AST locally in the data layer. (edited)

@bchavez [10:03 PM]

@epsitec: You might want to try it. I don't know off the top of my head if that would work or not.... but it does seem, though, that Filter takes .Filter( Javascript js ).... You might be able to use that overload to write a JavaScript predicate. (edited)

@epsitec [10:05 PM]

Yes, Filter(Javascript js) was an option, but we discarded it since we don't go the JavaScript route. So for now, there's no silver bullet...

@bchavez [10:35 PM]

How's this @epsitec

public void can_stich_together_some_crazy_reql_expr_thing()
{
    ClearDefaultTable();
    var foos = new[]
        {
            new Foo {id = "a", Baz = 1, Bar = 1, Idx = "qux"},
            new Foo {id = "b", Baz = 2, Bar = 2, Idx = "bub"},
            new Foo {id = "c", Baz = 3, Bar = 3, Idx = "qux"}
        };
    R.Db(DbName).Table(TableName).Insert(foos).Run(conn);
    ReqlFunction1 filter = expr => expr["Bar"].Gt(2);
    var str = ReqlRaw.ToRawString(filter);
    str.Dump();
    var filterTerm = ReqlRaw.FromRawString(str);
    var result = table.Filter(filterTerm).RunResult<List<Foo>>(conn);
    ExtensionsForTesting.Dump(result);
}

OUTPUT:

[
  {
    "id": "c",
    "Bar": 3,
    "Baz": 3,
    "Idx": "qux",
    "Tim": null
  }
]

@bchavez [11:08 PM]

@epsitec: Yeah, hm. That's not going to work.... Especially with ReqlFunctions because there's internal state like Func VarId that needs to be incremented every time a function is created. So, if you have this cross-application boundary, it's possible to get conflicting VarIds in both boundaries.

bchavez commented 8 years ago

@epsitec: New release of the C# driver now allows you to serialize ReQL Expressions. v2.3.6-beta-1 now available on NuGet.

Driver documentation on the new feature is here: https://github.com/bchavez/RethinkDb.Driver/wiki/Extra-C%23-Driver-Features#serializing-reql-expressions. :v:

epsitec commented 8 years ago

@bchavez thank you again. We build the package from source code; we currently need a signed assembly to integrate it into our project, so it's easier to do that from source rather than post-signing the NuGet output.