couchbaselabs / Linq2Couchbase

A Language Integrated Query (LINQ) provider for the Couchbase .NET SDK
Apache License 2.0
94 stars 48 forks source link

Named parameter / prepared query support #275

Open olee opened 5 years ago

olee commented 5 years ago

Hi, would it be possible to integrate support for named / positional parameters, mainly to boost performance by utilizing query plan caching / prepared queries?

I could imagine something like this:

var emailParam = new QueryParameter("test@example.com");
var data = userCtx.Query<User>()
    .Where(x => x.Email == emailParam.Value)
    .Select(u => new
    {
        FirstName = u.FirstName,
        LastName = u.LastName,
    })
    .UsePreparedQuery("my-prepared-query") // maybe name can be optional? 
                                            // it might be possible to generate a unique name by hashing the expression tree or something like that?
    .ToList();

The QueryParameter class could be detected in the expression visitors and it would automatically generate a named parameter at its position and add its value to the final query with AddNamedParameter.

Additionally the UsePreparedQuery extension would mark the generated query to be registered as a prepared query with couchbase to improve performance for future executions.

I'm quite new to couchbase but have some knowledge regarding linq expression trees, so I hope this proposal makes sense.

MikeGoldsmith commented 5 years ago

Thanks @olee for the suggestion.

We'll take a look and see if it's possible to utilise query parameters / prepared queries in Linq2Couchbase.

brantburnett commented 3 years ago

The first part of the problem is that we need a clear way within LINQ to indicate which values in the predicate are constants versus which will change dynamically between each query. In some cases, we can recognize a pure constant within the query. However, in many cases, we can't. The underlying Relinq infrastructure converts many variable inputs into ConstantExpressions very early in the process as a key optimization.

This means that we don't have a clear way to know which fields to place directly on the query, i.e. WHERE type = 'doc_type' versus which fields to parameterize i.e. AND id = $id.

At first glance, you would then assume we could just parameterize all the constants on the query, assuming that any of them may be changed between query executions, and then prepare the query for reuse. However, this introduces problems related to Couchbase indices.

One of the parts of planning a query, including generating a saved query plan for reuse, is selecting the index or indices to use to support the query. Couchbase indices, unlike SQL indices, have predicates attached to the index itself. This predicate filters the documents which are included on the index, i.e. CREATE INDEX my_index ON default (id) WHERE type = 'doc_type'. As a result, in order to use this index, the entire predicate must also apply to the query, i.e. SELECT * FROM default WHERE type = 'doc_type' AND id = 1.

However, if all of the constants are parameterized, then the query planner cannot recognize that the index matches. For example, SELECT * FROM default WHERE type = $type AND id = $id cannot match the index example above. This will cause the query plan to either fail altogether or, perhaps worse, choose a non-optimal index that doesn't have any predicates.

The only idea I have to address this would be a special method that flags a particular constant as intended to be dynamic, similar to the example in the original message.

var data = userCtx.Query<User>()
    .Where(x => x.Email == N1Ql.Parameter("test@example.com"))
    .Select(u => new
    {
        FirstName = u.FirstName,
        LastName = u.LastName,
    });

In this case, the static function would be a passthrough for .NET purposes, but we'd be able to recognize it during query generation:

public static T Parameter<T>(T value) => value;

Wherever we encounter this method call, we'd apply the constant value to the query as a named parameter. I think we can avoid the need for a special method like UsePreparedQuery. We can simply recognize that the query has at least one Parameter call and then we know to specify AdHoc = false automatically. The Couchbase SDK internally tracks prepared queries and reuses them based on having the same query text.

I see two downsides to this approach:

  1. Another esoteric method on our API surface that isn't immediately clear at a glance (i.e. it could be easily misused and cause index selection problems)
  2. There won't be a way at compile time to require that the value passed to N1Ql.Parameter be a constant (or reducible to a constant). If used incorrectly, this could cause run time exceptions. These exceptions would also be missed by most unit testing scenarios using a mock IBucketContext.