JasperFx / marten

.NET Transactional Document DB and Event Store on PostgreSQL
https://martendb.io
MIT License
2.85k stars 450 forks source link

V7 Select before Where no longer working #3009

Open CptWesley opened 7 months ago

CptWesley commented 7 months ago

Hi. We are currently working on upgrading from V6 to V7 but are running into the following issue.

Given the following document definition:

public sealed class Foo
{
    public string Id { get; set; }

    public Bar Inner { get; set; }
}

public sealed class Bar
{
    public int Value { get; set; }
}

We then run the following queries:

using (var storeSession = store.LightweightSession())
{
    storeSession.Store(new Foo
    {
        Id = "foo",
        Inner = new()
        {
            Value = 100,
        },
    });

    await storeSession.SaveChangesAsync();
}

using var querySession1 = store.QuerySession();
using var querySession2 = store.QuerySession();

var r1 = await querySession1
    .Query<Foo>()
    .Where(x => x.Inner.Value < 50)
    .ToListAsync();

var r2 = await querySession2
    .Query<Foo>()
    .Select(x => x.Inner)
    .Where(x => x.Value < 50)
    .ToListAsync();

Now r1 is an empty list as expected, however r2 contains the inserted document with id foo. In practice we find that this form of querying returns all documents in the document store of that type. When we examine the generated query we find:

select d.data -> 'inner' as data from public.mt_doc_repository_specs_foo as d;

It appears the .Where clause after the .Select is ignored.

In the previous version of Marten (6.4.1) this way of querying worked fine. I am aware that this major version has changed the way Linq queries are constructed, so I'm not sure if this change was intentional.

For completion sake, this is the query that was generated in 6.4.1:

select d.data ->> 'inner' as data from public.mt_doc_repository_specs_foo as d where CAST(d.data -> 'inner' ->> 'value' as integer) < :p0
mysticmind commented 7 months ago

Just as a test, can you please switch the order of Where and Select so that select is after where in the second query and let me know what happens?

CptWesley commented 7 months ago

@mysticmind That one does appear to be working:

var r1 = await querySession
    .Query<Foo>()
    .Where(x => x.Inner.Value < 50)
    .ToListAsync();

// []

var cmd1 = querySession
    .Query<Foo>()
    .Where(x => x.Inner.Value < 50)
    .ToCommand(FetchType.FetchMany).CommandText;

// select d.data from public.mt_doc_repository_specs_foo as d where CAST(d.data -> 'inner' ->> 'value' as integer) < :p0;

var r2 = await querySession
    .Query<Foo>()
    .Select(x => x.Inner)
    .Where(x => x.Value < 50)
    .ToListAsync();

// ["foo"]

var cmd2 = querySession
    .Query<Foo>()
    .Select(x => x.Inner)
    .Where(x => x.Value < 50)
    .ToCommand(FetchType.FetchMany).CommandText;

// select d.data -> 'inner' as data from public.mt_doc_repository_specs_foo as d;

var r3 = await querySession
    .Query<Foo>()
    .Where(x => x.Inner.Value < 50)
    .Select(x => x.Inner)
    .ToListAsync();

// []

var cmd3 = querySession
    .Query<Foo>()
    .Where(x => x.Inner.Value < 50)
    .Select(x => x.Inner)
    .ToCommand(FetchType.FetchMany).CommandText;

// select d.data -> 'inner' as data from public.mt_doc_repository_specs_foo as d where CAST(d.data -> 'inner' ->> 'value' as integer) < :p0;
jeremydmiller commented 7 months ago

It's not intentional, but no, just no. We're not going to support a Where() chained after a Select(). I think it's a happy accident that worked at all in < 7.0. We'll "fix" this with a validation error saying "don't do that"

CptWesley commented 7 months ago

I'm not entirely sure why you wouldn't want to support such queries. We work a lot with (deep) nested documents. Being able to "unpeel" the first few layers is very useful for making understandable queries. I'm also not really sure of any technical/performance objections to supporting this. So if you could explain to me why you wouldn't want to support this, that would be appreciated.

jeremydmiller commented 7 months ago

The Where() after the Select() applies to after the transformation. We could technically do that, but it would throw you into using CTEs that are pretty slow. You'd be much, much better off using the Where() on the initial document collection

This is a lot more work than it might look like from the outside.

CptWesley commented 7 months ago

Couldn't the Selects be "hoisted" (or more accurately, whatever the opposite of hoisting is) to the end of the query by some form of query planner?

Given some definition:

class A {
  B B { get; set; }
  int Value1 { get; set; }
}

class B {
  C C { get; set; }
  int Value2 { get; set; }
}

class C {
  int Value3 { get; set; }
}

An IQueryable of the form:

query
  .Where(a => a.Value1 > 5)
  .Select(a => a.B)
  .Where(b => b.Value2 > 10)
  .Select(b => b.C)
  .Where(c => c.Value3 > 20)

Can be rewritten as:

query
  .Where(a => a.Value1 > 5)
  .Where(a => a.B.Value2 > 10)
  .Where(a => a.B.C.Value3 > 20)
  .Select(a => a.B)
  .Select(b => b.C)

or more condensed:

query
  .Where(a => a.Value1 > 5 && a.B.Value2 > 10 && a.B.C.Value3 > 20)
  .Select(a => a.B.C)

Semantically these are all equivalent (I think).

jeremydmiller commented 7 months ago

If you want to have the relatively inefficient CTE query style support to make this work, that's not horrible to do. If you want this to generate more efficient SQL by "hoisting" the query planning somehow, that's a ton more work and I think you're veering into some pretty serious "I take pull requests" territory here. I'd be happy to talk about a paid JasperFx engagement to do that, but I don't think you'd get enough value out of it to justify the cost.

CptWesley commented 7 months ago

For further context:

If I look at the query generated by 6.4.1 for the first example above:

query
    .Where(a => a.Value1 > 5)
    .Select(a => a.B)
    .Where(b => b.Value2 > 10)
    .Select(b => b.C)
    .Where(c => c.Value3 > 20)

The query generated by 6.4.1 is:

select d.data -> 'b' ->> 'c' as data from public.mt_doc_repository_specs_a as d where (CAST(d.data ->> 'value1' as integer) > :p0 and  CAST(d.data -> 'b' ->> 'value2' as integer) > :p1 and  CAST(d.data -> 'b' -> 'c' ->> 'value3' as integer) > :p2)

Which seems to do the hoisting (albeit potentially unintentionally).

CptWesley commented 7 months ago

If this would be something you would want to support, I could potentially look into it in the future (no promises though).

jeremydmiller commented 7 months ago

It might have been some kind of automatic thing that Relinq did for us that I wasn't aware of. It's not a set of use cases I would have even thought to have tested.

As for supporting it, sure, it's just not something I can prioritize anytime soon