DataObjects-NET / dataobjects-net

https://dataobjects.net
MIT License
60 stars 23 forks source link

Wrong join/leftjoin sql translation (key expression is variable, join same table) #323

Closed letarak closed 4 months ago

letarak commented 1 year ago

Hello DO 7.0.3

If join/left use same table and key expression is variable then sql query translation result is wrong

Example:

using System.Data.SqlClient;
using DoTest;
using NUnit.Framework;
using Xtensive.Orm;
using Xtensive.Orm.Configuration;

internal class Program
{
    private static async Task Main(string[] args)
    {
        try
        {
            DbHelper.ExecuteNonQuery("DROP DATABASE [DO-Tests]");
        }
        catch (Exception)
        {
        }

        DbHelper.ExecuteNonQuery("CREATE DATABASE [DO-Tests]");

        var currentConnection = new SqlConnectionStringBuilder(DbHelper.ConnectionString());

        var dc = new DomainConfiguration("sqlserver", currentConnection.ToString());

        dc.Types.Register(typeof(TestEntity));

        dc.UpgradeMode = DomainUpgradeMode.Recreate;

        await using var d = await Domain.BuildAsync(dc);

        await using (var s = await d.OpenSessionAsync())
        await using (var t = await s.OpenTransactionAsync())
        {
            _ = new TestEntity(s) { Name = "1", Description = "test", Text = "text" };
            _ = new TestEntity(s) { Name = "1", Description = "test", Text = "text" };

            t.Complete();
        }

        System.Linq.Expressions.Expression<Func<TestEntity, bool>> filter = it => it.Description == null;

        System.Linq.Expressions.Expression<Func<TestEntity, int>> key = it => it.Id;

        await using (var s = await d.OpenSessionAsync())
        using (s.Activate())
        await using (await s.OpenTransactionAsync())
        {
            /*SELECT DISTINCT
    [a].[Id]
FROM [dbo].[TestEntity] [a]
    LEFT OUTER JOIN
    (
        SELECT [b].[Id] AS [#a.Id],
               282 AS [#a.TypeId],
               [b].[Description] AS [#a.Description],
               [b].[Text] AS [#a.Text]
        FROM [dbo].[TestEntity] [b]
        WHERE ([b].[Description] IS NULL)
    ) [c]
        ON ([a].[Id] = [c].[#a.Id])
WHERE ([a].[Text] IS NOT NULL);*/
            var leftJoinWithExpression = Query.All<TestEntity>().LeftJoin(
                    Query.All<TestEntity>().Where(filter),
                    o => o.Id,
                    i => i.Id,
                    (o, i) => o)
                .Where(it => it.Text != null)
                .Select(it => it.Id)
                .Distinct()
                .ToList();

            Assert.AreEqual(2, leftJoinWithExpression.Count);

/*SELECT DISTINCT
    [a].[Id]
FROM [dbo].[TestEntity] [a]
    LEFT OUTER JOIN
    (
        SELECT [b].[Id] AS [#a.Id],
               282 AS [#a.TypeId],
               [b].[Description] AS [#a.Description],
               [b].[Text] AS [#a.Text]
        FROM [dbo].[TestEntity] [b]
        WHERE ([b].[Description] IS NULL)
    ) [c]
        ON ([a].[Id] = [c].[#a.Id])
WHERE (
          ([a].[Description] IS NULL)
          AND ([a].[Text] IS NOT NULL)
      );*/
            var leftJoinWithExpressionVariable = Query.All<TestEntity>().LeftJoin(
                    Query.All<TestEntity>().Where(filter),
                    key,
                    key,
                    (o, i) => o)
                .Where(it => it.Text != null)
                .Select(it => it.Id)
                .Distinct()
                .ToList();

            Assert.AreEqual(2, leftJoinWithExpressionVariable.Count);

/*SELECT DISTINCT
    [a].[Id]
FROM [dbo].[TestEntity] [a]
    INNER JOIN [dbo].[TestEntity] [b]
        ON ([a].[Id] = [b].[Id])
WHERE (
          ([b].[Description] IS NULL)
          AND ([a].[Text] IS NOT NULL)
      );*/
            var joinWithExpression = Query.All<TestEntity>().Join(
                    Query.All<TestEntity>().Where(filter),
                    o => o.Id,
                    i => i.Id,
                    (o, i) => o)
                .Where(it => it.Text != null)
                .Select(it => it.Id)
                .Distinct()
                .ToList();

            Assert.AreEqual(2, joinWithExpression.Count);

/*SELECT DISTINCT
    [a].[Id]
FROM [dbo].[TestEntity] [a]
    INNER JOIN [dbo].[TestEntity] [b]
        ON ([a].[Id] = [b].[Id])
WHERE (
          ([a].[Description] IS NULL)
          AND ([b].[Description] IS NULL)
          AND ([a].[Text] IS NOT NULL)
      );*/
            var joinWithExpressionVariable = Query.All<TestEntity>().Join(
                    Query.All<TestEntity>().Where(filter),
                    key,
                    key,
                    (o, i) => o)
                .Where(it => it.Text != null)
                .Select(it => it.Id)
                .Distinct()
                .ToList();

            Assert.AreEqual(2, joinWithExpressionVariable.Count);
        }
    }

    [HierarchyRoot]
    public class TestEntity : Entity
    {
        public TestEntity(Session session) : base(session)
        {
        }

        [Key] [Field(Nullable = false)] public int Id { get; set; }

        [Field(Nullable = false)] public string Name { get; set; }

        [Field] public string? Description { get; set; }

        [Field] public string? Text { get; set; }
    }
}
alex-kulakov commented 4 months ago

Hello @letarak,

The issue exists but it seems so rare because join of the same table is very rare and reuse of the same lambda of getting joining fields makes it even more unique.

The thing is that we use parameters as key value to store visited left and right subtrees and then refer to them later on. For example, here's some code from Join method visitor

    var outerParameter = outerKey.Parameters[0];
    var innerParameter = innerKey.Parameters[0];
    var outerSequence = VisitSequence(outerSource);
    var innerSequence = VisitSequence(innerSource);
    using (context.Bindings.Add(outerParameter, outerSequence))
    using (context.Bindings.Add(innerParameter, innerSequence)) {
      //....
    }

In your case innerParameter is the same as outerParameter so on adding bindings in dictionary one value overrides another.

The only choice I have is to clone the LambdaExpression because we need parameter as key in key-value collection for certain operations. And it not only kills resources you try to save by reusing expression but also makes translation slower because it will cost some CPU resources to visit and clone the LambdaExpression.

What did you try to achieve by re-using lambda expression?

alex-kulakov commented 4 months ago

By the way, I noticed something that bothers me in the code you provided and can't pass it by, I hope it was done so only in test as a fast demonstration and has nothing to do with real usage.

Please, DON'T USE session activations in asynchronous code, and in any parallel/multithreaded execution scenarios. It may bring you to a nightmare once it will start going off rails.

The thing is that asynchronous code may switch thread it works on, sometimes many times during execution. Basically, any await usage is a chance that the code below the await will be on another thread. I believe you know this. What you may not know is details of session activation. We've changed it to be not thread-bound, instead it uses ThreadLocal but it's not a silver bullet, sometimes it fails to restore back the value which was set. We have faced this issue in our Reprocessing extension and forced to give the user a way to avoid activations.

So, I strongly recommend you remove session.Activate() usages in every case that uses asynchronous methods, at least do not wrap parts of code containing await like so

    await using (var s = await d.OpenSessionAsync())
    using (s.Activate())
    await using (await s.OpenTransactionAsync())

It is save to activate session for regions of synchronous code like

    await using (var s = await d.OpenSessionAsync())
    await using (await s.OpenTransactionAsync()) {
       //some async code
       using (s.Activate()) {
         // region with no async code
       }
       //some more async code
    }

Since static class Query works by using activations, it also not very compatible with asynchronous code, and recommended to not use in such cases.

I wish you the best and to not having hard-to-identify thread issues, keep coding 😉

letarak commented 4 months ago

Yes, session activation is old testing code and not used in production

"What did you try to achieve by re-using lambda expression?" "resources you try to save"

We do not try to save resources In this case, was written method to find duplicates (key expression is argument) Join same table, key expression equals Decision was do not call method with two identical expressions (it is shorter signature and call, plus avoid possibility to call with different expressions after some refactoring)

Agreed, it can view strange, but in my opinion bug not so much in absence of valid translation, but in silence Without exception, can be difficult to diagnose problem in time

At now we have roslyn analyzer to check our code, but, if you ask me, it is bug and can be fixed by throw exception in runtime

By the way, have you thought about supplying ORM specific roslyn analyzers as a package? with best practice from ORM developers :)

alex-kulakov commented 4 months ago

Yes, session activation is old testing code and not used in production

Thanks, god 😁

Agreed, it can view strange, but in my opinion bug not so much in absence of valid translation, but in silence without exception, can be difficult to diagnose problem in time

The issue exists, it's a fact. Your case is a boundary case but it is a valid case, so I agree that it should be addressed somehow.

While I was waiting for your answer, I did some research. It can also be fixed by creating a new parameter and replacing old parameter with the new one for either inner or outer selector lambda expression. It works, but my main concern is that this small adjustment will happen every time such query executes, especially, if it can be avoided easily - compile-time solution is better that multiple runtime adjustments. :) In modern versions of language there is static modifier for lambdas which contributes to performance, I'm sure you know it.

If you say that you are OK with exception, probably I would prefer it to not waste compute time.

Speaking of,

By the way, have you thought about supplying ORM specific roslyn analyzers as a package? with best practice from ORM developers :)

I'm the only person on the project who manages client support, pull-request reviews, bug investigation and fixes and so on. I don't want to diminish impact of external contributors, they definitely help to enrich the project with some features and improvements, but their PRs should be at least reviewed carefully. Even if I would like to, my time is not limitless, with all the desire.

Do you have an experience in creating Roslyn analyzers? If you have time and will for contribution so it would be a win-win, I'm open for discussion.

letarak commented 4 months ago

We widely use analyzers in our projects and I have some amateurish experience It is have some subjective stylistic/architecture view on analyzer/tests, but after first analyzer would be written and accepted, it would be much easier to write ten more :)

No promises, but I hope to create PR in near future

alex-kulakov commented 4 months ago

Ok, please notify me in advance, we will probably have to discuss specifics outside Github :)

Speaking of original issue fix, I'm going to address it in 6.0 branch, the PR will appear here.

alex-kulakov commented 4 months ago

The issue is addressed