BlueshiftSoftware / EntityFrameworkCore

NoSQL providers for EntityFramework Core
Other
281 stars 57 forks source link

Multiple DbCntexts select query can't work concurrently. #29

Open IvanZheng opened 5 years ago

IvanZheng commented 5 years ago

Multiple DbContexts select query can't work concurrently. Please see the code below. BTW, if I use async query, "System.ObjectDisposedException : Cannot access a disposed object." will occur. `

    public async Task ConcurrentTest()
    {
        var tasks = new List<Task>();
        for (int i = 0; i < 20; i++)
        {
            var j = i;
            tasks.Add(Task.Run(() => InternalGetUsersTest(j)));
        }
        await Task.WhenAll(tasks);
    }

    private void InternalGetUsersTest(int i)
    {
        var builder = new DbContextOptionsBuilder().UseMongoDb("mongodb://10.100.7.46:9007/DemoDb");
        using (var dbContext = new DemoDbContext(builder.Options))
        {
            var database = dbContext.GetMongoDbDatabase();
            // !!! 20 concurrent tasks cost 3 seconds.
            var user1 = database.GetCollection<User>("users")
                                .Find(new ExpressionFilterDefinition<User>(u => u.Id == "xxx"));

            // !!! 20 concurrent tasks cost almost 20 seconds. It seems each task run one by one.
            var user2 = dbContext.Users
                                 .FirstOrDefault(u => u.Id == "xxx");
        }
    }`
crhairr commented 5 years ago

I assure you, there are tests on the sync and async queries.

I don't know that the EF Core system is designed to handle multiple concurrent queries this way. Does this work with the SQL providers? The problem may very well likely be in the state manager.

I'll definitely take a look at this. It would make things easier for me if you fork the repo and create a simple test using the sample ZooContext and domain that illustrates this problem. Otherwise, it might be a while before I can get to it.

IvanZheng commented 5 years ago

Hi @crhairr , thank you for your reply. I have forked your repo and committed two test cases in MongoDbContextTests.

About the concurrent query issue, I can reproduce it in netcoreapp2.1 and framework4.6.1 is ok. Please try the code. I have to use the sync query in this case due to the async query issue. `

    [Fact]
    public async Task Concurrent_query()
    {
        var tasks = new List<Task>();
        var batchCount = 20;
        for (var i = 0; i < batchCount; i++)
        {
            tasks.Add(ExecuteUnitOfWorkAsync(zooDbContext => Task.Run(() => zooDbContext.Employees
                                                                                        .FirstOrDefault(e => e.FirstName == $"{DateTime.Now.Ticks}"))));
        }
        await Task.WhenAll(tasks);
    }

`

About the async query issue, I can easily reproduce it with the second same query. You can try the code below. After some investigation I fixed it with some adjustments in MongoDbEntityQueryModelVisitor.ReplaceClauseReferences and MongoDbDatabase.CompileAsyncQuery, but I am not sure if it has side affect. The commit is fix async query twice issue. `

    [Fact]
    public async Task Can_list_async_twice()
    {
        await ExecuteUnitOfWorkAsync(async zooDbContext =>
        {
            zooDbContext.Animals.AddRange(_zooEntities.Animals);
            Assert.Equal(
                         _zooEntities.Entities.Count,
                         await zooDbContext.SaveChangesAsync(acceptAllChangesOnSuccess: true));
        });

        await ExecuteUnitOfWorkAsync(async zooDbContext =>
        {
            Assert.Equal(_zooEntities.Animals,
                         await zooDbContext.Animals
                                           .OrderBy(animal => animal.Name)
                                           .ThenBy(animal => animal.Height)
                                           .ToListAsync(),
                         new AnimalEqualityComparer());
        });

       // throw System.ObjectDisposedException : Cannot access a disposed object.
        await ExecuteUnitOfWorkAsync(async zooDbContext =>
        {
            Assert.Equal(_zooEntities.Animals,
                         await zooDbContext.Animals
                                           .OrderBy(animal => animal.Name)
                                           .ThenBy(animal => animal.Height)
                                           .ToListAsync(),
                         new AnimalEqualityComparer());
        });

`

crhairr commented 5 years ago

Ultimately, this error is due to the using statement in ExecuteUnitOfWorkAsync. If I remove the using and allow the IServiceScope to be disposed of by the GC, the code works fine. The problem appears to be that the second Task is returning asynchronously and actively disposing the IServiceScope - and all of its managed services - before the work is actually complete.

I'm investigating whether or not there is a way around this. However, this may be a limitation beyond my immediate control.

IvanZheng commented 5 years ago

I have adjusted concurrent query test. The more records in mongodb, the more time query costs. It seems like that each query pull many records from mongodb. I also tried mongodatabase to query and it costed a constant time.

After setting profile level to 2, I find there is no filter or pipeline transfered to mongodb. I think it will get all records from mongodb and filter in the memory.

@crhairr, could you please help to investigate this issue first? It blocks my project now. Thank you! I have commited test code to my fork repo. Adjust concurrent query test

Reproduce step:

  1. Insert 60,000 employees into mongodb.
  2. 100 tasks concurrently query just one record by zooDbContext.Employees.FirstOrDefault.

`

    [Fact]
    public async Task Concurrent_query()
    {
        var tasks = new List<Task>();
        var batchCount = 100;
        for (var i = 0; i < batchCount; i++)
        {
            tasks.Add(ExecuteUnitOfWorkAsync(zooDbContext => Task.Run(() =>
            {
                // Total test case Cost 4 seconds no matter how many records in mongodb 
                //var employee = GetMongoDbDatabase(zooDbContext).GetCollection<Employee>("employees")
                //                                               .Find(e => e.FirstName == $"{DateTime.Now.Ticks}").FirstOrDefault();

                // Total test case Cost almost 26 seconds if there are 60000 records in mongodb.
                // It seems like that each query pull many records from mongodb.
                // The more records in mongodb, the more time cost.

                var mongoDatabase = GetMongoDbDatabase(zooDbContext);
                mongoDatabase.RunCommand<BsonDocument>(new BsonDocument("profile", 2));
                // After setting profile level to 2, I find there is no filter or pipleline transfered to mongodb.
                // I think it will get all records from mongodb and filter in the memory.
                var employee = zooDbContext.Employees
                                           .FirstOrDefault(e => e.FirstName == $"{DateTime.Now.Ticks}");
            })));
        }
        await Task.WhenAll(tasks);
    }

`

IvanZheng commented 5 years ago

I logged mongodb profiler data when dbContext.Users.FirstOrDefault(u => u.Name == name) executed and there were two operations sent to the mongodb. From the logs, we can see there was no filter or pipeline transfered and nreturned values were 9899(op2) + 101(op1) = 10000 which was just the total number of the records in the mongodb document.

`

{
    "op": "getmore",
    "ns": "DemoDb.users",
    "command": {
        "getMore": NumberLong(8853374825272013602),
        "collection": "users",
        "$db": "DemoDb",
        "lsid": {
            "id": BinData(4,
            "AaUWNUs+RB+UP4vnAGYZkA==")
        }
    },
    "originatingCommand": {
        "aggregate": "users",
        "pipeline": [],
        "cursor": {},
        "$db": "DemoDb",
        "lsid": {
            "id": BinData(4,
            "AaUWNUs+RB+UP4vnAGYZkA==")
        }
    },
    "cursorid": NumberLong(8853374825272013602),
    "keysExamined": 0.0,
    "docsExamined": 1526.0,
    "cursorExhausted": true,
    "numYield": 12.0,
    "nreturned": 9899.0,
    "locks": {
        "Global": {
            "acquireCount": {
                "r": NumberLong(15)
            }
        },
        "Database": {
            "acquireCount": {
                "r": NumberLong(14)
            }
        },
        "Collection": {
            "acquireCount": {
                "r": NumberLong(13)
            }
        }
    },
    "responseLength": 1365052.0,
    "protocol": "op_msg",
    "millis": 14.0,
    "planSummary": "COLLSCAN",
    "ts": ISODate("2018-12-08T10:18:34.078+0000"),
    "client": "10.100.250.175",
    "allUsers": [],
    "user": ""
}
{
    "op": "command",
    "ns": "DemoDb.users",
    "command": {
        "aggregate": "users",
        "pipeline": [],
        "cursor": {},
        "$db": "DemoDb",
        "lsid": {
            "id": BinData(4,
            "AaUWNUs+RB+UP4vnAGYZkA==")
        }
    },
    "cursorid": NumberLong(8853374825272013602),
    "keysExamined": 0.0,
    "docsExamined": 8474.0,
    "numYield": 66.0,
    "nreturned": 101.0,
    "locks": {
        "Global": {
            "acquireCount": {
                "r": NumberLong(68)
            }
        },
        "Database": {
            "acquireCount": {
                "r": NumberLong(68)
            }
        },
        "Collection": {
            "acquireCount": {
                "r": NumberLong(68)
            }
        }
    },
    "responseLength": 13828.0,
    "protocol": "op_msg",
    "millis": 21.0,
    "planSummary": "COLLSCAN",
    "ts": ISODate("2018-12-08T10:18:34.051+0000"),
    "client": "10.100.250.175",
    "allUsers": [],
    "user": ""
}

`

IvanZheng commented 5 years ago

Hi @crhairr, any progress on this? Could you please tell me where should I look into? I want to resolve the issue ASAP, otherwise I have to re-implement my query API for my project.

BTW, in MongoDbDatabase.SaveChangesAsync, Task.WhenAll() missed the parameter tasks. It would cause indefinite waiting for the async processing when high concurrency requests arrived.

crhairr commented 5 years ago

I apologize, between the holidays, work, and family illness over had a hard time getting free time to put into this. I'll set aside some time tonight for it.

As for your last comment, I see that pretty clearly. That's a painful omission, and might very well be at the root of a lot of these TPL issues. I'll fix it and run some additional tests tonight, definitely get a build out for at least that part.

IvanZheng commented 5 years ago

@crhairr, firstly, thank you for your efforts! After some investigaton, I think I have found the root cause of the WhereClause missing in MongoDbQuery. In EFCore "LinqOperatorProvider._Where", it uses "System.Linq.Enumerable.Where" not the "MongoDB.Driver.Linq.MongoQueryable.Where". Maybe we have to override the LinqOperatorProvider to make it use "MongoDB.Driver.Linq". Hope it's helpful to you. `

  public class LinqOperatorProvider: ILinqOperatorProvider
  {
       ...
        [UsedImplicitly]
        // ReSharper disable once InconsistentNaming
        private static IEnumerable<TSource> _Where<TSource>(
        IEnumerable<TSource> source, Func<TSource, bool> predicate) => source.Where(predicate);
       ...
   }

`

crhairr commented 5 years ago

If you check out the MongoDB C# driver's GitHub repo, it binds both Enumerable.Where and Queryable.Where:

https://github.com/mongodb/mongo-csharp-driver/blob/14e046f23640ff9257c4edf53065b9a6768254d4/src/MongoDB.Driver/Linq/Processors/EmbeddedPipeline/MethodCallBinders/WhereBinder.cs

If you are still working with your changes to MongoDBEntityQueryModelVisitor.ReplaceClauseReferences, that is probably causing the filter to be shifted down to after .AsAsyncEnumerable() is executed. That is, you're pulling in every single document from the DB collection and filtering in-memory after the fact.

IvanZheng commented 5 years ago

Sorry,I forgot to clairfy that I had reverted my changes about the ReplaceClauseReference and still can reproduce it.

I think that the mongodb driver binding both Enumerable.Where and Queryable.Where is not related to what I said. I mean that "source.Where(predicate)" in LinqOperatorProvider uses the System.Linq.Enumerable while not the MongoDB.Driver.Linq.MongoQueryable so that mongodb driver has no chance to append the filter to MongoQueryable.

crhairr commented 5 years ago

I've added a test that uses multiple concurrent queries against a single Context, but it's currently disabled due to a Skip. I've worked out some of the threading issues, but there's still a problem specifically with ToListAsync(). I'm continuing to work on that.

IvanZheng commented 5 years ago

@crhairr, I open a new issue to trace the missing filter in MongoQueryable. missing filter in MongoQueryable

crhairr commented 5 years ago

This is being addressed as part of #32. The old-style async query compiler apparently leaked some thread context that trickled up when a second or third context is created from the same DI container. I had to override the compilation at a different level to effectively limit the scope.

However, this change only addresses the problem with async query errors propagating out of previously disposed DbContexts. This will not allow multiple concurrent queries against the same DbContext from multiple threads, which is explicitly disallowed by the base DbContext in EF Core as many of its operations are not guaranteed to be thread-safe.