Reduce EF Core application startup time via compiled models

mikary commented 9 years ago

This is an "epic" issue for the theme of generating a compiled model with fast loading and access times. Specific pieces of work will be tracked by linked issues.

Proposed for 6.0

Used by calling new tool commands (all parameters are optional):

dotnet ef dbcontext optimize -c MyContext -o MyFolder -n My.Namespace

Optimize-DbContext -Context MyContext -OutputDir MyFolder -Namespace My.Namespace
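
For context, here is a hedged sketch of how the generated model might then be wired into the context through `UseModel`. It assumes the commands above emitted a class named `MyContextModel` in `My.Namespace` exposing a static `Instance` property (that naming is an assumption derived from the parameters shown). The SQLite provider and the connection string are placeholders, and the fragment only compiles alongside the generated files.

```csharp
using Microsoft.EntityFrameworkCore;

public class MyContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            // Assumed name of the generated class; check the files written to MyFolder.
            .UseModel(My.Namespace.MyContextModel.Instance)
            // Placeholder provider and connection string.
            .UseSqlite("Data Source=app.db");
}
```

With the model supplied this way, the context should not need to run model building at startup.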

[x] Add runtime annotation support to model #22031
[x] Convert metadata extension methods to default interface implementations. #19213
[x] Create a read-optimized implementation of IModel that can be used as the base for the compiled model #8258
[x] Add API to set custom implementation types instead of objects
- [x] ValueGenerator (store non-lambda configuration separately)
- [x] ValueComparer and ValueConverter
[x] Implement a generator that outputs the source code for a custom model implementation
- [x] Throw when the model contains non-serializable configuration like lambdas, proxy types and non-serializable expressions. Or if it's using a read-optimized implementation.
- [x] Generate #nullable enable
- [x] Warn when using a model generated by an older version.
- [x] Warn when using a non-default model cache key.

Backlog

[x] Consider discovering the model automatically instead of requiring 'UseModel', by using assembly-level attribute (#24893)
[x] Consider adding an MSBuild task to generate the model at build-time (#24894)
[ ] Either move the members from the pubternal model interfaces to the public ones, so that they can have non-dynamic implementations or add a runtime annotation-based backup implementation. (#24895)
[x] Generate compiled relational model (#24896)
- [ ] Add tests that assert the relational model, view, query mapping and default mappings
[ ] Generate code for query filters when possible. (#24897)
[ ] Generate constructor bindings (#24898)
[ ] Generate code that builds the model lazily (#24899)
[ ] Rely on static binding instead of reflection for properties and fields (#24900)
[ ] Split out the mutable model implementation to a different assembly as it's not needed when using compiled model (#24901)
[x] Generate custom lazy loading and change tracking proxy types. (#24902)
[ ] Generate lambdas used in change tracking (#24904)
[ ] Generate code that normally depends on reflection types in the model (Type, MemberInfo, etc.) to avoid all reflection at runtime (#24903)

lemkepf commented 8 years ago

Any updates on whether this will be added into EF Core any time soon?

TonyHenrique commented 7 years ago

Any updates on this? I am having problems with cold startup time, especially on a large model...

ajcvickers commented 7 years ago

@TonyHenrique Any chance we could get access to your large model for perf testing? You can send it privately (to avickers at my work) if you don't want to post here.

It's unlikely we will get to this in the near future--other work items have higher priority right now.

RainingNight commented 7 years ago

What is the progress now?

enghch commented 7 years ago

Just want to add my experience. I'm using EFCore 1.1.3 on UWP with SQLite. I'm seeing almost 7 seconds worth of DB startup time. Two seconds to create the DbContext, about 4 seconds to check for a migration (no migrations are needed or performed) and a second or so to actually perform the first query. The actual database is essentially empty of any rows. The database has three interrelated entities and a total of about 25 columns. I'm guessing a large portion of the migration check is actually building the model under the hood.

It's a big hit because the user can't use the app until the data is loaded. Using Json files the entire app is up, loaded and running within about 1.5 seconds. When I add EFCore/SQLite to the mix it's suddenly around 9 seconds.

ajcvickers commented 7 years ago

@enghch Is this a release build (i.e. using .NET Native) or a debug build? Does it get significantly slower in release than debug?

enghch commented 7 years ago

Good point. That was with a debug build while debugging. Running a debug build without debugging seems to take it down to about 2-3 seconds. Running a release build seems to be about 1.3-2 seconds. I still wish it were faster (say, by using some kind of model caching) but I think I can make these times work.

MarcoLoetscher commented 7 years ago

This problem is also present in EF6 and unfortunately it was never solved. Unlike EF6, EF Core is also used for apps (UWP, Xamarin). Users do not accept long waits at the start of an app. In my opinion, the solution to this problem is much more important than e.g. "Lazy Loading", so this point should be planned for the next release, 2.1. The cold start of a model with 500 entities should take a maximum of 5 seconds and not 180 seconds (3 minutes). So you would have to achieve a 36x improvement, which is probably not possible by compiling the model. So I guess there will be nothing else left but a kind of lazy model creation. We don't use EF Core yet because we lack some features (Many-to-many without join entity, Spatial data, Graphical visualization of model, GroupBy translation, Raw SQL queries: Non-model types).

ngoov commented 7 years ago

Can I vote somewhere to make this a higher priority in the roadmap? I had to delete a lot of the models and slim down the DbContext because of this. Still, the 20 models take about 15-20 seconds to generate. This slows down development a lot!

werwolfby commented 6 years ago

Any updates?

tobiasbunyan commented 6 years ago

Never mind the load time on a live application, the crippling amount of time you have to wait each time you start a debug session slashes productivity to almost nothing!!! This needs sorting out BIG TIME!

GeorgeThackrayWT commented 6 years ago

Any news on this? This seems to be quite a serious problem.

ajcvickers commented 6 years ago

This issue is in the Backlog milestone. This means that it is not going to happen for the 2.1 release. We will re-assess the backlog following the 2.1 release and consider this item at that time. However, keep in mind that there are many other high priority features with which it will be competing for resources.

mwe commented 6 years ago

I also think this is a problem for a happy development workflow. In a test-driven environment our developers need to wait 2 minutes for the DbContext to initialize in order to (re)run a test. This is a lot of waiting. Also, our test systems, where the app pool shuts down, take a long time to cold start, of which at least 3-4 minutes are caused by EF (EF6 indeed has the same problem, but a cached model improves this in EF 6.2).

It seems that there is at least one problem in EFCore\Storage\TypeMappingSource.cs -> FindMappingWithConversion. When we run the application with sources, this function accounts for 60% of CPU time. It seems that it is called many times and is very slow. Perhaps the problem is with the ConcurrentDictionary here, which is getting too many calls.

Please find attached a screenshot of a debug session: ef-core-github-1906

If more information is required, I will be happy to supply it.

gojanpaolo commented 6 years ago

I just wanted to share a workaround if you're using xUnit and in-memory testing. You might notice that if you have multiple test classes that use a DbContext then each test class will have the overhead of model creation. You can apply this workaround so that the model creation will be done only once per test run and not once per test class.

using System;
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;
using Xunit;

/// <summary>
/// This is the base class for all test classes that will use <see cref="YourDbContext"/> SQLite in-memory testing.
/// </summary>
/// <remarks>
/// This uses the <see cref="DbContextCacheCollectionFixture"/> as a workaround to improve unit test performance.
/// The deriving class will automatically inherit this workaround.
/// </remarks>
[Collection(nameof(DbContextCacheCollectionFixture))]
public class DatabaseTestBase : IDisposable
{
    private readonly SqliteConnection _connection;
    protected readonly DbContextOptions<YourDbContext> _options;
    public DatabaseTestBase()
    {
        _connection = new SqliteConnection("DataSource=:memory:");
        _connection.Open();
        _options = new DbContextOptionsBuilder<YourDbContext>()
            .UseSqlite(_connection)
            .Options;
        using(var context = new YourDbContext(_options))
        {
            context.Database.EnsureCreated();
        }
    }
    public void Dispose()
    {
        _connection.Close();
    }

    [CollectionDefinition(nameof(DbContextCacheCollectionFixture))]
    private class DbContextCacheCollectionFixture : ICollectionFixture<object>
    {
        /// This is a workaround to improve unit test performance for test classes that use a <see cref="DbContext"/> object.
        /// <see cref="DbContext"/> model creation has a significant overhead.
        ///     https://github.com/aspnet/EntityFrameworkCore/issues/4372
        ///     https://github.com/aspnet/EntityFrameworkCore/issues/1906
        /// By applying this attribute across all the <see cref="DbContext"/> test classes, the model creation is only done once throughout the test run instead of once per test class.
    }
}

ajcvickers commented 6 years ago

@gojanpaolo Models are cached by default; if you are seeing otherwise then can you please file a new issue with a runnable project/solution or code listing that demonstrates this behavior?

mwe commented 6 years ago

I need to clarify the unit test example. In a TDD workflow, the test is executed by a developer many times while he is writing the code. This means that after every try, the DbContext needs to be re-initialized. When the test fails, the appdomain is unloaded, and on the next run a new DbContext is initialized and cached.

Anyhow, this issue is about slow initialisation of a large DbContext (300 entities). The unit test issue is only one use case where this is a problem. Please don't focus on that.

After more research I discovered that EFCore\Storage\TypeMappingSource\FindMappingWithConversion is called 1.2 million times. In our situation this means that during initialisation the dictionary of type mappings (77,000 items) is scanned 1.2 million times. Our DbContext has 300 entities with an average of 16 properties per entity. It seems that FindMappingWithConversion is executed way too many times. This week I will investigate further and try to understand the loops that are being executed here and why this is slow.

OnModelCreating takes 1.5 minutes with release DLLs and 10 minutes in debug (with EF projects as project references) with performance tracing switched on.

UPDATE: The cause of the many calls seems to be in efcore\metadata\conventions\internal\foreignkeyattributeconvention.cs

When I change the logic here to first check whether the attribute is there, and only run FindCandidateNavigationPropertyType once the attribute is found, my context initialises 5 times faster and the dictionary of type mappings is queried 20,000 times instead of 1.2 million times.

        /// src\efcore\metadata\conventions\internal\foreignkeyattributeconvention.cs
        [ContractAnnotation("navigationName:null => null")]
        private MemberInfo FindForeignKeyAttributeOnProperty(EntityType entityType, string navigationName)
        {
            if (string.IsNullOrWhiteSpace(navigationName)
                || !entityType.HasClrType())
            {
                return null;
            }

            MemberInfo candidateProperty = null;
            var clrType = entityType.ClrType;

            foreach (var memberInfo in clrType.GetRuntimeProperties().Cast<MemberInfo>()
                .Concat(clrType.GetRuntimeFields()))
            {
                // GITHUB: ISSUE: 1906
                var attribute = memberInfo.GetCustomAttribute<ForeignKeyAttribute>(true);

                // first check if we have the attribute
                if (attribute != null && attribute.Name == navigationName)
                {
                    // then run FindCandidateNavigationPropertyType, as this function seems to be expensive
                    if (memberInfo is PropertyInfo propertyInfo
                        && FindCandidateNavigationPropertyType(propertyInfo) != null)
                    {
                        continue;
                    }

                    if (candidateProperty != null)
                    {
                        throw new InvalidOperationException(CoreStrings.CompositeFkOnProperty(navigationName, entityType.DisplayName()));
                    }

                    candidateProperty = memberInfo;
                }
            }

ajcvickers commented 6 years ago

@mwe Thanks for the info. @AndriySvyryd is looking at model building perf as part of #11196. He and I talked and we filed #11358 which should also make this better.

StrangeWill commented 6 years ago

This would be a huge boost for a project I'm working with. We have a context that is over 4MB of C# code (programmatically generated from an in-use database today), and boot time is over 40,000 ms. This is silly when our tests boot up (but we'll live for now).

AndriySvyryd commented 6 years ago

At the current rate it's unlikely that anyone from the EF team would get time to work on this before 2020. But we can provide guidance for anyone who is willing to tackle this.

kierenj commented 6 years ago

I’m curious to see if this is something I might be able to understand/tackle. (To be clear: I’m certain there are tough challenges and good reasons!) Right now I’m really unclear on how even many hundreds of entities would require any significant time at all to initialise. @AndriySvyryd would a quick tour of the appropriate area of code, top-level thoughts on a plan and potential pitfalls be something that mightn't take too long to put together?

AndriySvyryd commented 6 years ago

In EF Core model building is done in a reactive manner using configuration conventions. They are basically event handlers that apply some rule to configure an aspect of the model in response to some change in the model. For example when an entity type is added the RelationshipDiscoveryConvention will use reflection to find navigation properties and configure them. See CoreConventionSetBuilder.cs where most of the conventions are registered. Each additional entity could result in hundreds of convention calls and since many of them involve reflection this quickly adds up. There are already other open issues tracking making the conventions faster and reducing the number of invocations. I'll post some initial thoughts on the compiled model a bit later.
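
To make the "conventions as event handlers" idea concrete, here is a minimal sketch in plain C#. None of these types are EF Core's: `SketchModel`, `SketchConventions` and the call counter are hypothetical stand-ins, and the navigation rule (any class-typed property other than `string`) is a deliberate oversimplification.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical, heavily simplified stand-in for a mutable model.
public sealed class SketchModel
{
    private readonly List<Action<SketchModel, Type>> _entityTypeAdded = new List<Action<SketchModel, Type>>();
    private readonly Dictionary<Type, List<string>> _navigations = new Dictionary<Type, List<string>>();

    public int ConventionCalls { get; private set; }
    public int EntityTypeCount => _navigations.Count;

    public void AddConvention(Action<SketchModel, Type> convention) => _entityTypeAdded.Add(convention);

    public void AddEntityType(Type clrType)
    {
        if (_navigations.ContainsKey(clrType)) return; // already part of the model
        _navigations[clrType] = new List<string>();

        // Adding an entity type is a model change, so every registered convention reacts to it.
        foreach (var convention in _entityTypeAdded)
        {
            ConventionCalls++;
            convention(this, clrType);
        }
    }

    public void AddNavigation(Type declaringType, string name) => _navigations[declaringType].Add(name);
}

public static class SketchConventions
{
    // Loosely inspired by relationship discovery: reflect over properties and
    // treat class-typed ones as navigations to other entity types.
    public static void DiscoverNavigations(SketchModel model, Type clrType)
    {
        foreach (var property in clrType.GetRuntimeProperties())
        {
            var target = property.PropertyType;
            if (target.IsClass && target != typeof(string))
            {
                model.AddEntityType(target); // triggers the conventions again for the new type
                model.AddNavigation(clrType, property.Name);
            }
        }
    }
}

public class Author { public int Id { get; set; } public string Name { get; set; } }
public class Blog { public int Id { get; set; } public Author Owner { get; set; } }

public static class Program
{
    public static void Main()
    {
        var model = new SketchModel();
        model.AddConvention(SketchConventions.DiscoverNavigations);
        model.AddEntityType(typeof(Blog));
        Console.WriteLine($"Entity types: {model.EntityTypeCount}, convention calls: {model.ConventionCalls}");
    }
}
```

Even in this toy version, adding one entity type cascades into reflection over a second one; with many conventions and hundreds of entity types the number of reflective calls multiplies accordingly.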

kierenj commented 6 years ago

Ok, great thanks. In terms of a 'compiled model' my initial thought was that this was some kind of serialised cache of the result of the process you mention there, since that's what I've encountered w/EF6 as a solution to this. Now I wonder if we're talking about code generation, much like the model snapshot? Either way I'm keen to see if I can help with the challenge.

roji commented 6 years ago

I wonder if it's not a good idea to carefully profile the current model building process before jumping ahead to explore saving a compiled representation. Maybe some optimizations can reduce the normal process to something manageable.

AndriySvyryd commented 6 years ago

@roji Yes, we profile model building before each release and fix the worst parts. The more risky improvements have been scheduled for 3.0.0:

https://github.com/aspnet/EntityFrameworkCore/issues/7850
https://github.com/aspnet/EntityFrameworkCore/issues/214:

- Don't store delayed convention executions if no conventions are registered
- When executing delayed conventions prune redundant ones
- Use reference counting to avoid scanning the model for unused elements.

But the normal model building process has to use reflection, and this imposes a fundamental lower limit on the time it takes, which will be too high for some big models.

AndriySvyryd commented 6 years ago

There are several ways we can avoid running conventions:

Use some kind of serialization to store and then restore the model. There isn't a natural format that the EF Core model can be serialized to, so just defining it would be significant work. Besides, deserialization could still take a significant amount of time.
Generate code that explicitly configures every aspect of the model, similarly to the migrations model snapshot, see https://github.com/aspnet/EntityFrameworkCore/issues/12511. Since most of the required code already exists this should be relatively easy to implement, but the result could still be too slow for very big models.
Generate a custom implementation of IModel and related interfaces with hardcoded values specific to the input model. Each IEntityType and IProperty could actually be implemented by different classes. This approach is what this issue is about. Not only the initialization time would scale well with the model size, but the bespoke implementation will also improve the runtime perf for change tracking, updates, queries and anything else that uses the model. The work could be separated in the following manner:
i. Create a sample model implementation like the prototype removed in https://github.com/aspnet/EntityFrameworkCore/commit/4a432ad7473688d6ff75566daa6d1efd40e03fb0. This would serve to find the best model implementation, test that the runtime works correctly with it and serve as the target to test the code generator against.
ii. Fix places in EF that assume a specific implementation. We frequently cache some derived values in the model, see https://github.com/aspnet/EntityFrameworkCore/blob/master/src/EFCore/Metadata/Internal/EntityTypeExtensions.cs#L352. These should be moved to the interfaces with a default implementation, see https://github.com/aspnet/EntityFrameworkCore/issues/19213. This part is also required for https://github.com/aspnet/EntityFrameworkCore/issues/8258
iii. Create a generator that would produce the code from i. Preferably it would use Roslyn to do so, but this is not set in stone.
iv. Some configuration can't be serialized (e.g. lambda ValueConverters) and will need to be specified in a way that allows serialization. Non-serializable values configured by conventions could still be incorporated by converting them to default value conventions.
v. (Optional) Split out the mutable model implementation to a different assembly as it's not needed when using compiled model. Migrations might still need it, depending on https://github.com/aspnet/EntityFrameworkCore/issues/18620
vi. (Optional) Generate lazy loading and change tracking proxy types.
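
As a rough illustration of the third approach (a bespoke model implementation with hardcoded values), the sketch below shows the kind of code a generator might emit. The `ISketch*` interfaces are hypothetical miniatures, not EF Core's `IModel`, `IEntityType` or `IProperty`, and the `Blog` entity is invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature interfaces standing in for the real metadata interfaces.
public interface ISketchProperty { string Name { get; } Type ClrType { get; } }
public interface ISketchEntityType { string Name { get; } IReadOnlyList<ISketchProperty> Properties { get; } }
public interface ISketchModel { ISketchEntityType FindEntityType(string name); }

// What generated code could look like: one class per entity type and per property,
// with every value hardcoded so nothing is discovered through reflection at startup.
public sealed class BlogEntityType : ISketchEntityType
{
    private sealed class IdProperty : ISketchProperty
    {
        public string Name => "Id";
        public Type ClrType => typeof(int);
    }

    private sealed class TitleProperty : ISketchProperty
    {
        public string Name => "Title";
        public Type ClrType => typeof(string);
    }

    private static readonly ISketchProperty[] _properties = { new IdProperty(), new TitleProperty() };

    public string Name => "Blog";
    public IReadOnlyList<ISketchProperty> Properties => _properties;
}

public sealed class GeneratedModel : ISketchModel
{
    public static readonly GeneratedModel Instance = new GeneratedModel();

    // A dictionary lookup keeps FindEntityType cheap regardless of model size.
    private readonly Dictionary<string, ISketchEntityType> _entityTypes =
        new Dictionary<string, ISketchEntityType> { ["Blog"] = new BlogEntityType() };

    public ISketchEntityType FindEntityType(string name)
        => _entityTypes.TryGetValue(name, out var entityType) ? entityType : null;
}

public static class Program
{
    public static void Main()
    {
        var blog = GeneratedModel.Instance.FindEntityType("Blog");
        foreach (var property in blog.Properties)
        {
            Console.WriteLine($"{blog.Name}.{property.Name}: {property.ClrType.Name}");
        }
        // Prints "Blog.Id: Int32" then "Blog.Title: String"
    }
}
```

Because construction is just object allocation, startup cost in a design like this grows with the number of generated classes rather than with the number of convention invocations.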

roji commented 6 years ago

Thanks for that overview @AndriySvyryd, it all makes sense.

rezabay commented 5 years ago

This is a very important feature that seriously affects containerized applications. It would be nice if you could include it in the EF Core 3.0 roadmap.

Rotdev commented 5 years ago

This is really slowing down our development effort. It takes 40-70 seconds to open up the project. Very annoying and detrimental to the development effort. Our idea for fixing this on our own would be to mock the data sources; this would require up to 40 development hours for something that should just work :-( Can we upvote this fix somewhere?

StrangeWill commented 5 years ago

https://github.com/aspnet/EntityFrameworkCore/issues/1906#issuecomment-401506938

Heads up for what I did to get around this for now: we're working with an old database and integrating our software with it -- we're only importing the tables we need for now which has made it livable, but this will only be workable for so long. You can easily provide a list of tables to the Scaffold-DbContext process and we just keep the list in the project's readme for now (and add to it as needed).

We are talking about some 2000+ tables too, which is insane but blame that on Microsoft's ERPs.

starquake commented 5 years ago

We are talking about some 2000+ tables too, which is insane but blame that on Microsoft's ERPs.

I don't understand why people think lots of tables are bad.

I think lots of tables are fine. Exactly why we need compiled models.

StrangeWill commented 5 years ago

@starquake it's like a 2000+ line method, sure it isn't always bad but sets off a lot of code smells that something is up. In the case of Dynamics Nav it's filled with cumbersome design issues and a half-baked API, every configured business increases your table count by N * 1400+ and the thing crawls on top of having a ton of constraints that a modern application database shouldn't have.

Sure you can have a ton of tables and be completely valid in the design but I'd argue those are the exception not the rule when I stumble across projects with tables that SSMS chugs to load.

ajcvickers commented 5 years ago

@Rotdev You can upvote the issue by giving it :+1:

espray commented 5 years ago

@ajcvickers Sorry, but how will upvoting help? This issue is almost 4 YEARS old and has been passed over several times for items with fewer votes. How many votes are needed for an item to be included in a release?

rezabay commented 5 years ago

Since EF6 will be available on .NET Core 3.0, is it possible to use pre-generated views to solve your problem?

ajcvickers commented 5 years ago

@espray It is one of the inputs into the release planning process. That being said, we have a small team and a lot of requested features, so the realistic expectation should be that some things won't be implemented for years.

ajcvickers commented 5 years ago

@rezabayesteh EF6 and EF Core are entirely different code bases. The one does not impact the other.

jamesmeneghello commented 5 years ago

Yeah, just reiterating that this is pretty bad for containerised apps, especially autoscaling k8s apps. I guess after reading this thread I consider myself lucky that our context only takes 15 seconds to build, but 15 seconds is a long time to wait for the first request to hit a pod.

Gokaysim commented 5 years ago

Generate a custom implementation of IModel and related interfaces with hardcoded values specific to the input model. Each IEntityType and IProperty could actually be implemented by different classes. This approach is what this issue is about. Not only the initialization time would scale well with the model size, but the bespoke implementation will also improve the runtime perf for change tracking, updates, queries and anything else that uses the model. The work could be separated in the following manner:

Create a sample model implementation like the prototype removed in 4a432ad. This would serve to find the best model implementation, test that the runtime works correctly with it and serve as the target to test the code generator against.

Fix places in EF that assume a specific implementation. We frequently cache some derived values in the model, see https://github.com/aspnet/EntityFrameworkCore/blob/master/src/EFCore/Metadata/Internal/EntityTypeExtensions.cs#L352. These should be extracted to separate interfaces that could be implemented by the model and provide a fallback implementation in case they are not. This part is also required for #8258

Create a generator that would produce the code from i. Preferably it would use Roslyn to do so, but this is not set in stone.

I have implemented your approach except for the second step. I do not want to change the source code of EF Core. I have looked for precedence between extensions but could not get anywhere. Do you have any suggestions for the code below or for implementing the second part?

https://github.com/Gokaysim/EntityCompiledModelGenerator/tree/master

AndriySvyryd commented 5 years ago

@Gokaysim You are on the right track, I suppose that you didn't start optimizing the generated code yet (like using a lookup for FindEntityType).

For the second point you could take the pragmatic approach and just try using the generated model, getting an exception and changing the code that throws to calculate the values at runtime if it's not the expected implementation.

After all those places are fixed you could take a second pass and actually calculate the required values when generating the compiled model and change the extensions to cast to a new interface instead of a concrete type (e.g. EntityTypeExtensions.GetCounts() would cast to IPropertyCountSource instead of EntityType)
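
A minimal sketch of that "cast to an interface, otherwise compute at runtime" pattern follows. `IPropertyCountSource` is the name used above, but its shape here (a single `PropertyCount` member) is an assumption, and `ISketchEntityType` plus the two implementations are hypothetical stand-ins rather than EF Core types.

```csharp
using System;
using System.Linq;
using System.Reflection;

public interface ISketchEntityType { Type ClrType { get; } }

// Assumed shape: an optional interface that a compiled entity type can implement
// to hand back a value that was calculated when the code was generated.
public interface IPropertyCountSource { int PropertyCount { get; } }

public static class SketchEntityTypeExtensions
{
    public static int GetPropertyCount(this ISketchEntityType entityType)
        => entityType is IPropertyCountSource source
            ? source.PropertyCount                                    // precalculated value
            : entityType.ClrType.GetRuntimeProperties().Count();      // interim fallback via reflection
}

public class Blog { public int Id { get; set; } public string Title { get; set; } }

// Does not implement the optional interface, so the fallback path is used.
public sealed class PlainBlogEntityType : ISketchEntityType
{
    public Type ClrType => typeof(Blog);
}

// Implements the optional interface with a hardcoded value.
public sealed class CompiledBlogEntityType : ISketchEntityType, IPropertyCountSource
{
    public Type ClrType => typeof(Blog);
    public int PropertyCount => 2;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(new PlainBlogEntityType().GetPropertyCount());    // 2, computed by reflection
        Console.WriteLine(new CompiledBlogEntityType().GetPropertyCount()); // 2, read from the hardcoded value
    }
}
```

The extension method never depends on a concrete class, so a generated model keeps working even before every value has been precalculated.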

Alternatively you could wait for https://github.com/aspnet/EntityFrameworkCore/issues/8258 to be implemented and it would take care of these, hopefully early next year.

Gokaysim commented 5 years ago

@AndriySvyryd I think using interfaces was not a good idea. Instead of the interface, classes which inherit from the Model, EntityType and Property types will solve the problem. Virtual properties and functions of those classes will be implemented in derived classes. I think it will solve the casting problem. I will implement it this way.

For the second point, calculating values at runtime will slow down the app. Instead, calculating the values before compiling the app and generating static code for them saves CPU time. But I am not sure I get what you meant.

AndriySvyryd commented 5 years ago

@Gokaysim Compiled model classes shouldn't inherit from the mutable implementation (Model, EntityType, Property, etc.) as it's optimized for model building and has many fields that are used only during the build process. Compiled model should be read-only and optimized for reading performance and using less memory to improve startup time.

I am proposing calculating the values at runtime only as the interim/fallback solution to make the compiled model work, but afterwards it can be optimized by precalculating the values.

ilmax commented 4 years ago

@AndriySvyryd I was playing with some T4 templates to generate a compiled model as suggested in your issue, and I got something very basic, but I've found a roadblock here. Essentially some usages of the IModel interface are not respecting the abstraction and are coded against the concrete Model implementation, what shall we do here? We could promote some more methods to the IModel interface (this may be a breaking change, but I think it shouldn't actually break anyone), or we could move some of the logic that belongs to the Model class to another class and change the IModel access to go through this new indirection.

roji commented 4 years ago

For anyone looking at this, it seems like the upcoming C# 9 source generators feature could be a perfect fit for this - rather than doing something with T4.

ilmax commented 4 years ago

@roji T4 are fine 😄 (kidding). Indeed, source generators are a nice option; I was just asking what to do in order to get rid of the roadblocks before actually looking into the proper way to generate a compiled model. The sooner we can enable people to use a custom IModel the better. Also, generating a compiled model with T4 may be (IMHO) an "acceptable workaround" until a fancier solution is found.

ajcvickers commented 4 years ago

@ilmax That behavior on the interfaces is currently by-design. I could write a lot more about that, but not in the time I have available now. This issue has a lot of history around it and five years of discussion in the team. As much as we encourage community contributions, this isn't the issue to work on. It's an involved and complex change with many considerations.

AndriySvyryd commented 4 years ago

Essentially some usages of the IModel interface are not respecting the abstraction and are coded against the concrete Model implementation, what shall we do here?

@ilmax That's what I was referring to in point ii. As @ajcvickers said it would be a rather big change to fix all of these, so if you are willing to commit a large portion of your time let us know so we can discuss and come up with a concrete plan on how to proceed.

ilmax commented 4 years ago

@ajcvickers OK, got that. I was just curious to measure a couple of things, like the time it takes to create a context and the time it takes to run a query with a compiled model, so it was mostly an investigation following this comment. I will try to spend some time on #11597 instead.

Edit: I still think though it would be nice to at least have the ability to swap to a compiled model without having the feature to generate a compiled model available now, that part can be a community project at first, and it would allow us to know what's the impact of using a compiled model in order to prioritize the feature properly (e.g. if the gain is negligible vs a considerable perf gain)

Just my two cents

elksson commented 4 years ago

Startup time in EF Core is causing slow response times for Azure Functions HTTP requests, where the response time always needs to be sub-second. Is there any way to speed up the load time, or at least pre-compile all the models during application startup so that by the time the HTTP request is made the model is 100% ready?

jespersh commented 4 years ago

@elksson I'm thinking your azure function is compiling a lot more models than it actually needs to perform its jobs?

dotnet / efcore

Reduce EF Core application startup time via compiled models #1906
