Include & cache entity types to model on-demand basis

jamezamm commented 1 year ago

By convention, types that are exposed in DbSet properties on your context are included in the model as entities. Entity types that are specified in the OnModelCreating method are also included, as are any types that are found by recursively exploring the navigation properties of other discovered entity types. Source

Current Behavior

As stated above, entity types are included within a model by being:

exposed in DbSet properties within a DbContext,
specified in DbContext.OnModelCreating, and
exposed as navigation properties within other entity types that have already been included

All of the above occurs when a DbContext is first queried, and it is done only once thanks to model caching. The drawback is a slow first query, which becomes even slower as more entity types are included in the model.
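To make the recursive discovery rule concrete, here is a minimal sketch using plain reflection. It is not EF Core's actual implementation, which also applies conventions, attributes, and OnModelCreating configuration. The entity classes, the `EntityDiscovery` helper, and the rule "any class-typed property or collection of a class type is a navigation" are all assumptions made for illustration.

```csharp
#nullable disable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical entity types, loosely mirroring the Person/Job/Company example.
public class Company { public int Id { get; set; } }
public class Job { public int Id { get; set; } }
public class Person
{
    public string Id { get; set; }
    public Company Company { get; set; }   // reference navigation
    public List<Job> Jobs { get; set; }    // collection navigation
}

// Points at Person, but Person has no navigation back to it.
public class AuditEntry
{
    public int Id { get; set; }
    public Person Person { get; set; }
}

public static class EntityDiscovery
{
    // Returns every type reachable from the roots through public instance properties.
    public static HashSet<Type> Discover(params Type[] roots)
    {
        var found = new HashSet<Type>();
        var pending = new Stack<Type>(roots);
        while (pending.Count > 0)
        {
            var type = pending.Pop();
            if (!found.Add(type)) continue; // already visited
            foreach (var property in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
            {
                var target = NavigationTarget(property.PropertyType);
                if (target != null) pending.Push(target);
            }
        }
        return found;
    }

    // Simplifying assumption: a class-typed property, or a collection of a
    // class type, counts as a navigation; strings and value types do not.
    private static Type NavigationTarget(Type propertyType)
    {
        if (propertyType == typeof(string)) return null;
        var elementType = propertyType.GetInterfaces()
            .Append(propertyType)
            .Where(t => t.IsGenericType && t.GetGenericTypeDefinition() == typeof(IEnumerable<>))
            .Select(t => t.GetGenericArguments()[0])
            .FirstOrDefault();
        var candidate = elementType ?? propertyType;
        return candidate.IsClass && candidate != typeof(string) ? candidate : null;
    }
}
```

Under these assumptions, `EntityDiscovery.Discover(typeof(Person))` finds Person, Company, and Job, but not AuditEntry, because nothing reachable from Person points to it. Starting from `typeof(AuditEntry)` finds all four types.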

One solution that has been suggested more than once is to split a large DbContext into multiple smaller DbContexts to reduce the slow startup. Unfortunately, this is not possible in all scenarios.

Such a scenario would be when entity types in a DbContext are heavily linked, via navigation properties, with entity types of other DbContexts. This would still result in a slow first query, as numerous entity types of other DbContexts would be included within the current model.

Proposed Behavior

Instead of including all entity types in the model on the very first query, entity types should be included in the model on demand. Only entity types that have actually been accessed would then be included in the model, which would greatly reduce the startup delay.

The example below would include and cache the Person and Job entity types in the model before querying the database:

    Person p = _context.Persons.Include(x => x.Jobs).SingleOrDefault(x => x.Id == "1"); // Includes Person and Job Entity Types to Model

The example below would include and cache the Company entity type in the model before querying the database:

    Company c = p.Company; // Includes Company Entity Type to Model

The example below would only query Persons, as the Person entity type has already been included and cached in the model:

    Person p = _context.Persons.SingleOrDefault(x => x.Id == "12"); // Query Persons only

As a suggestion, this feature would be activated by providing an option, in DbContextOptionsBuilder, that would be set in DbContext.OnConfiguring.

roji commented 1 year ago

@jamezamm there are various model-building activities (conventions) which need to see the entire model with all the entities, and run only when the model is finalized. The sort of "incremental" building of the model which you're proposing doesn't really work.

However, we already have some very good solutions to make your application start up faster. Specifically, are you using compiled models? If so, can you provide some actual numbers (how long does the startup take, how many entities do you have)?

Note that it's also possible to separate the initial model building from the first query; just accessing the Model property on a DbContext instance should trigger this. This allows you to perform startup work before your first query. This can also allow you to measure model-building specifically in your benchmarking, without involving any query compilation activities, which are pretty heavy as well.
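As a rough sketch of the suggestion above, the snippet below touches the `Model` property to force model building and times it separately from any query. The `Person` entity, `AppDbContext`, and the `ModelWarmup` helper are hypothetical, and the in-memory provider (the Microsoft.EntityFrameworkCore.InMemory package) is assumed only so the example has a configured provider; a real application would use its own context and provider.

```csharp
using System;
using System.Diagnostics;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context, for illustration only.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<Person> Persons => Set<Person>();

    // Assumption: the in-memory provider stands in for the real database provider.
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseInMemoryDatabase("warmup-demo");
}

public static class ModelWarmup
{
    // Forces model building by reading DbContext.Model and returns how long it took.
    // No query is compiled or executed here.
    public static TimeSpan BuildModel(Func<DbContext> contextFactory)
    {
        var stopwatch = Stopwatch.StartNew();
        using var context = contextFactory();
        _ = context.Model; // first access builds the model; later contexts reuse the cached one
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling `ModelWarmup.BuildModel(() => new AppDbContext())` during application startup, for example on a background task, would move the model-building cost ahead of the first user-visible query. Calling it a second time should return much faster, since the model is cached after the first build.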

jamezamm commented 1 year ago

@roji Thanks for the reply.

> there are various model-building activities (conventions) which need to see the entire model with all the entities, and run only when the model is finalized.

Please elaborate further.

> If so, can you provide some actual numbers (how long does the startup take, how many entities do you have)?

I have roughly a thousand entity types, and the startup takes approximately 12 seconds.

> Specifically, are you using compiled models?

I had come across compiled models but was discouraged from using them because of their limitations.

An "incremental" building of the model would be a more elegant and faster approach than having to manually generate a compiled model every time the model definition changes.

> The sort of "incremental" building of the model which you're proposing doesn't really work.

I am well aware that such a feature would have its limitations. Even though compiled models had their limitations, they were nonetheless implemented.

roji commented 1 year ago

> I had come across compiled models but was discouraged from using them because of their limitations.

Which limitations specifically? We're looking into removing various limitations to allow more users to use them, so any info here would be useful.

ajcvickers commented 1 year ago

@jamezamm We have done considerable thinking around this approach in the past. The two major issues, as I remember them, are:

The model and mapping for a given entity type can change as further parts of the model are discovered. This is especially problematic with uni-directional relationships, where some other entity type is related to the current type, but the current type does not itself have any navigation to that other type. This means that we cannot really know the mapping for a given type without knowing the mapping for all types related to it, and this ultimately requires full discovery of the model to resolve. Other constructs that have similar implications include inheritance and entity/table splitting.
The current model is shared by all context instances and can be safely used from multiple threads without any kind of locking. This is because it is immutable. While it would be possible to make the model both mutable and thread-safe, this would add considerable complexity and has its own performance implications.
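The second point can be illustrated with a small, self-contained sketch of the general pattern: build an immutable object once, then let any number of threads read it without locks. This is not EF Core's model cache. `FrozenModel`, `ModelCache`, and the entity names are hypothetical, and `Lazy<T>` is simply one standard way to get build-once semantics.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a finalized model: nothing can change after construction.
public sealed class FrozenModel
{
    public IReadOnlyList<string> EntityNames { get; }

    public FrozenModel(IEnumerable<string> entityNames)
        => EntityNames = new List<string>(entityNames).AsReadOnly();
}

public static class ModelCache
{
    // Counts how many times the expensive build ran (exposed for demonstration).
    public static int BuildCount;

    // Lazy<T> with ExecutionAndPublication runs the factory once,
    // even when many threads request the value at the same time.
    private static readonly Lazy<FrozenModel> Cached = new Lazy<FrozenModel>(
        () =>
        {
            Interlocked.Increment(ref BuildCount);
            return new FrozenModel(new[] { "Person", "Job", "Company" });
        },
        LazyThreadSafetyMode.ExecutionAndPublication);

    // Readers need no lock: the returned object never changes after it is built.
    public static FrozenModel Get() => Cached.Value;
}
```

An on-demand model would give up this property: adding an entity type after other threads have started reading would require either synchronization on every read or building and publishing a new immutable copy on each change, which is the complexity and performance cost described above.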

gouderadrian commented 1 year ago

I do understand @ajcvickers but also @jamezamm. At my place of work, an in-house software development section, we built and maintain a domain-wide model that includes all the entities involved, as a single DLL. This makes our work much easier. Ease of navigating from one entity to others is paramount. I can have something like MyPerson.LastDeployment.TruckAssigned.LastFault.Engineer.Manager. This is just an example, but it goes to show that if I have roughly 800 entities (several levels deep), it is useless splitting them up into separate DbContexts, because through navigation almost every single entity is ultimately reachable from almost any other entity.

The time it takes to build the model is not an issue for our web portal, for example, as the model build only takes place once each time the website is reset or recompiled for any reason. But using the model in desktop applications is already a tedious problem, as users have to wait 8 seconds or so (on a fast computer) before they can execute their first query. During testing and development this is an even bigger issue because, as a developer, I have to wait over 8 seconds each time I change something that has to do with the domain model and test it, which is very frequent.

With regard to compiled models, this is not really a great solution for us, since we continuously add or modify entities in our line of work, and having to pre-compile the model each time (or just wait for the model to build) in order to test is not efficient. Please do keep in mind that, as stated above, we are an in-house software development section and employ Rapid Application Development through Iterative Refinement. Some applications need to change quite a few times until 'we get it right', simply because our client (the company) often does not have perfect clarity on what is required until they have hands-on experience with the prototype.

jamezamm commented 1 year ago

@roji I did mention that one needs to manually generate a compiled model every time the model definition changes and I would still have to wait 12 seconds or less for the compilation to complete.

jamezamm commented 1 year ago

@ajcvickers Thanks for the reply.

I understand, but I still do hope that such a feature would be considered in the future.

guillaume86 commented 1 year ago

So is there any hope for people with big models? The most disappointing thing about all this is that Linq2SQL used to handle this use case fine.

ajcvickers commented 1 year ago

@guillaume86 What version of EF Core are you using? How big is your model? What startup times are you seeing? Are you using compiled models? What times are you seeing with LINQ to SQL?

guillaume86 commented 1 year ago

@ajcvickers Hi, here's what I found trying again today:

> What version of EF Core are you using?

7.0.4

> How big is your model?

I have 542 entities in my Linq2SQL project and 654 in the EF Core project, because I can't filter tables without hitting the command line length limit, but I don't think filtering will improve things that much.

> What startup times are you seeing?

I'm seeing 55+ seconds to first query in EF Core, and 8 seconds with a compiled model (which itself takes over 1 minute to generate).

> Are you using compiled models?

I can, but the DX is not great since our model changes a lot, and waiting over 1 minute whenever the model changes is a painful regression in our workflow. Maybe I could build a custom scaffolding tool to generate the entities and the compiled model directly from the DB in one step. I guess I could skip generating the mapping attributes and the OnModelCreating fluent mappings in that case? Anyway, it improves things a lot, but the time to first query remains disappointing and basically unusable for something like a CLI tool.

> What times are you seeing with LINQ to SQL?

< 1 sec (700-900 ms)

dotnet / efcore

Include & cache entity types to model on-demand basis #30166
