dotnet / EntityFramework.Docs

Documentation for Entity Framework Core and Entity Framework 6
https://docs.microsoft.com/ef/
Creative Commons Attribution 4.0 International

Document deployment and seeding patterns #736

Open Silthus opened 6 years ago

Silthus commented 6 years ago

Updated to reflect the core issue we are tracking here:

TL;DR: It is very hard to come up with really good guidance on how to create or update application state on deployment. Note that this is commonly related to database schema and seed data, but it is not limited to that.

Many customers end up adding logic at application startup because that runs after deployment, but this is very problematic, especially if there is more than one instance of the application using the same database.

We think this ideally should happen as a post-deployment step, not startup.

We would like to do some brainstorming with a wider group with the goal of identifying patterns that we can recommend today and/or functionality that we can build into the publish components to make this work.

Original issue as reported:

How can the key of an entity be specified? I have an entity that should be seeded based on its name and the ID (Guid) is generated automatically. Something like this:

modelBuilder.Entity<Blog>().HasData(blog => blog.Name, new Blog {Name = "New Blog", Url = "http://sample.com"});

And for composite keys:

modelBuilder.Entity<Blog>().HasData(blog => new {blog.Name, blog.Url}, new Blog {Name = "New Blog", Url = "http://sample.com"});

martinpetrovaj commented 6 years ago

Similar problem here. I was trying to seed the DB with a tree-like list of shop categories without knowing their IDs prior to seed (I'm relying on the DB to generate the ID identity key).

builder.Entity<Category>().HasData(
    new Category()
    {
        Title = "Living room",
        Url = "living-room",
        OrderNo = 1,
        Hidden = false,
        ChildCategories = new List<Category>()
        {
            new Category() { Title = "Carpets", Url = "carpets", OrderNo = 2, Hidden = false },
            new Category() { Title = "Flower pots", Url = "flower-pots", OrderNo = 3, Hidden = false },
        }
    },
    new Category()
    {
        Title = "Kitchen",
        Url = "kuchyne",
        OrderNo = 4,
        Hidden = false,
        ChildCategories = new List<Category>()
        {
            new Category() { Title = "Dishes", Url = "dishes", OrderNo = 5, Hidden = false },
            new Category() { Title = "Kitchen desks", Url = "kitchen-desks", OrderNo = 6, Hidden = false }
        }
    }
);

Seeding EF Core the old way (in Program.cs or a similar place, using context.Categories.Add) works fine with the same code. But I hoped that we would finally be able to avoid such ugliness with HasData.

AndriySvyryd commented 6 years ago

@Silthus @martinpetrovaj The key values need to be specified explicitly as we need to make sure that they stay the same across migrations and servers.

modelBuilder.Entity<Blog>().HasData(
    new {Name = "New Blog", Url = "http://sample.com", ID = new Guid("{8b4e4608-35da-4f7e-8f75-52d32bade14a}")});
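For the tree-shaped category case above, a minimal sketch of what explicit keys could look like. The key values, the `ParentCategoryId` foreign key property, and the `CategorySeed` and `ShopContext` names are assumptions made for illustration. HasData does not accept child entities through navigation properties, so the parent is referenced by its foreign key value instead.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

public class Category
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Url { get; set; }
    public int OrderNo { get; set; }
    public bool Hidden { get; set; }

    // Hypothetical FK property; null for root categories.
    public int? ParentCategoryId { get; set; }
    public Category ParentCategory { get; set; }
    public ICollection<Category> ChildCategories { get; set; }
}

public static class CategorySeed
{
    // Key values are fixed so that they stay the same across migrations and servers.
    public static Category[] All() => new[]
    {
        new Category { Id = 1, Title = "Living room", Url = "living-room", OrderNo = 1 },
        new Category { Id = 2, Title = "Carpets", Url = "carpets", OrderNo = 2, ParentCategoryId = 1 },
        new Category { Id = 3, Title = "Flower pots", Url = "flower-pots", OrderNo = 3, ParentCategoryId = 1 },
        new Category { Id = 4, Title = "Kitchen", Url = "kuchyne", OrderNo = 4 },
        new Category { Id = 5, Title = "Dishes", Url = "dishes", OrderNo = 5, ParentCategoryId = 4 },
        new Category { Id = 6, Title = "Kitchen desks", Url = "kitchen-desks", OrderNo = 6, ParentCategoryId = 4 }
    };
}

public class ShopContext : DbContext
{
    public ShopContext(DbContextOptions<ShopContext> options) : base(options) { }

    public DbSet<Category> Categories { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Category>()
            .HasOne(c => c.ParentCategory)
            .WithMany(c => c.ChildCategories)
            .HasForeignKey(c => c.ParentCategoryId);

        // Flat list: each child points at its parent through the FK value.
        modelBuilder.Entity<Category>().HasData(CategorySeed.All());
    }
}
```

Keeping the rows in a separate static method is only a convenience here; it makes the seed data easy to inspect without creating a context.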

cephaspad commented 6 years ago

How can I seed IdentityUser data, especially the password, without using UserManager.CreateAsync?

AndriySvyryd commented 6 years ago

@cephaspad I think UserManager.CreateAsync normalizes the provided data. So you could call UserManager.CreateAsync on a test database, then examine the raw data inserted and use the same values in HasData.

ajcvickers commented 6 years ago

@AndriySvyryd @divega @HaoK Does this go against guidance from Identity where I have heard that manipulating entities not through the manager is...er...frowned upon?

HaoK commented 6 years ago

Yes, it's definitely frowned upon, as the user might have fields that aren't initialized properly, like allowLockoutForNewUsers, etc.

AlissonRS commented 6 years ago

There are situations where we don't actually care if the key values change. I think whether to use the same key values, use different ones, or let the database generate them should be up to the developer to decide.

AndriySvyryd commented 6 years ago

@AlissonRS The keys are required to be able to detect changes made to the seed data when a migration is generated.

AlissonRS commented 6 years ago

@AndriySvyryd I understand your point. Usually we would use some other property (e.g. the user name) as a key value to check whether that entity already exists in the database, and change the other properties based on the anonymous object, but then we wouldn't be able to change the user name (changing it would make EF assume we are seeding a new user).

Specifying the keys makes it possible to change all properties (including the user name). But I still think it should be up to the developer to decide what key values to use (whether the actual keys or other properties like user name).

I had to use workarounds in order to use the UserManager class for seeding data.
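One possible shape of such a workaround, given here as a sketch under assumptions rather than the code actually used: resolve `UserManager<IdentityUser>` from a service scope and create the user only if it is missing. The `IdentitySeeder` class and its parameters are hypothetical names; it assumes Identity is already registered in the service provider, and the password is expected to come from configuration or a secret store, not from source code.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Identity;
using Microsoft.Extensions.DependencyInjection;

public static class IdentitySeeder
{
    // Creates the user through UserManager so that normalization,
    // password hashing and validators all run as usual.
    public static async Task EnsureUserAsync(
        IServiceProvider services, string userName, string password)
    {
        // UserManager is a scoped service, so resolve it inside a scope.
        using (var scope = services.CreateScope())
        {
            var users = scope.ServiceProvider
                .GetRequiredService<UserManager<IdentityUser>>();

            // The user name acts as the natural key: skip if already present.
            if (await users.FindByNameAsync(userName) != null)
            {
                return;
            }

            var result = await users.CreateAsync(
                new IdentityUser { UserName = userName }, password);

            if (!result.Succeeded)
            {
                var errors = string.Join("; ", result.Errors.Select(e => e.Description));
                throw new InvalidOperationException("Seeding user failed: " + errors);
            }
        }
    }
}
```

The existence check is not atomic, so two instances running this at the same time could still race; running it once from a deployment step avoids that.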

ajcvickers commented 6 years ago

@divega @AndriySvyryd I think we need to consider guidance around seeding. Specifically, when new-style model-based seeding should be used, and when it makes more sense to continue to use more traditional seeding approaches because:

/cc @HaoK

Orgbrat commented 6 years ago

As recommended in a reply on the Microsoft EntityFramework.Docs GitHub issues, I used the process described at http://www.binaryintellect.net/articles/5e180dfa-4438-45d8-ac78-c7cc11735791.aspx and it all worked just fine. Thanks for the assistance.

Orgbrat

divega commented 6 years ago

@ajcvickers @AndriySvyryd have we looked at what doing this with HasData looks like? Is it really bad to have the keys? Does it not work at all?

Besides that, my main problem with the approach recommended at http://www.binaryintellect.net/articles/5e180dfa-4438-45d8-ac78-c7cc11735791.aspx is that it does it at application startup, which can cause concurrency problems.

benm-eras commented 6 years ago

@Orgbrat would you care to share an example of your solution? The code at the link you provided is for 2.0 and doesn't work as is for 2.1, since the way the web host is built in Program.Main(...) has changed, and I am not sure how to get instances of the UserManager and SignInManager.

mihaimuraru commented 6 years ago

When unit testing a DbContext using the in-memory provider, the database is initialized using context.Database.EnsureCreated() with the new model-based seeding and primary keys specified. I get the following error when inserting new data into a DbSet that is already seeded: Message: System.ArgumentException : An item with the same key has already been added. Key: 1

I don't get this error when using a SQL database provider for EF Core; a correct identity primary key is obtained. Does anyone have an idea how this should be handled?

ajcvickers commented 6 years ago

@mihaimuraru Can you please file an issue at https://github.com/aspnet/EntityFrameworkCore/issues including a runnable project/solution or complete code listing that demonstrates the behavior you are seeing?

mihaimuraru commented 6 years ago

I have filed https://github.com/aspnet/EntityFrameworkCore/issues/12371 with a solution that reproduces the behavior.

neil-timmerman commented 6 years ago

I upgraded a .NET Core project from 2.0 to 2.1 in Visual Studio for Mac and the HasData method does not resolve. I verified in NuGet that I have AspNetCore.All and that everything under it is 2.1.

ajcvickers commented 6 years ago

@tnk479 If you believe this is a bug, then please file an issue at https://github.com/aspnet/EntityFrameworkCore/issues including a runnable project/solution or complete code listing that demonstrates the behavior you are seeing.

andriysavin commented 6 years ago

Just a raw idea for an alternative seeding approach, based on record/replay: in model creation (or any other suitable step) you get an instance of the (InMemory?) DbContext, which you can feed to any API like Identity, and call that API as you need (e.g. create users, roles, etc.). The context records changes as usual and can then convert those changes to actions in migrations, so they can be replayed when those migrations are applied.

IMHO, the current support for data seeding will work without friction only in some basic scenarios, when you either populate relatively primitive entities or your application is "data-centric" (meaning your data model is separated from business logic). Nowadays, as DDD use grows, seeding entities as just raw data can be fragile and hard to maintain.

SC1R33RICK commented 6 years ago

Do we have a workaround for the issue "Conversion failed when converting date and/or time from character string"?

divega commented 5 years ago

I am going to reopen because, AFAIR, we still have a long way to go in trying to come up with general guidance in this area. We are also working with the owners of the deployment experience to identify improvements that could enable general solutions for application state initialization that don't require running code in startup.

ncarandini commented 5 years ago

@divega any news about it?

nicoleta-scrimint commented 5 years ago

Hi, in some comments it was stated that seeding data on startup could cause concurrency issues, especially if there is more than one instance of the application using the same database.

When could this concurrency happen? If the ASP.NET Core application is hosted in IIS, a single instance of the Startup class is created on the first request, right? Even if two initial requests arrive concurrently, I suppose one is blocked until startup completes? Is the number of Startup class instances determined by the number of worker processes per application pool? Thank you!

nicoleta-scrimint commented 5 years ago

By the statement "We think this ideally should happen as a post-deployment step, not startup.", do you mean having a separate console application that only executes migrations, seeds data, and adds users and roles with Identity? And would this console application be executed in a post-deployment step, after the ASP.NET Core application is copied to the production environment? Thank you!

ajcvickers commented 5 years ago

@nicoleta-scrimint As far as I know, there should not be any concurrency issues as long as there is one ASP.NET application running on one server and this is the only application/service connecting to the database.

For the "post-deployment step", a console app would be one way to do this. However, it's usually easier to generate SQL scripts from the migrations and then execute the SQL scripts against the production database at deployment time.
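As a rough sketch of the console app option, assuming a hypothetical `AppDbContext` whose migrations live in the same project, the SQL Server provider, and a connection string passed as the only command-line argument. Seed data declared with HasData is part of the generated migrations, so it is applied by the same call. For the script option, the EF Core tools can generate a script with `dotnet ef migrations script --idempotent`.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical context; in a real solution this would be the application's
// own DbContext, referenced from the console project.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
}

public static class DeployMigrator
{
    // Intended to run once per deployment, not once per application instance.
    public static int Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("Usage: DeployMigrator <connection-string>");
            return 1;
        }

        var options = new DbContextOptionsBuilder<AppDbContext>()
            .UseSqlServer(args[0])
            .Options;

        using (var context = new AppDbContext(options))
        {
            var pending = context.Database.GetPendingMigrations().ToList();
            Console.WriteLine("Pending migrations: " + pending.Count);

            // Applies all pending migrations, including HasData seed changes.
            context.Database.Migrate();
        }

        return 0;
    }
}
```

Because the tool runs from the deployment pipeline, only one process touches the schema at a time, which sidesteps the multi-instance startup race discussed above.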

ncarandini commented 5 years ago

I can agree that seeding can be seen as a DevOps task, but the same can be said about migrations, and yet we can manage migrations from the app solution, so we should be able to do the same with seeding. Moreover, we need to seed the db not only to set the initial state after deploying but also to test app functionality by setting a known initial state and checking the resulting state.

Last but not least, about the concurrency: I can imagine that a multi-instance deployment of the ASP.NET Core app (e.g. with Kubernetes) needs to be aware of db creation and db seeding, to prevent multiple instances of the same app from trying to build the same db and seed it at the same time. So framework management code should be written to take care of it, in the same way that migrations take care of checking or updating the db schema. I suppose that the framework code should manage it, freeing app developers from writing that code themselves, because it can be tedious, time-consuming and error-prone.

For migrations, a table is added to the db schema; maybe a seeding table could better handle the seeding management and use a table row to flag the ongoing seeding, like a "db_lock" semaphore to avoid race conditions between web app instances. That way we could also mark the db instance with the environment it's meant to be used in, like "development" or "production", so the framework code can prevent test seeding of a "production" db instance, and set the seeding state of a db instance from "just created" to "post-deployed seeding done" when the first web app instance does it (other web app instances won't do it because of the "db_lock" semaphore I mentioned earlier). Moreover, the example rules I've written above could be defined in a "seeding rules" JSON file.

The possibilities are endless; we just need to have a better and richer story about seeding.

ajcvickers commented 5 years ago

From #882

Comments from the feedback control about what's missing:

bchavez commented 5 years ago

If you need fake data for any of your EF documentation examples, take a look at Bogus: https://github.com/bchavez/Bogus
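A small sketch of how that could look, assuming a hypothetical `Blog` entity like the one used earlier in this thread. Keys are assigned from a local counter so they stay explicit, and a fixed seed keeps runs repeatable. This is aimed at sample or test data inserted at run time; generated values are probably not a good fit for HasData, where rows are expected to stay stable across migrations.

```csharp
using System.Collections.Generic;
using Bogus;

public class Blog
{
    public int BlogId { get; set; }
    public string Name { get; set; }
    public string Url { get; set; }
}

public static class FakeBlogs
{
    public static List<Blog> Generate(int count)
    {
        var nextId = 1; // explicit, predictable keys starting at 1

        var faker = new Faker<Blog>()
            .UseSeed(1234) // fixed local seed so repeated runs produce the same data
            .RuleFor(b => b.BlogId, f => nextId++)
            .RuleFor(b => b.Name, f => f.Company.CompanyName())
            .RuleFor(b => b.Url, f => f.Internet.Url());

        return faker.Generate(count);
    }
}
```

The result can be passed to `context.Blogs.AddRange(...)` in a test setup or a sample application.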

Some other divisions at Microsoft have already taken a dependency on Bogus; so it might be worth looking into. Also, Bogus is already part of Microsoft docs.

I'll be happy to help when and where I can to make Bogus and EF work better together.

Thanks, Brian Chavez

ajcvickers commented 5 years ago

@bchavez Thanks!

AndriySvyryd commented 4 years ago

@ajcvickers Can you make the initial stab at this and assign it back to me for the remaining work?