dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.75k stars 3.18k forks source link

Slow data load with complex key, one of which is byte[] #14202

Open NaNgets opened 5 years ago

NaNgets commented 5 years ago

When having an entity which has a complex key, with one of the key's type as byte[], and at least another field which is not in the key - data load takes a long time and loads the DB.

The affected load is a wait called "wait/io/socket/sql/client_connection". I see it specifically spike when I run the code on the MySQL server performance schema. This happens on two DIFFERENT entityframework packages, both the MySQL's official one and Pomelo's unofficial, so this leads me to believe the problem lies in EFCore. (Can someone try\confirm it on other DBs?)

This does NOT happen when I use .AsNoTracking() on the code.

Example code is given below. The threads are just to load the system, in a normal scenario they shouldn't have a major impact, but in this case there is.

If, for example, I change the .ToArray() to AsNoTracking().ToArray() - it's OK. If I change the .ToArray() to .Select(tt => new { tt.NumberField, tt.BytesField, tt.AnotherField }).ToArray() - it's OK. If I change the entity.HasKey(e => new { e.NumberField, e.BytesField } ); to entity.HasKey(e => e.BytesField); or entity.HasKey(e => e.NumberField ); - it's OK. If I remove the AnotherField and only leave the key fields - it's OK.

I don't know why this is, but I'd really like to fix it. If you can point me to the problematic section, I'd be happy to create a PR.

Steps to reproduce

    public class MyDbContext : DbContext
    {
        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            if (!optionsBuilder.IsConfigured)
            {
                var connString = "server=localhost;user id=user;password=password;persistsecurityinfo=True;Charset=utf8mb4;
                optionsBuilder.UseMySQL(connString);
            }
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.Entity<TestTable>(entity =>
            {
                entity.HasKey(e => new { e.NumberField, e.BytesField } );

                entity.ToTable("testTable");

                entity.Property(e => e.NumberField)
                    .HasColumnName("numberField")
                    .HasColumnType("int(11)");

                entity.Property(e => e.BytesField)
                    .HasColumnName("bytesField")
                    .HasColumnType("binary(16)");

                entity.Property(e => e.AnotherField)
                    .HasColumnName("anotherField")
                    .HasColumnType("int(11)");
            });
        }
    }

    public class TestTable
    {
        public int NumberField { get; set; }
        public byte[] BytesField { get; set; }
        public int AnotherField { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            for (var i = 0; i < 75; i++)
            {
                new Thread(new ThreadStart(DoSomething2)).Start();
                Thread.Sleep(100);
            }

            Thread.Sleep(100000);

            Console.Read();
        }

        static void DoSomething2()
        {
            using (var db = new MyDbContext())
            {
                Console.WriteLine("Created");

                var tt = db.Set<TestTable>().ToArray();

                Console.WriteLine("Got {0}", tt.Length);
            }
        }
    }

Further technical details

EF Core version: 2.1.4 Provider: Both MySql.Data.EntityFrameworkCore 8.0.13 and Pomelo.EntityFrameworkCore.MySql 2.1.4 Database: MySQL 5.7 Operating system: Linux IDE: Visual Studio 2017 15.9

ajcvickers commented 5 years ago

Note for triage. Composite keys are known to be slower. I am able to reproduce this with a warm query against SQL Server on my machine. I get 5.7 seconds for loading a million million entities with a simple key, and 9.7 seconds with a composite key. Test code below.

public class TestTable
{
    public int NumberField { get; set; }
    public byte[] BytesField { get; set; }
    public int AnotherField { get; set; }
}

public class BloggingContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder
            .UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=Test;ConnectRetryCount=0");
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<TestTable>(entity =>
        {
            entity.HasKey(e => new { e.NumberField, e.BytesField });
            //entity.HasKey(e => e.NumberField);

            entity.ToTable("testTable");

            entity.Property(e => e.NumberField)
                .ValueGeneratedNever()
                .HasColumnName("numberField");

            entity.Property(e => e.BytesField)
                .HasColumnName("bytesField");

            entity.Property(e => e.AnotherField)
                .HasColumnName("anotherField");
        });
    }
}

public class Program
{
    public static void Main()
    {
        using (var context = new BloggingContext())
        {
            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();

            var entities = new List<TestTable>();

            for (var i = 1; i <= 1000000; i++)
            {
                entities.Add(new TestTable {NumberField = i, BytesField = new byte[] {0, 11, (byte) (i % 256), 77}});
            }

            Console.WriteLine("Seeding...");

            context.AddRange(entities);

            context.SaveChanges();

            Console.WriteLine("Priming query cache...");

            var results = context.Set<TestTable>().ToArray();
        }

        using (var context = new BloggingContext())
        {
            var _ = context.Model;

            Console.WriteLine("Loading...");

            var timer = Stopwatch.StartNew();

            var results = context.Set<TestTable>().ToArray();

            timer.Stop();

            Console.WriteLine($"Done in {timer.ElapsedMilliseconds}ms");
        }
    }
}
NaNgets commented 5 years ago

@ajcvickers thank you very much.

Is there any recommended way to use a table if I have multiple primary keys?

ajcvickers commented 5 years ago

@NaNgets You mean to make it faster? If so, no not really. Composite keys are just slower at the moment.

NaNgets commented 5 years ago

Ok, however, it's not just slower. In that specific case - it creates a massive wait (load) on the server. On the EntityFramework 6 it works very well.

Is there any way to not create that wait? Either by rewriting a specific section (I'd be happy to submit a PR if it's something I can understand and do) or any recommendation to use it otherwise, e.g. with indexes instead of keys or any workaround? I.E If I have multiple PKs in my table - the recommended pattern is to use a composite key, or should I something else e.g. a single key with indexes\unique on the composite key?

Thanks.

ajcvickers commented 5 years ago

What numbers are you seeing? Are they significantly different from the numbers I posted in terms of relative performance?

NaNgets commented 5 years ago

As I said, I'm talking about a load in the MySQL called "wait/io/socket/sql/client_connection". I monitor my DB instance and on a normal day (when I use the EF6) that wait is never more than 0.1, but if I use the EFCore that wait can get up to 200. On a linux server which has 8 CPUs, the waits\loads in total should be preferred to be less than the number of CPUs, so a single wait with a value of 200 is too much...

As said, when I use AsNoTracking or select the properties (instead of selecting the entities) - the wait isn't spiked.

(For your question - yes, it does take more time, but my MAIN problem is the waits that it causes to the server regarding connections).

ajcvickers commented 5 years ago

@NaNgets Thanks for the info. We'll discuss, but it's unlikely anything can be done here in the short term.

With regard to using composite keys, you certainly shouldn't use them unless they are needed. They are typically only needed when the value from a single column does not uniquely identify an entity instance.

NaNgets commented 5 years ago

Thank you very much for your time :)

ajcvickers commented 5 years ago

We discussed this and we don't have the expertise with MySQL to be able to determine whether there is something specific that is causing the waits that you are describing. I suspect this will need debugging by someone who is an expert in MySQL. However, putting this on the backlog to track the performance difference between simple and composite keys that the repo code demonstrates.