dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.26k stars 4.73k forks source link

Random Records go missing, using linq, after .toList .toArray .AsEnumerable #26797

Closed wesleyolis closed 4 years ago

wesleyolis commented 6 years ago

Hi,

I never got a chance to report this bug at my previous company for time constants. Just to give you a bit of back ground, we under took the re-write of a entire existing project, onto the early er beta release of dotnet beta using RX technologies. Because we had an exising process with which we could proof and run both system at the same time, to ensure we correctly perfectly re-wrote the system, provide the data was spares enough of course. Over period of year a few months we ran import and exports of data using the new process, fix thing never even been picked up before in the existing system, comparing the results of both system at different stages.

We finally end up with a very few long days of peeling back the layers and investigate were one of our records went missing that we couldn't explain. We eventually pined it down to single line of code, that made it work of fail. Upon knowing were the issues was, we then undertook a more detail investigation into the frameworks and SQL being generate everything to figure out, who was responsible for this.

Here is everything we eliminated.

This lead us to believe it is a bug in the framework upon which we were working, but due to use already having spent enough time on this project and having many projects that also need out attention, we just change the one liner for something else, which cause everything to work and made a very big note.

I am not able to give you the data or anything like that, but what I can do is provide you with the line of code and some context, to what made it work and what didn't make it work. Then the only way I could fix it, if bug still exist is to have a team of experience guys, clearly double check the mechanism around this line of code, checking for some corner case bug, that happens in the most rarest of cerium stances.

As I say the only reason we pick this up, because we had existing process against which I vet everything we were joining about 400 000 records, in which I then discover a record went missing.

My concern is that we may have been the only ones who every picked this corner case up, because of we re-wrote and existing project and had a longer period of time ensure the process was perfect and identical to a reference, to whom we then decided, whether the new system or the old system was correctly and moved forward. Then eventually signed it off after 3/4 no hiccup runs, after I determined their was not enough data that had been through, that was random enough to caused pretty much every things to have been touched, logically wise.

I guess that most projects would have the luxury of being able to have a reference to compare their worth with, their would wouldn't even be looking into weather their systems were working at this level, with this amount of data. Especially test cases.

So therefore I am still reporting this, with the hopes that this bug no longer existing the current releases, somehow got worked out, but their really is only one way to know for certain and that is if you guys can verify this by checking the development history of work in this section of code, then just checking the current code by re-reading it again, checking for any corner cases.

We were using the local SQL DB.

The buggy code at hand.

  /*
        virtual protected IEnumerable<DiffPair<TModel>> InternalItems(IQueryable<TModel> collections)
        {

            int currentImportID = CurrentImportContext.Import.ID;
            int previousImportID = PreviousImportContext.Import.ID;

            // Their is bug, that if this collection is not evaluted before the where clause using .AsEnumerable(), Tolist, ToArray,
            // the radio records randomly go missing.
            // For this statment to evalute up front takes a large over head in resource.
            // Revert to old code, that is faster and simpler in terms of the sql statment generated.
            // Leaving this code here, so no one attempts to use this approach again.
            var items = collections.AsEnumerable().Where(x => previousImportID == x.ImportID || currentImportID == x.ImportID);
                //.ToArray();

            var pairEntries = items
                .Select(x => new { CI = x.CI, LAC = x.LAC })
                .Distinct();

            return pairEntries.GroupJoin(items, x => x, x => new { x.CI, x.LAC }, (_, entries) => new DiffPair<TModel>(
                entries.SingleOrDefault(x => x.ImportID == previousImportID),
                entries.SingleOrDefault(x => x.ImportID == currentImportID)
            ));
        } */

Configuration

{
  "version": "1.0.0-*",
  "description": "IXUS.EMSS.Netcon.DiffPhase.Tests Class Library",
  "authors": [ "woliver" ],

  "packOptions": {
    "tags": [ "" ],
    "projectUrl": "https://git.emss.co.za/emssixus/netcon/tree/master/IXUS.EMSS.Netcon.DiffPhase.Tests",
    "licenseUrl": ""
  },

  "buildOptions": {
    "copyToOutput": [
      "appsettings.json",
      "../IXUS.EMSS.Netcon.Model.Seed/Seeds/*"
    ]
  },

  "frameworks": {
    "net46": {
    }
  },
  "dependencies": {
    "Microsoft.EntityFrameworkCore.InMemory": "1.0.0-rc2-final",
    "IXUS.Core": "1.0.0",
    "IXUS.EMSS.Netcon.DiffPhase": "1.0.0",
    "Microsoft.Extensions.Configuration.Json": "1.0.0-rc2-final",
    "xunit": "2.1.0",
    "dotnet-test-xunit": "1.0.0-rc2-build10025"
  },

  "testRunner": "xunit"
}

An alternative method, which I wrote Initially that works and has tests proving such, that I then reverted to because the speed implications of the fix.

  /*
       protected IEnumerable<DiffPair<TModel>> InternalItemsOrderByCILAC(IQueryable<TModel> collections)
        {
            // RunTime at least 9.5s for 67279 Atoll3GCells and 25810 Atoll2GCells for Large test file.                         
            // We could look at implement and optermised version of this called
            // distinct pairs, If we implemente and interface for DiffPair.

            int currentImportID = CurrentImportContext.Import.ID;
            int previousImportID = PreviousImportContext.Import.ID;

            // Order natural by the existing compound index in the model CI, LAC.
            var records = collections.Where(
                w => (w.ImportID == currentImportID || w.ImportID == previousImportID));

            TModel previous = null;

            var seed = records.FirstOrDefault();

            if (seed != null)
            {
                int previousCI = seed.CI;
                int previousLAC = seed.LAC;

                foreach (var record in records)
                {
                    if (previous != null)
                    {
                        if (previousCI == record.CI && previousLAC == record.LAC)
                        {
                            if (record.ImportID == currentImportID)
                            {
                                yield return new DiffPair<TModel>(previous, record);
                            }
                            else if (record.ImportID == previousImportID)
                            {
                                yield return new DiffPair<TModel>(record, previous);
                            }
                            previous = null;
                        }
                        else
                        {
                            if (previous.ImportID == currentImportID)
                            {
                                yield return new DiffPair<TModel>(null, previous);
                            }
                            else if (previous.ImportID == previousImportID)
                            {
                                yield return new DiffPair<TModel>(previous, null);
                            }

                            previousCI = record.CI;
                            previousLAC = record.LAC;
                            previous = record;
                        }
                    }
                    else
                    {

                        previousCI = record.CI;
                        previousLAC = record.LAC;
                        previous = record;
                    }
                }

                if (previous != null)
                {
                    if (previous.ImportID == currentImportID)
                    {
                        yield return new DiffPair<TModel>(null, previous);
                    }
                    else if (previous.ImportID == previousImportID)
                    {
                        yield return new DiffPair<TModel>(previous, null);
                    }
                }
            }
        }

Test Code for the procedure

   [Fact]
        public void Empty()
        {
            var items = new AtollEntryModel[] { };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            Assert.Equal(0, source.Items.Count());
        }

        [Fact]
        public void NewCell()
        {
            var items = new []
            {
                new AtollEntryModel() { CI = 1, LAC = 101, ImportID = CurrentImport.Import.ID },
            };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            var resultsItems = source.Items;

            Assert.Equal(1, resultsItems.Count());

            Assert.Equal(null, resultsItems.First().Previous);
            Assert.Equal(items.First(), resultsItems.First().Current);
        }

        [Fact]
        public void MissingCell()
        {
            var items = new []
            {
                new AtollEntryModel() { CI = 1, LAC = 101, ImportID = PreviousImport.Import.ID },
            };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            var resultsItems = source.Items;

            Assert.Equal(1, resultsItems.Count());
            Assert.Equal(items.First(), resultsItems.First().Previous);
            Assert.Equal(null, resultsItems.First().Current);
        }

        [Fact]
        public void ExistingCell()
        {
            var items = new []
            {
                new AtollEntryModel() { CI = 1, LAC = 101, ImportID = PreviousImport.Import.ID },
                new AtollEntryModel() { CI = 1, LAC = 101, ImportID = CurrentImport.Import.ID }
            };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            var resultsItems = source.Items;

            Assert.Equal(1, source.Items.Count());

            Assert.Equal(items[0], resultsItems.First().Previous);
            Assert.Equal(items[1], resultsItems.First().Current);

        }

        [Fact]
        public void ExistingCellReversed()
        {
            var items = new []
            {
                new AtollEntryModel() { CI = 1, LAC = 101, ImportID = CurrentImport.Import.ID },
                new AtollEntryModel() { CI = 1, LAC = 101, ImportID = PreviousImport.Import.ID }
            };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            var resultsItems = source.Items;

            Assert.Equal(1, source.Items.Count());

            Assert.Equal(items[1], resultsItems.First().Previous);
            Assert.Equal(items[0], resultsItems.First().Current);
        }

        [Fact]
        public void Combination1()
        {
            var items = new []
            {
                new AtollEntryModel() { CI = 1, LAC = 101, ImportID = PreviousImport.Import.ID },
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID = CurrentImport.Import.ID },
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID =  PreviousImport.Import.ID}
            };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            var resultsItems = source.Items;

            Assert.Equal(2, source.Items.Count());

            Assert.Equal(null, resultsItems.First().Current);
            Assert.Equal(items[0], resultsItems.First().Previous);

            Assert.Equal(items[1], resultsItems.Skip(1).First().Current);
            Assert.Equal(items[2], resultsItems.Skip(1).First().Previous);
        }

        [Fact]
        public void Combination2()
        {
            var items = new []
            {
                new AtollEntryModel() { CI = 1, LAC = 101, ImportID = CurrentImport.Import.ID },
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID = CurrentImport.Import.ID },
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID =  PreviousImport.Import.ID}
            };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            var resultsItems = source.Items;

            Assert.Equal(2, source.Items.Count());

            Assert.Equal(items[0], resultsItems.First().Current);
            Assert.Equal(null, resultsItems.First().Previous);

            Assert.Equal(items[1], resultsItems.Skip(1).First().Current);
            Assert.Equal(items[2], resultsItems.Skip(1).First().Previous);
        }

        [Fact]
        public void Combination3()
        {
            var items = new []
            {
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID = CurrentImport.Import.ID },
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID =  PreviousImport.Import.ID},
                new AtollEntryModel() { CI = 3, LAC = 101, ImportID = CurrentImport.Import.ID }
            };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            var resultsItems = source.Items;

            Assert.Equal(2, source.Items.Count());

            Assert.Equal(items[0], resultsItems.First().Current);
            Assert.Equal(items[1], resultsItems.First().Previous);

            Assert.Equal(items[2], resultsItems.Skip(1).First().Current);
            Assert.Equal(null, resultsItems.Skip(1).First().Previous);
        }

        [Fact]
        public void Combination4()
        {
            var items = new []
            {
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID = CurrentImport.Import.ID },
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID =  PreviousImport.Import.ID},
                new AtollEntryModel() { CI = 3, LAC = 101, ImportID = PreviousImport.Import.ID }
            };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            var resultsItems = source.Items;

            Assert.Equal(2, source.Items.Count());

            Assert.Equal(items[0], resultsItems.First().Current);
            Assert.Equal(items[1], resultsItems.First().Previous);

            Assert.Equal(null, resultsItems.Skip(1).First().Current);
            Assert.Equal(items[2], resultsItems.Skip(1).First().Previous);
        }

        [Fact]
        public void Combination5()
        {
            var items = new []
            {
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID = CurrentImport.Import.ID },
                new AtollEntryModel() { CI = 2, LAC = 101, ImportID =  PreviousImport.Import.ID},
                new AtollEntryModel() { CI = 3, LAC = 101, ImportID = PreviousImport.Import.ID },
                new AtollEntryModel() { CI = 4, LAC = 101, ImportID = PreviousImport.Import.ID },
                new AtollEntryModel() { CI = 4, LAC = 101, ImportID = CurrentImport.Import.ID},
            };

            var source = new TestAtollCellEntryBase(items, CurrentImport, PreviousImport);

            var resultsItems = source.Items;

            Assert.Equal(3, source.Items.Count());

            Assert.Equal(items[0], resultsItems.First().Current);
            Assert.Equal(items[1], resultsItems.First().Previous);

            Assert.Equal(null, resultsItems.Skip(1).First().Current);
            Assert.Equal(items[2], resultsItems.Skip(1).First().Previous);

            Assert.Equal(items[3], resultsItems.Skip(2).First().Previous);
            Assert.Equal(items[4], resultsItems.Skip(2).First().Current);

        }
    }
karelz commented 6 years ago

It is not clear to me where you think is a bug. As next best step, I would suggest to create a repro - i.e. simulate the data that caused it, stress out the system, minimize the repro. Reviewing code when it is not clear where exactly is the problem is rarely successful way to root cause issues.

wesleyolis commented 6 years ago

Referring to the comment here, with regards to the statement below. Unfortunately I was not allowed to dive into the source due to time. This is now as you can see a good period and while ago. I have moved on. Well the thing is it seems to stress out different machines.. However, the bug may havn't migrated itself. Basically in biggers system if I typically go read existing peoples code, I can easily spot what is wrong, because I know the technical details of things I am looking for that can go wrong. But then I have work on projects like albion GIS(Otuka) framework, that wrap existing database with in memory virtualization veneers, were wrote everything ourselfs, that typically how I go about solving these big data issues, when I can only pin it down this far, usually quite obvious at times. I am afraid to say the only help I can give you guys is these lines of code, unless you ask for something more specific versions and things like that.

Really straight forward, if we don't add .AsEnumerable(), Tolist, ToArray to the collections statment below, before the .Where clause, the bug is present, by adding one of the functions mentioned above it disappeared consistently every time on every machine every run.

collections.**AsEnumerable()**.Where(x => previousImportID == x.ImportID || currentImportID == x.ImportID);

// Their is bug, that if this collection is not evaluted before the where clause using .AsEnumerable(), Tolist, ToArray, // the radio records randomly go missing. // For this statment to evalute up front takes a large over head in resource. // Revert to old code, that is faster and simpler in terms of the sql statment generated. // Leaving this code here, so no one attempts to use this approach again. var items = collections.AsEnumerable().Where(x => previousImportID == x.ImportID || currentImportID == x.ImportID); //.ToArray();

wesleyolis commented 6 years ago

This should help you make the connection between the base class and parent class below.

virtual protected IEnumerable<DiffPair<TModel>> InternalItems(IQueryable<TModel> collections)
        {
            return InternalItemsOrderByCILAC(collections.OrderBy(o => o.CI).ThenBy(o => o.LAC));
        }
 public class Atoll3GCellEntry : AtollCellEntryBase<Model.Models.Atoll3GCell>
    {
        public Atoll3GCellEntry(Model.Database database, IImportContextCurrentCurrent currentImportContext,
            IImportContextCurrentPrevious previousImportContext)
            : base(currentImportContext, previousImportContext)
        {
            Database = database;
        }

        protected Model.Database Database { get; }

        public override IEnumerable<DiffPair<Atoll3GCell>> Items
        {
            get
            {
                return InternalItems(Database.Atoll3GCells
                    .Include(x => x.Antenna).ThenInclude(x => x.Manufacturer)
                    .Include(x => x.County).ThenInclude(x => x.Region).ThenInclude(x => x.Country));
            }
        }
    }
modelBuilder.Entity<Atoll3GCell>(b =>
            {
                b.HasKey(c => c.ID);
                b.HasIndex(c => new { c.ImportID, c.CI, c.LAC }).IsUnique();
                b.HasOne(c => c.Import).WithMany().HasForeignKey(c => c.ImportID).OnDelete(DeleteBehavior.Cascade);
                b.HasOne(c => c.County).WithMany().HasForeignKey(c => c.CountyID);
                b.HasOne(c => c.Antenna).WithMany().HasForeignKey(c => c.AntennaID);
                b.Property(c => c.Power).IsRequired(false);
            });
karelz commented 6 years ago

@OmarTawfik do you the information above is actionable? If not, let's close it as such (not enough information).

wesleyolis commented 6 years ago

Added some more context for the table with regards to the relationships.

Mabye this helps, some commit history: Which basically show the move to the linq and then away but to the original approach. During which we ran many imports say at least 10 production runs, with out pre-runs and other tests. which probably put in the order or like 50 runs.

It is the only code modifications during this period.

-- Author: Wesley Oliver wesley.olis@gmail.com 2017-01-30 13:39:10 Committer: Wesley Oliver wesley.olis@gmail.com 2017-01-30 13:39:10 Parent: 50760e2ace854b1203745e9e3218ce997ccb9625 (refactor: Switch to simpler algorithms for performing diff source data retrieval) Branches: fix/antenna-manufacture-and-model-rewritten-in-place-for-case-correction, fix/antenna-manufacture-models-match-now-case-sensitive, fix/diffPhase-source-pair-generator-linq-sql-has-internal-bug, fix/export-name-field-to-ixus-result-phase3-table-scan, remotes/origin/fix/antenna-manufacture-and-model-rewritten-in-place-for-case-correction, remotes/origin/fix/antenna-manufacture-models-match-now-case-sensitive, remotes/origin/fix/diffPhase-source-pair-generator-linq-sql-has-internal-bug, remotes/origin/fix/export-name-field-to-ixus-result-phase3-table-scan Follows: v2.2.0 Precedes:

bugfix: DiffPhase CI and LAC pair generator, SQL LINQ bug, fix required upfront evaluation,
causing massive amounts of resouces.
Reverted to old pairing algorithm. does simple pull and order by CI,LAC
and then pairs the records.

-- Author: Benjamin Pannell admin@sierrasoftworks.com 2016-06-09 15:09:07 Committer: Benjamin Pannell admin@sierrasoftworks.com 2016-06-09 16:28:18 Parent: 22427eaa138e58e46ba7ab225afd9ae397b57ab2 (refactor: Switch to .NET Core RC2) Child: e4d72b1302ce3b14c00e2bb0da740bdd7ce89c80 (bugfix: DiffPhase CI and LAC pair generator, SQL LINQ bug, fix required upfront evaluation,) Branches: fix/antenna-manufacture-and-model-rewritten-in-place-for-case-correction, fix/antenna-manufacture-models-match-now-case-sensitive, fix/diffPhase-source-pair-generator-linq-sql-has-internal-bug, fix/export-name-field-to-ixus-result-phase3-table-scan, master, remotes/origin/fix/antenna-manufacture-and-model-rewritten-in-place-for-case-correction, remotes/origin/fix/antenna-manufacture-models-match-now-case-sensitive, remotes/origin/fix/diffPhase-source-pair-generator-linq-sql-has-internal-bug, remotes/origin/fix/export-name-field-to-ixus-result-phase3-table-scan, remotes/origin/master Follows: v2.2.0 Precedes:

refactor: Switch to simpler algorithms for performing diff source data retrieval
wesleyolis commented 6 years ago

Hi,

I was think about to does one vet system, were you don't have existing system to compare re-written process with, for others in future writing systems, of say for instance experimental, proprietary frameworks, those have alot of innovation and churn.

This is what I have come up with, maybe you guys can look into what I came up, to put the industry in a better footing for the longer term.

--

https://docs.google.com/document/d/1LYPiV3ryvFR1lTQXHprX_5sUDfpI4YUdSjUWmV-gV0A/edit?usp=sharing

How to vet and validate dependent parts of complex systems, were to much data for corner cases.

When you building a system onto of alot of abstraction, that has newly been build or re-written recently from the ground up like link how should one go about ensuring there are no bugs in linq as and example, were you don’t have a existing system to vet against. I can think of two strategies, both of which first all on the provider of the framework. One of the methods, allows the provider to crowd source validation of its systems, by companies implementing a new process for production runs, that vet the system every 2 months or for a good period after inception.

How did this come to light, we were re-writing and existing system onto a new framework, so we have a system which we could do many runs and compare the results, and interrogate the process at many different stage at the detailed level I choose to Although the process was really slow, especially old runs, took hours/days, before results, one the reason for the re-write in the first place. We discovered that now thinking of it, in like run the 50 run, a year later after nearly 2 no-issues runs, just about to approve the system and sign it off a record just went missing, which came down to a single statement in LINQ, whether present or not. .ToArray(), .ToList() fixed it. When you have a process that is running though few thousand or millions of records, how does one go about validating a system like that unless you are a robot or some super AI, that give you answer in seconds..

The two approaches:

Establish 3rd Party interface abstract pattern validate, to which we validate our abstractions.

Have a single body established by the computer science body, that providers validation for abstracte patterns for all corner cases, with which companies can submit their work, to all know pattern constraints they trying to meet. This company, would then provide everyone else with a way to test their code against massive world wide consensus of knowledge. Right now our test are as good as the person who wrote them, who could or couldn’t have seen the croner cases. We need to establish a body, that can provide test and pattern validation as a 3rd party service, they would be pulling on a wealth of years of the worlds best contributing to this effort, also allow them to learn, were they may have made mistakes in big company systems before they happen.

Provide a production mode, that end procedures, functions, that additional checks to be run using external tools, that can validate things like LINQ.

The framework must implement a few options that are esposed, that would turn on some validation of the code during production runs, this be especially usefully in experimental, trail, propritarty code. It would slow down the code for these, runs, but provide us with additional assurance that their is nothing wrong.

I would envisage the following modes, but probably be a whole lot more start ones in future.

Requirements from existing frameworks, to be able to validate things in this way, Would be a new SQL mode call VALIDATE, which like UPDATE, but, would just check Attributes match, that have been sent over the wire. One would still have to decided to pull all the data if one want to do this on the client side of things, so better to just push the data then, unless you have faster download speed, then better to have a version of this tool, capable of running on client side. I feel that both simple modes should be supported.

If you have any other ideas please feel free to share and comment.

The other thing is production system, will now lend themselves better to Cloud computing, because they now have to have runs, ensure the system is correct, which is going to require more resource than they are willing to over provide. Typically the could be sharing those resource with other companies, which could or their runs out of sync/time with each other, reducing the infrastructure over heads.

danmoseley commented 6 years ago

@wesleyolis unfortunately this is not actionable for us. We would need code that repros the problem, , or at least a problem that is much more localized and explicitly described. Please reactivate if you get to that point.