Alois-xx / SerializerTests

.NET Serializer testing framework

A few flaws #6

Closed aienabled closed 6 years ago

aienabled commented 6 years ago
  1. The test data is generated inside the interval measured by the stopwatch. I'm not sure, but it seems this might favor the serializers that run later in the testing order. We want to compare serializer performance, and counting the generation of the test data (which is very slow and heavy due to string generation) toward the serialization duration is a serious flaw. It would be better if the test data were pre-generated and simply passed in a delegate. For example:

    var testData = CreateTestData(); // no TestData property!
    var times = Test(nTimes, () =>
    {
        var dataStream = GetMemoryStream();
        TestSerializeOnly(dataStream, testData);
    });
  2. The MemoryStream should be created with a reasonable amount of memory preallocated for all cases, so that no reallocation happens each time the stream exceeds its capacity; otherwise later serializer runs will produce favored results.

  3. No GC is performed before each test run, yet many artifacts of the previous serializer's run (including generated IL code) are still allocated in the Gen 0/Gen 1 and even Gen 2 heaps. In some cases this can trigger a sudden collection and dramatically affect the results (in my case up to a 30% increase in serialization/deserialization duration, making the comparison of serializers totally invalid). Using BenchmarkDotNet to perform the test runs would handle this issue perfectly (https://github.com/Alois-xx/SerializerTests/issues/5). For now I would suggest calling this method before each stopwatch start:

    private void CollectGC()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;

        // just to ensure a complete GC
        for (var i = 0; i < 3; i++)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
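The reallocation cost mentioned in suggestion 2 can be made visible with a small standalone sketch (the 100 KB payload size and 1 KB chunk size are invented example numbers, not values from the test suite):

```csharp
using System;
using System.IO;

// Count how often a MemoryStream with a small 512-byte initial capacity
// has to grow (reallocate and copy its buffer) while ~100 KB of
// serializer output is written to it.
var stream = new MemoryStream(512);
int reallocations = 0;
int lastCapacity = stream.Capacity;
var chunk = new byte[1024];

for (int written = 0; written < 100 * 1024; written += chunk.Length)
{
    stream.Write(chunk, 0, chunk.Length);
    if (stream.Capacity != lastCapacity)
    {
        reallocations++;
        lastCapacity = stream.Capacity;
    }
}

// Each growth copies the entire buffer written so far, and none of this
// work has anything to do with the serializer being measured.
Console.WriteLine(reallocations);
```

A stream created with, say, `new MemoryStream(128 * 1024)` would finish the same loop with zero reallocations, which is the point of pre-sizing it.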

I've made these changes locally (except #2), and the results show a much smaller total time and much better consistency between test suite launches.

Regards!

Alois-xx commented 6 years ago

@aienabled:

Regarding 3: Have you looked at the latest version?

        List<double> Test(int n, Action acc)
        {
            List<double> times = new List<double>();
            Stopwatch sw = new Stopwatch();

            for (int i = 0; i < n; i++)
            {
                GC.Collect();
                GC.WaitForPendingFinalizers();
                ManualResetEvent ev = GetEvent(!acc.Method.Name.Contains("Deserialize"));
                sw.Restart();
                acc();
                sw.Stop();
                ev.Set(); // end waiting to get a nice context switch how long a test did really take in elapsed time
                times.Add(sw.Elapsed.TotalSeconds);
            }

            return times;
        }

For the current update the numbers were taken with a GC.Collect before each run. The actual objects are created during serialization. When you allocate larger objects, the OS is the most likely culprit because of soft page fault performance. On top of that come many other factors: other processes running on different cores, dynamic frequency adjustments (e.g. my 3.5 GHz Core i7 runs at 3.8 GHz if nothing else is running because the TDP allows it), and so on. How big are your deviations between different runs before and after your changes? Is the GC fix, which is already in the latest baseline, sufficient?

Regarding 2: During deserialization the memory stream is initialized with the data before it is read:

        public (double firstS, double averageS, long dataSize) TestDeserialize(int nTimes, int nObjectsToCreate)
        {
            myStream = ReadMemoryStreamFromDisk(nObjectsToCreate);  // the actual reading happens before timing starts

            long size = myStream.Length;
            var times = Test(nTimes, () =>
            {
                var dataStream = GetMemoryStream(); // get memory stream at position 0; nothing else happens there
                TestDeserializeOnlyAndTouch(dataStream, out T deserialized);
            });

            return CalcTime(times, size);
        }

Regarding 1: Since the amount of work performed to allocate the objects is the same, the overhead is nearly constant. Besides, each run is performed 5 times and the median value is taken, so if the first run is slower because the GC is growing its segments, the median will factor that out.
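The median argument can be illustrated with a short sketch (the run times below are invented numbers, not measurements from the suite):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration: the median of five runs ignores a single slow outlier,
// e.g. a first run where the GC is still growing its segments.
double Median(List<double> times)
{
    var sorted = times.OrderBy(t => t).ToList();
    return sorted[sorted.Count / 2]; // middle element for an odd count
}

var runsSeconds = new List<double> { 1.90, 1.01, 1.02, 1.00, 1.03 }; // first run is slow
Console.WriteLine(Median(runsSeconds)); // 1.02: the 1.90 outlier has no effect
```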

aienabled commented 6 years ago

@Alois-xx,

Regarding 3. I didn't see the latest commit because I forked your test suite right before it to integrate my serializer. Good to know you've already addressed the GC problem. Previously there were no GC.Collect() calls at all, and that definitely skewed the results. However, please pay attention to my suggested code for the GC collection, especially GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;, as otherwise your memory stream instances (created for TestSerialize) from previous serializer tests are most likely (I haven't verified this) kept on the LOH forever. Though it's not so relevant if you apply my suggestion number 2.

Regarding 1. I agree that this is a constant amount of time, but you want to measure serializer performance, and for the purity of the test there should be no extra work (and the TestData generation is very heavy and memory-intensive). It also creates excessive pressure on the GC: since you're creating the data right in the test case, the GC is likely to trigger too early during the serializer run (as the Gen 0 heap size is exceeded), affecting the first run. It would also affect memory allocation measurements (if you ever implement those). My point is: if you want accurate first-run measurements, the test data needs to be allocated before the test run, then GC.Collect(), then the test run. The fix is so simple that I see no point in keeping it as is.
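The suggested ordering (allocate first, collect, then measure) can be sketched as follows; `CreateTestData` and the serializer body are placeholders, not the repository's actual code:

```csharp
using System;
using System.Diagnostics;

// Hypothetical ordering: build the test data first, force a full GC, and
// only then start the stopwatch, so neither data generation nor a
// triggered collection is charged to the serializer under test.
double MeasureSeconds(Action serialize)
{
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect(); // second pass picks up objects freed by finalizers

    var sw = Stopwatch.StartNew();
    serialize();
    sw.Stop();
    return sw.Elapsed.TotalSeconds;
}

var testData = new byte[100_000]; // stand-in for CreateTestData()
double seconds = MeasureSeconds(() =>
{
    // a real serializer would write testData to a stream here
    Array.Copy(testData, new byte[testData.Length], testData.Length);
});
Console.WriteLine(seconds >= 0.0);
```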

Regarding 2. Yes, but I meant TestSerialize. You're allocating a new stream for each serializer (with an initial 512-byte capacity), and then its capacity grows, causing expensive buffer recreation and copying. This is not something you want to include in the measurements, as it is not related to the actual serializer performance, and it also favors serializers producing smaller output (since buffer recreation happens fewer times for them). So my suggestion is to allocate a memory stream with a large enough fixed-size buffer and use it everywhere (make it a singleton).
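A minimal sketch of that singleton-stream idea (the 64 MB size and the `GetMemoryStream` name are assumptions chosen to mirror the delegate examples above, not the repository's code):

```csharp
using System;
using System.IO;

// Hypothetical singleton: one large pre-sized MemoryStream shared by all
// serializer tests, so buffer growth never happens during a timed run.
const int PreallocatedBytes = 64 * 1024 * 1024; // assumed "large enough" for every serializer
var sharedStream = new MemoryStream(PreallocatedBytes);

MemoryStream GetMemoryStream()
{
    sharedStream.SetLength(0); // drop previous contents but keep the buffer
    sharedStream.Position = 0; // the next serializer starts writing at offset 0
    return sharedStream;
}

var first = GetMemoryStream();
first.Write(new byte[1024], 0, 1024);
var second = GetMemoryStream();
Console.WriteLine(ReferenceEquals(first, second)); // True: the same instance is reused
Console.WriteLine(second.Length);                  // 0: length reset, capacity kept
```

`SetLength(0)` clears the logical content without shrinking the underlying buffer, so every serializer sees an empty stream backed by the same preallocated memory.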

aienabled commented 6 years ago

> How big are your deviations between different runs before and after your changes? Is the GC fix which is in the latest baseline already sufficient?

I didn't actually measure the difference precisely. I ran multiple tests, and the difference between my serializer and Wire (with everything else except it and SlimSerializer commented out) was about ±15%, varying randomly, which is why I wrote in my first message above that it's about a 30% difference. I applied my suggestions for GC.Collect and test data pre-allocation, and the changes reduced the difference and made the results much more consistent, so I was able to apply various optimizations to my serializer and actually measure their effect. However, even with the "High performance" power mode (fixed CPU clock speed) and no background processes, the results were not perfectly consistent between runs. I've already suggested using BenchmarkDotNet, as it can easily address this issue. Meanwhile, applying this:

    var process = Process.GetCurrentProcess();
    process.PriorityClass = ProcessPriorityClass.High;
    process.PriorityBoostEnabled = true;

improved the consistency. Also, using more test runs (at least 10) seems to help.

Alois-xx commented 6 years ago

@aienabled: I have added the pre-allocation of the objects and of the memory stream size for the serialize test. The numbers for the fastest serializers did not change much. The biggest issue was certainly the GC state, which could leak into other serializer tests, but not in the sense you would normally expect: profiling showed that at some point the allocation rate is throttled by the GC with a Sleep(50ms) at random times, when it thinks the allocation rate is too high.

Large Object Heap compaction is not an issue because there is always only one MemoryStream on the LOH, so LOH fragmentation cannot occur. Adding the compaction call would clutter the code, and readers might wonder what miracles an LOH compaction is supposed to perform. You can check the GC state with PerfView by taking a memory dump if you are curious; objects on the LOH are specially flagged by it.
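Whether a given buffer actually lives on the LOH is easy to check without a memory dump: objects of roughly 85,000 bytes or more are allocated there, and the runtime reports them as generation 2 immediately after allocation. A quick sketch (not part of the test suite):

```csharp
using System;

// Objects >= ~85,000 bytes go to the Large Object Heap and are reported
// as generation 2 right away; small objects start in generation 0.
var small = new byte[1_000];
var large = new byte[100_000];

Console.WriteLine(GC.GetGeneration(small)); // 0: regular Gen 0 allocation
Console.WriteLine(GC.GetGeneration(large)); // 2: LOH, collected with Gen 2
```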