ZiggyCreatures / FusionCache

FusionCache is an easy to use, fast and robust hybrid cache with advanced resiliency features.
MIT License

[FEATURE] Utilize Microsoft.IO.RecyclableMemoryStream instead of creating new MemoryStream instances when serializing #241

Open viniciusvarzea opened 2 months ago

viniciusvarzea commented 2 months ago

Problem

The serializers currently create a new MemoryStream instance every time they have to serialize or deserialize a cache entry (in the async overloads), which can lead to high allocation rates (and time spent in GC).

Solution

How about implementing the serializers on top of the Microsoft.IO.RecyclableMemoryStream library?
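For illustration, a minimal sketch of the idea (hypothetical code, not from FusionCache): a single shared RecyclableMemoryStreamManager hands out streams backed by pooled buffers, so the per-call MemoryStream allocation (and its growing backing arrays) goes away.

using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.IO;

public static class PooledStreamSketch
{
    // The manager is designed to be created once and shared across the app.
    private static readonly RecyclableMemoryStreamManager _manager = new RecyclableMemoryStreamManager();

    public static async ValueTask<byte[]> SerializeAsync<T>(T value)
    {
        // Instead of: using var stream = new MemoryStream();
        // GetStream() rents pooled blocks; Dispose() returns them to the pool.
        using var stream = _manager.GetStream();
        await JsonSerializer.SerializeAsync(stream, value);
        return stream.ToArray();
    }
}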

viniciusvarzea commented 2 months ago

If you have some free time, here is a link explaining the allocation improvements: https://medium.com/@matias.paulo84/recyclablememorystream-vs-memorystream-c4aa341324a9

jodydonetti commented 2 months ago

Hi @viniciusvarzea , thanks for the suggestion!

I'm looking into it and doing some benchmarking, and it does in fact seem quite good.

Will update soon.

viniciusvarzea commented 2 months ago

Hello @jodydonetti, thank you for considering it. I have extensive experience with this library, so if you need any help, please count on me.

jodydonetti commented 2 months ago

Hi @viniciusvarzea awesome, thanks!

jodydonetti commented 2 months ago

Hi @viniciusvarzea, the implementation turned out to be quite easy to make. This is the System.Text.Json one, with comments removed for brevity:

using System.Text.Json;
using Microsoft.IO;
using ZiggyCreatures.Caching.Fusion.Serialization;

public class FusionCacheSystemTextJsonSerializer
    : IFusionCacheSerializer
{
    // A single shared manager: it pools the buffers backing the streams below.
    private static readonly RecyclableMemoryStreamManager _manager = new RecyclableMemoryStreamManager();

    private readonly JsonSerializerOptions? _options;

    public FusionCacheSystemTextJsonSerializer(JsonSerializerOptions? options = null)
    {
        _options = options;
    }

    public byte[] Serialize<T>(T? obj)
    {
        return JsonSerializer.SerializeToUtf8Bytes<T?>(obj, _options);
    }

    public T? Deserialize<T>(byte[] data)
    {
        return JsonSerializer.Deserialize<T>(data, _options);
    }

    public async ValueTask<byte[]> SerializeAsync<T>(T? obj)
    {
        // Rents pooled buffers instead of allocating a fresh MemoryStream per call.
        using var stream = _manager.GetStream();
        await JsonSerializer.SerializeAsync<T?>(stream, obj, _options);
        return stream.ToArray();
    }

    public async ValueTask<T?> DeserializeAsync<T>(byte[] data)
    {
        using var stream = _manager.GetStream(data);
        return await JsonSerializer.DeserializeAsync<T>(stream, _options);
    }
}
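For context, a quick hypothetical round-trip through the serializer above (assuming top-level statements), just to show how the interface is used:

var serializer = new FusionCacheSystemTextJsonSerializer();

// Serialize a value to bytes backed by pooled buffers, then read it back.
byte[] payload = await serializer.SerializeAsync(new List<int> { 1, 2, 3 });
List<int>? roundTripped = await serializer.DeserializeAsync<List<int>>(payload);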

For the RecyclableMemoryStreamManager I've used the default options, but if you have some suggestions I'd be happy to know.

I've created a benchmark, like this:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;

[MemoryDiagnoser]
[Config(typeof(Config))]
public class SerializersBenchmark
{
  private static Random MyRandom = new Random(2110);

  private class SampleModel
  {
      public string Name { get; set; } = "";
      public int Age { get; set; }
      public DateTime Date { get; set; }
      public List<int> FavoriteNumbers { get; set; } = [];

      public static SampleModel GenerateRandom()
      {
          var model = new SampleModel
          {
              Name = Guid.NewGuid().ToString("N"),
              Age = MyRandom.Next(1, 100),
              Date = DateTime.UtcNow,
          };
          for (int i = 0; i < 10; i++)
          {
              model.FavoriteNumbers.Add(MyRandom.Next(1, 1000));
          }
          return model;
      }
  }

  private class Config : ManualConfig
  {
      public Config()
      {
          AddColumn(StatisticColumn.P95);
      }
  }

  private FusionCacheSystemTextJsonSerializerOld _Old = new FusionCacheSystemTextJsonSerializerOld();
  private FusionCacheSystemTextJsonSerializer _New = new FusionCacheSystemTextJsonSerializer();
  private List<SampleModel> _Models = new List<SampleModel>();
  private byte[] _Blob = null!; // assigned in Setup()

  [Params(1, 100, 10_000)]
  public int Size;

  [GlobalSetup]
  public void Setup()
  {
      for (int i = 0; i < Size; i++)
      {
          _Models.Add(SampleModel.GenerateRandom());
      }
      _Blob = _New.Serialize(_Models);
  }

  [Benchmark(Baseline = true)]
  public async Task SerializeAsync_OLD()
  {
      await _Old.SerializeAsync(_Models).ConfigureAwait(false);
  }

  [Benchmark]
  public async Task SerializeAsync_NEW()
  {
      await _New.SerializeAsync(_Models).ConfigureAwait(false);
  }

  [Benchmark]
  public async Task DeserializeAsync_OLD()
  {
      await _Old.DeserializeAsync<List<SampleModel>>(_Blob).ConfigureAwait(false);
  }

  [Benchmark]
  public async Task DeserializeAsync_NEW()
  {
      await _New.DeserializeAsync<List<SampleModel>>(_Blob).ConfigureAwait(false);
  }
}
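For completeness, the standard BenchmarkDotNet entry point to launch it (a Release build is required for meaningful numbers):

using BenchmarkDotNet.Running;

public static class Program
{
    public static void Main()
    {
        // Run with: dotnet run -c Release
        BenchmarkRunner.Run<SerializersBenchmark>();
    }
}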

It shows some pretty good old-vs-new numbers:

| Method | Size | Mean | Error | StdDev | P95 | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SerializeAsync_OLD | 1 | 492.6 ns | 6.93 ns | 6.14 ns | 501.4 ns | 1.00 | 0.00 | 0.0782 | - | - | 984 B | 1.00 |
| SerializeAsync_NEW | 1 | 642.1 ns | 6.34 ns | 5.93 ns | 648.9 ns | 1.30 | 0.02 | 0.0725 | - | - | 920 B | 0.93 |
| DeserializeAsync_OLD | 1 | 769.3 ns | 11.45 ns | 10.71 ns | 784.2 ns | 1.56 | 0.03 | 0.0772 | - | - | 976 B | 0.99 |
| DeserializeAsync_NEW | 1 | 999.7 ns | 13.76 ns | 12.87 ns | 1,017.7 ns | 2.03 | 0.04 | 0.0935 | - | - | 1192 B | 1.21 |
| SerializeAsync_OLD | 100 | 23,484.7 ns | 102.12 ns | 95.52 ns | 23,623.2 ns | 1.00 | 0.00 | 4.7607 | 0.3967 | - | 59856 B | 1.00 |
| SerializeAsync_NEW | 100 | 22,606.4 ns | 142.26 ns | 133.07 ns | 22,833.3 ns | 0.96 | 0.00 | 1.2512 | - | - | 15744 B | 0.26 |
| DeserializeAsync_OLD | 100 | 51,011.0 ns | 500.74 ns | 418.14 ns | 51,606.7 ns | 2.17 | 0.02 | 3.2349 | 0.3052 | - | 41096 B | 0.69 |
| DeserializeAsync_NEW | 100 | 51,964.5 ns | 470.68 ns | 440.27 ns | 52,594.8 ns | 2.21 | 0.02 | 3.2349 | 0.3052 | - | 41312 B | 0.69 |
| SerializeAsync_OLD | 10000 | 2,811,384.3 ns | 48,038.91 ns | 44,935.62 ns | 2,889,311.9 ns | 1.00 | 0.00 | 671.8750 | 664.0625 | 664.0625 | 5264201 B | 1.00 |
| SerializeAsync_NEW | 10000 | 2,510,884.4 ns | 15,943.09 ns | 14,913.18 ns | 2,525,672.7 ns | 0.89 | 0.02 | 160.1563 | 160.1563 | 160.1563 | 1499627 B | 0.28 |
| DeserializeAsync_OLD | 10000 | 9,559,862.6 ns | 183,341.18 ns | 196,173.01 ns | 9,831,249.5 ns | 3.39 | 0.09 | 437.5000 | 421.8750 | 125.0000 | 4103255 B | 0.78 |
| DeserializeAsync_NEW | 10000 | 9,666,130.0 ns | 152,706.18 ns | 142,841.45 ns | 9,810,493.9 ns | 3.44 | 0.08 | 437.5000 | 421.8750 | 125.0000 | 4104012 B | 0.78 |

Overall it looks good, and I'm inclined to use it for the others too and merge it for the new version.

One question: are you aware of any edge cases or strange scenarios where it may have issues?

Thanks!

jodydonetti commented 2 months ago

Update: applied the same change to the Protobuf version, and:

| Method | Size | Mean | Error | StdDev | P95 | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SerializeAsync_OLD | 1 | 318.9 ns | 4.58 ns | 4.06 ns | 325.3 ns | 1.00 | 0.00 | 0.0353 | - | - | 448 B | 1.00 |
| SerializeAsync_NEW | 1 | 743.4 ns | 9.77 ns | 9.14 ns | 756.4 ns | 2.33 | 0.04 | 0.0305 | - | - | 384 B | 0.86 |
| DeserializeAsync_OLD | 1 | 545.6 ns | 10.69 ns | 13.13 ns | 561.3 ns | 1.70 | 0.05 | 0.0372 | - | - | 472 B | 1.05 |
| DeserializeAsync_NEW | 1 | 562.2 ns | 8.40 ns | 7.85 ns | 571.4 ns | 1.76 | 0.03 | 0.0305 | - | - | 384 B | 0.86 |
| SerializeAsync_OLD | 100 | 17,767.3 ns | 341.49 ns | 393.27 ns | 18,325.4 ns | 1.00 | 0.00 | 1.8921 | 0.0305 | - | 23760 B | 1.00 |
| SerializeAsync_NEW | 100 | 31,157.1 ns | 421.46 ns | 394.24 ns | 31,735.0 ns | 1.75 | 0.05 | 0.6104 | - | - | 8280 B | 0.35 |
| DeserializeAsync_OLD | 100 | 25,068.9 ns | 499.02 ns | 554.66 ns | 26,041.2 ns | 1.41 | 0.04 | 2.1667 | 0.1526 | - | 27376 B | 1.15 |
| DeserializeAsync_NEW | 100 | 26,286.0 ns | 180.71 ns | 169.03 ns | 26,570.1 ns | 1.47 | 0.04 | 2.1667 | 0.1526 | - | 27288 B | 1.15 |
| SerializeAsync_OLD | 10000 | 2,035,799.9 ns | 20,109.70 ns | 16,792.51 ns | 2,053,788.3 ns | 1.00 | 0.00 | 574.2188 | 566.4063 | 566.4063 | 2927083 B | 1.00 |
| SerializeAsync_NEW | 10000 | 3,407,010.4 ns | 24,684.91 ns | 23,090.28 ns | 3,445,250.6 ns | 1.67 | 0.02 | 191.4063 | 191.4063 | 191.4063 | 799508 B | 0.27 |
| DeserializeAsync_OLD | 10000 | 2,672,932.3 ns | 22,687.86 ns | 20,112.20 ns | 2,695,691.0 ns | 1.31 | 0.01 | 214.8438 | 136.7188 | - | 2720215 B | 0.93 |
| DeserializeAsync_NEW | 10000 | 2,669,795.2 ns | 26,488.64 ns | 24,777.49 ns | 2,710,116.7 ns | 1.31 | 0.01 | 214.8438 | 136.7188 | - | 2720127 B | 0.93 |

Nice 😬

jodydonetti commented 2 months ago

Update 2: ServiceStack adapter also updated, here are the results:

| Method | Size | Mean | Error | StdDev | P95 | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SerializeAsync_OLD | 1 | 621.1 ns | 6.60 ns | 6.17 ns | 632.0 ns | 1.00 | 0.00 | 0.2861 | 0.0019 | - | 3592 B | 1.00 |
| SerializeAsync_NEW | 1 | 850.8 ns | 8.05 ns | 7.53 ns | 861.8 ns | 1.37 | 0.02 | 0.2842 | 0.0019 | - | 3576 B | 1.00 |
| DeserializeAsync_OLD | 1 | 1,540.6 ns | 23.28 ns | 20.64 ns | 1,569.5 ns | 2.48 | 0.03 | 0.0782 | - | - | 984 B | 0.27 |
| DeserializeAsync_NEW | 1 | 1,911.1 ns | 37.17 ns | 44.25 ns | 1,986.6 ns | 3.06 | 0.08 | 0.0954 | - | - | 1216 B | 0.34 |
| SerializeAsync_OLD | 100 | 51,197.4 ns | 455.20 ns | 403.53 ns | 51,873.0 ns | 1.00 | 0.00 | 10.3760 | - | - | 130547 B | 1.00 |
| SerializeAsync_NEW | 100 | 49,498.9 ns | 412.62 ns | 385.97 ns | 50,067.9 ns | 0.97 | 0.01 | 7.8735 | 0.1831 | - | 98945 B | 0.76 |
| DeserializeAsync_OLD | 100 | 131,764.1 ns | 1,503.61 ns | 1,406.48 ns | 133,726.0 ns | 2.58 | 0.03 | 5.3711 | - | - | 68616 B | 0.53 |
| DeserializeAsync_NEW | 100 | 127,933.6 ns | 783.59 ns | 694.63 ns | 128,809.1 ns | 2.50 | 0.02 | 4.1504 | 0.2441 | - | 54015 B | 0.41 |
| SerializeAsync_OLD | 10000 | 9,963,801.6 ns | 198,926.84 ns | 499,068.55 ns | 10,867,628.3 ns | 1.00 | 0.00 | 1234.3750 | 609.3750 | 609.3750 | 13542422 B | 1.00 |
| SerializeAsync_NEW | 10000 | 6,602,612.4 ns | 135,295.21 ns | 396,797.43 ns | 7,320,581.7 ns | 0.67 | 0.05 | 898.4375 | 281.2500 | 273.4375 | 9348740 B | 0.69 |
| DeserializeAsync_OLD | 10000 | 16,440,590.5 ns | 319,405.91 ns | 367,828.32 ns | 16,993,865.6 ns | 1.67 | 0.10 | 562.5000 | 468.7500 | 156.2500 | 6975076 B | 0.52 |
| DeserializeAsync_NEW | 10000 | 17,203,673.1 ns | 334,163.40 ns | 312,576.65 ns | 17,678,642.5 ns | 1.74 | 0.08 | 531.2500 | 375.0000 | 125.0000 | 5477876 B | 0.40 |

Looking good!

viniciusvarzea commented 2 months ago

Hello @jodydonetti, pretty good man. Thank you for considering it. :)

One observation from the library documentation and from my experience:

"Important!: If you do not set MaximumFreeLargePoolBytes and MaximumFreeSmallPoolBytes there is the possibility for unbounded memory growth!"

In my experience, the default options work very well except for these 2 parameters: setting them helps avoid memory leaks in heavy-utilization scenarios.

jodydonetti commented 2 months ago

Thanks for the tip, any suggestion for reasonable values or a rationale to follow?

viniciusvarzea commented 2 months ago

@jodydonetti Since Redis does not work well with large values (and I believe most cache entries are not that big), and given the library's default BlockSize of 128 KB and LargeBufferMultiple of 1 MB, 16 MB is a reasonable value for MaximumFreeSmallPoolBytes and 64 MB is a reasonable value for MaximumFreeLargePoolBytes.
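For reference, a sketch of wiring those limits up, assuming the 3.x Options API (on 2.x the same two values are settable properties directly on the manager):

using Microsoft.IO;

// Upper bounds on how much memory the pools may retain: buffers beyond these
// limits are dropped for the GC to collect instead of being kept in the pool.
var manager = new RecyclableMemoryStreamManager(new RecyclableMemoryStreamManager.Options
{
    MaximumFreeSmallPoolBytes = 16 * 1024 * 1024, // 16 MB
    MaximumFreeLargePoolBytes = 64 * 1024 * 1024, // 64 MB
});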

Keep in mind that the library does not pre-allocate any buffers: they are allocated as needed, so these limits act purely as upper bounds to avoid memory leaks.

In a future release, we could also improve the distributed cache performance by compressing the payloads before sending them to Redis.
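As a hypothetical illustration of that idea (the helper below is not part of FusionCache), the serialized payload could be gzipped before being handed to IDistributedCache:

using System.IO;
using System.IO.Compression;

public static class PayloadCompressionSketch
{
    // Gzip a serialized payload before sending it to the distributed cache,
    // trading some CPU for smaller values on the wire and in Redis.
    public static byte[] Compress(byte[] payload)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Fastest))
        {
            gzip.Write(payload, 0, payload.Length);
        }
        // The GZipStream must be closed before reading the result.
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }
}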

jodydonetti commented 2 months ago

Btw, reading up on it there are other considerations to look out for, for example "Most applications should not call ToArray", with this part in particular: since I am currently calling ToArray() because IDistributedCache requires a byte[], I have to think harder about how to handle this.

Furthermore, I'm getting a better understanding of the whole RecyclableMemoryStreamManager approach, and it seems to me to boil down to 3 core details:

Did I get it right? Because given these constraints it feels like it can be a risky move to use it this way.

Did I get it wrong? Suggestions?

viniciusvarzea commented 2 months ago

Hi @jodydonetti. Sorry for the delay in responding, I was a little sick.

About this: "Most applications shouldn't call ToArray", it's the way to go as long as you use stream up until the point of sending the data to redis. Due to the use of the IDistributedCache interface (which only has byte[] overloads) I think we cannot use streams in all sending methods, so at some point the ToArray() method must be called, but as the tests show benchmark, there are some benefits (allocation issue) even calling ToArray( ).

About this: "but when using it without correctly configuring some options, it will probably cause a memory leak. but..."

Since Redis itself doesn't handle large values very well, we can ship default options that fit most scenarios and expose an option for the user to tune them, or to disable the pooling entirely.