colgreen closed this 1 year ago.
You can also use a StringBuilder Pool to minimize GC pressure, check this: https://learn.microsoft.com/en-us/aspnet/core/performance/objectpool?view=aspnetcore-7.0
Obviously lots of testing is required.
Some before and after benchmark numbers for `SnowflakeDataMapper.MapTo<CustomClass>`:
Method | Mean | Error | StdDev | Gen0 | Allocated |
---|---|---|---|---|---|
ResponseWithValues_MapTo_CustomClass (before) | 293.6 us | 2.22 us | 1.97 us | 49.8047 | 409.46 KB |
ResponseWithValues_MapTo_CustomClass (after) | 225.6 us | 0.60 us | 0.56 us | 20.2637 | 165.72 KB |
Note the reduced memory allocations per iteration of the test method (from 409.46 KB to 165.72 KB - this is for a rowset with 100 rows).
Converting binary hex to base64 (for a short hex string):
Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|
HexToBytes_Short (before) | 69.98 ns | 0.651 ns | 0.544 ns | 0.0191 | - | - | 160 B |
HexToBase64_Short (after) | 32.19 ns | 0.137 ns | 0.128 ns | - | - | - | - |
Speed is about 2x faster, with much reduced memory allocations (none reported).
Same test again with 1,000,000 bytes of binary (2,000,000 hex chars):
Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|
HexToBytes_Long (before) | 24,548,512.50 ns | 370,756.387 ns | 346,805.754 ns | 4406.2500 | 593.7500 | 593.7500 | 35667310 B |
HexToBase64_Long (after) | 5,890,138.02 ns | 20,698.849 ns | 19,361.716 ns | 312.5000 | 148.4375 | - | 2684629 B |
4x faster and 1/13th the memory allocations.
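For reference, the core of the technique is to decode the hex and encode to base64 in fixed-size chunks, writing the base64 chars straight into the StringBuilder so no intermediate byte array or base64 string is allocated. Below is a minimal sketch, not the PR's actual code; it assumes valid, even-length hex input and a runtime with span support and `Convert.TryToBase64Chars` (.NET Core 2.1+):

```csharp
using System;
using System.Text;

static class HexToBase64
{
    // Converts an even-length hex string to base64 and appends it directly
    // to a StringBuilder. Input is processed in chunks whose byte counts are
    // multiples of 3, so each chunk's base64 output concatenates cleanly
    // (padding can only occur on the final chunk).
    public static void AppendHexAsBase64(StringBuilder sb, ReadOnlySpan<char> hex)
    {
        const int ByteChunkSize = 3 * 256; // must be a multiple of 3
        Span<byte> bytes = stackalloc byte[ByteChunkSize];
        Span<char> base64 = stackalloc char[(ByteChunkSize / 3) * 4];

        while (!hex.IsEmpty)
        {
            int byteCount = Math.Min(ByteChunkSize, hex.Length / 2);

            // Decode two hex chars per byte into the reusable buffer.
            for (int i = 0; i < byteCount; i++)
            {
                bytes[i] = (byte)((HexNibble(hex[2 * i]) << 4) | HexNibble(hex[2 * i + 1]));
            }

            // Encode straight into a char span and append; no intermediate
            // base64 string is allocated.
            Convert.TryToBase64Chars(bytes.Slice(0, byteCount), base64, out int charsWritten);
            sb.Append(base64.Slice(0, charsWritten));

            hex = hex.Slice(byteCount * 2);
        }
    }

    // Assumes valid hex digits; no error handling in this sketch.
    private static int HexNibble(char c) =>
        c <= '9' ? c - '0' : (char.ToUpperInvariant(c) - 'A') + 10;
}
```

Chunking in multiples of 3 bytes matters because base64 encodes 3 bytes to 4 chars; any other chunk size would introduce `=` padding mid-stream.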
@georgebarbu this PR should reduce the memory pressure for your large rowsets, regardless of whether they have binary columns or not.
Also note that I have upgraded System.Text.Json in this PR; the latest 7.x versions still publish targets/DLLs for .NET Standard 2.0, so there's no reason not to upgrade for this package, which also targets .NET Standard 2.0.
> You can also use a StringBuilder Pool to minimize GC pressure, check this:
> https://learn.microsoft.com/en-us/aspnet/core/performance/objectpool?view=aspnetcore-7.0
> Obviously lots of testing is required.
Thanks. I have used this now, and it has reduced memory allocations (in addition to the other changes). I was aware of ObjectPool, but wasn't sure it was still worthwhile, since the internals of StringBuilder have been rewritten to use a linked list of buffers instead of a single char array; on reflection, it definitely still makes sense.
The possible downside here is a memory 'leak' if the number of objects in the pool becomes large. However, that would only occur if that many objects were needed simultaneously, so it doesn't greatly affect peak memory pressure. There is also a limit on the capacity of the StringBuilder instances that are accepted into the pool, so on balance I think this approach is reasonable and safe.
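For illustration, a minimal sketch of a capacity-bounded `StringBuilder` pool using Microsoft.Extensions.ObjectPool; the capacity values here are arbitrary examples, not the values used in the PR:

```csharp
using System.Text;
using Microsoft.Extensions.ObjectPool;

// StringBuilderPooledObjectPolicy clears a builder when it is returned,
// and discards it (rather than pooling it) if its capacity exceeds
// MaximumRetainedCapacity, which bounds the memory the pool retains.
var pool = new DefaultObjectPool<StringBuilder>(new StringBuilderPooledObjectPolicy
{
    InitialCapacity = 1024,              // capacity of newly created builders
    MaximumRetainedCapacity = 64 * 1024, // larger builders are not kept
});

StringBuilder sb = pool.Get();
try
{
    sb.Append("{\"ROWSET\":[");
    // ... append row data ...
    sb.Append("]}");
    string json = sb.ToString();
}
finally
{
    // Returns the builder to the pool (cleared), or drops it if oversized.
    pool.Return(sb);
}
```

Note that `DefaultObjectPool<T>` also caps the number of retained instances (`Environment.ProcessorCount * 2` by default), which further bounds the 'leak' concern described above.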
I just noticed that later versions of System.Text.Json's JsonSerializer now have synchronous Deserialize methods that accept a UTF-8 byte Stream, so I think we can use Streams now instead of StringBuilders. That should reduce the memory pressure by about half, I think (as chars are two bytes, but most chars in common use encode to a single UTF-8 byte).
If I get a chance I'll take another look at this; it can use RecyclableMemoryStream instead of ObjectPool.
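A minimal sketch of that direction, assuming the Microsoft.IO.RecyclableMemoryStream package and a System.Text.Json version with the synchronous `Deserialize(Stream)` overload (6.0+); `MyRow` is a hypothetical target type:

```csharp
using System.Text;
using System.Text.Json;
using Microsoft.IO;

var streamManager = new RecyclableMemoryStreamManager();

// Build the JSON as UTF-8 bytes in a pooled stream (roughly half the size
// of a UTF-16 string/StringBuilder for mostly-ASCII content), then
// deserialize synchronously from the stream.
using (var stream = streamManager.GetStream())
{
    byte[] utf8Json = Encoding.UTF8.GetBytes("{\"Id\":1,\"Name\":\"abc\"}");
    stream.Write(utf8Json, 0, utf8Json.Length);
    stream.Position = 0;

    MyRow? row = JsonSerializer.Deserialize<MyRow>(stream);
}

record MyRow(int Id, string Name);
```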
If you do choose to merge one of my PRs, then I now recommend merging #40 instead of this PR, as it makes some substantial further reductions to memory allocations.
@colgreen I'm closing this in favor of #40.
Use of StringBuilder to build the JSON strings that are passed to `JsonSerializer.Deserialize<T>()`; this reduces the number of strings allocated when doing string concatenation using the `+` operator. This is particularly beneficial when dealing with a large binary hex value, as the intermediate strings from each concatenation can be huge in that scenario.

Regarding the handling of binary hex, the hex is now encoded to base64 and appended to the StringBuilder in one method call; this avoids an intermediate base64 string (which, again, can be very large).

None of this is perfect, as the final result is still one large JSON string, potentially with very large base64 strings in it, but this PR will reduce the amount of intermediate strings that are allocated.

I have also added a `Snowflake.Client.Benchmarks` project to demonstrate the improved performance of the hex to base64 conversion for large binary values (about 4x faster, with fewer memory allocations).

For information, there is another branch at https://github.com/colgreen/snowflake-db-net-client/tree/deserialization-efficiency-improvements-streams that takes this code further by using a UTF-8 `RecyclableMemoryStream` instead of a `StringBuilder`; that means the base64 strings are half the size (because the base64 chars are encoded to single bytes instead of two bytes per character for .NET strings). Ultimately I decided not to create a PR for that code, as it meant using `JsonSerializer.DeserializeAsync<T>()` instead of `JsonSerializer.Deserialize<T>()`, and that makes things more complex.

Really, to avoid excessive memory allocations when dealing with hex binary, I think it would be better not to use JsonSerializer at all, and instead use reflection to set the value on the target object directly. However, I can see that using `JsonSerializer.Deserialize<T>()` is a very convenient approach that has all the nice features and flexibility of System.Text.Json deserialization, so this PR is basically making that approach a bit more efficient, in terms of both execution time and memory allocations.
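To make the before/after concrete, here is an illustrative contrast; the names and JSON shape are invented for the example and are not the library's actual internals:

```csharp
using System.Text;

static class JsonAssemblyExample
{
    // Before: every '+' allocates a new intermediate string; with a large
    // base64 value those intermediates are huge.
    public static string BuildWithConcat(string column, string base64)
        => "{\"" + column + "\":\"" + base64 + "\"}";

    // After: a single StringBuilder accumulates all the pieces, and only
    // the final ToString() allocates a string. (In the PR itself, the hex
    // value is encoded to base64 directly into the builder at this step,
    // so the standalone base64 string is avoided too.)
    public static string BuildWithStringBuilder(StringBuilder sb, string column, string base64)
    {
        sb.Clear();
        sb.Append("{\"").Append(column).Append("\":\"")
          .Append(base64)
          .Append("\"}");
        return sb.ToString();
    }
}
```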