dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Consider expanding number formatting cache to larger values #76888

Closed: stephentoub closed this issue 1 year ago

stephentoub commented 2 years ago

Several releases ago, we added a cache of strings for the values 0 through 9 to number formatting: https://github.com/dotnet/runtime/blob/20d4e309ec4f081e4b1d47a8f937abfbb763a4d4/src/libraries/System.Private.CoreLib/src/System/Number.Formatting.cs#L269

Since non-negative values formatted with the default format aren't impacted by culture, this cache is used by all numeric types for all single-digit values as part of `value.ToString()`. We should consider expanding the cache to larger values, e.g. up to 300, to account for other frequently used values on success paths, like HTTP success status codes.

The upside is significantly less allocation as well as better throughput when formatting such values. The primary downside is the increased memory cost of a larger table, e.g. caching all values 0 through 299 would add ~12K in allocated objects (caching all through 1000 would add ~39K). If the full cost is deemed prohibitive, the cache could be made lazy, such that at the expense of an additional null check on each access we'd populate only the strings actually used (see the sketch below). We could also clear the table under certain conditions if that was deemed an issue. And we could choose not to include the table in corelib builds where we expect memory to be at a premium.
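For illustration, a lazily populated cache along these lines might look like the sketch below. The type, member names, and 300-element cutoff are made up for this example and are not the actual Number.Formatting.cs change:

```csharp
using System.Globalization;

// Illustrative sketch only: a lazily populated cache of small-number strings,
// paying one null check per lookup instead of allocating the whole table up front.
internal static class SmallNumberCache
{
    private const int CacheLength = 300; // hypothetical cutoff discussed above
    private static readonly string?[] s_cache = new string?[CacheLength];

    public static string GetString(int value)
    {
        if ((uint)value < CacheLength)
        {
            // Benign race: concurrent callers may format the same value twice,
            // but they store identical strings, so no synchronization is needed.
            return s_cache[value] ??= value.ToString(CultureInfo.InvariantCulture);
        }

        return value.ToString(CultureInfo.InvariantCulture);
    }
}
```

Clearing the table under memory pressure, or omitting it from memory-constrained builds, would then only affect this one helper.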

cc: @marklio

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-runtime. See info in area-owners.md if you want to be subscribed.

Issue Details
Author: stephentoub
Assignees: -
Labels: `area-System.Runtime`, `tenet-performance`
Milestone: 8.0.0

jkotas commented 2 years ago

> caching all through 1000 would add ~39K

This math looks off. You need an array holding the string objects (1000 * pointer size) = 80kB on 64-bit. And you need to account for the minimal object size (1000 * 24) = 240kB on 64-bit CoreCLR. The total size of this cache would be 320kB.

> If the full cost is deemed prohibitive, it could be made lazy

Yes, I think it should definitely be lazy.

stephentoub commented 2 years ago

> (1000 * pointer size) = 80kB

1000 * 8 bytes is 8,000 bytes, not 80,000 bytes. Have I misunderstood?

This wasn't actually based on math but rather on:

while (true)
{
    // Measure how many bytes one pass of materializing the 1000 strings allocates.
    long m = GC.GetAllocatedBytesForCurrentThread();
    string[] values = new string[1000];
    for (int i = 0; i < values.Length; i++) values[i] = i.ToString();
    Console.WriteLine((GC.GetAllocatedBytesForCurrentThread() - m) / 1024); // prints KB per iteration
    GC.KeepAlive(values);
}
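
(For reference, the measured ~39K also matches the usual 64-bit CoreCLR object layout: the string[1000] array is 24 + 1000 * 8 = 8,024 bytes, each single-character string is 24 bytes, and each 2-4 character string rounds up to 32 bytes, so the strings themselves cost 10 * 24 + 990 * 32 = 31,920 bytes, roughly 39 KB in total.)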
jkotas commented 2 years ago

You are right, I got an extra zero somewhere. It is ~39k as you have said.

stephentoub commented 2 years ago

I think it would end up looking something like this: https://github.com/dotnet/runtime/compare/main...stephentoub:runtime:smallnumberscache

[HideColumns("Job", "Error", "StdDev", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Bytes
{
    [Params(123)]
    public byte Value { get; set; }

    [Benchmark]
    public string ValueToString() => Value.ToString();
}

[HideColumns("Job", "Error", "StdDev", "RatioSD")]
[MemoryDiagnoser(displayGenColumns: false)]
public class Integers
{
    [Params(1, 12, 123, 1234, 12345)]
    public int Value { get; set; }

    [Benchmark]
    public string ValueToString() => Value.ToString();
}
| Type | Method | Toolchain | Value | Mean | Median | Ratio | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| Integers | ValueToString | \main\corerun.exe | 1 | 1.393 ns | 1.392 ns | 1.00 | - | NA |
| Integers | ValueToString | \pr\corerun.exe | 1 | 2.344 ns | 2.335 ns | 1.68 | - | NA |
| Integers | ValueToString | \main\corerun.exe | 12 | 10.003 ns | 9.952 ns | 1.00 | 32 B | 1.00 |
| Integers | ValueToString | \pr\corerun.exe | 12 | 2.351 ns | 2.318 ns | 0.24 | - | 0.00 |
| Bytes | ValueToString | \main\corerun.exe | 123 | 11.340 ns | 11.200 ns | 1.00 | 32 B | 1.00 |
| Bytes | ValueToString | \pr\corerun.exe | 123 | 1.183 ns | 1.185 ns | 0.10 | - | 0.00 |
| Integers | ValueToString | \main\corerun.exe | 123 | 13.434 ns | 11.441 ns | 1.00 | 32 B | 1.00 |
| Integers | ValueToString | \pr\corerun.exe | 123 | 2.191 ns | 2.181 ns | 0.11 | - | 0.00 |
| Integers | ValueToString | \main\corerun.exe | 1234 | 11.536 ns | 11.439 ns | 1.00 | 32 B | 1.00 |
| Integers | ValueToString | \pr\corerun.exe | 1234 | 10.444 ns | 10.339 ns | 0.91 | 32 B | 1.00 |
| Integers | ValueToString | \main\corerun.exe | 12345 | 11.341 ns | 11.260 ns | 1.00 | 32 B | 1.00 |
| Integers | ValueToString | \pr\corerun.exe | 12345 | 11.300 ns | 11.150 ns | 1.00 | 32 B | 1.00 |
EgorBo commented 2 years ago

👍 [image: numeric string objects and their instance counts in a memory snapshot]

This is from a memory dump of bingsnr (string objects and the number of them in the current snapshot). But I can't say whether they come from this formatting.

stephentoub commented 2 years ago

@EgorBo shared with me the dump he was looking at, and I did some querying of it using clrmd.

It has 8.7M strings, of which 177K are parseable with `long.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out long value)`, and of those, ~25% are for values < 300. Of the 5.2MB consumed by all the number strings, ~1.2MB is consumed by the strings for values < 300.

If I increase the threshold from 300 to 1000, that ~25% goes up to ~40%.
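
For anyone who wants to repeat this kind of query, a rough sketch of the approach using ClrMD (Microsoft.Diagnostics.Runtime 2.x) is below; the dump path is a placeholder and this is not the exact script used for the numbers above:

```csharp
using System;
using System.Globalization;
using Microsoft.Diagnostics.Runtime;

// Count heap strings that parse as non-negative integers, and how many are < 300.
using DataTarget dataTarget = DataTarget.LoadDump(@"C:\dumps\service.dmp"); // placeholder path
ClrRuntime runtime = dataTarget.ClrVersions[0].CreateRuntime();

int numericStrings = 0, smallValues = 0;
foreach (ClrObject obj in runtime.Heap.EnumerateObjects())
{
    if (obj.Type?.IsString != true)
        continue;

    string? s = obj.AsString();
    if (s is not null &&
        long.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out long value))
    {
        numericStrings++;
        if (value < 300)
            smallValues++;
    }
}

Console.WriteLine($"numeric strings: {numericStrings}, of which < 300: {smallValues}");
```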

geeknoid commented 2 years ago

Can the size of the cache be determined at startup based on some environmental state?

Clearly, the cache memory is a trivial cost for a service application, so in that environment making it large is warranted. But for constrained environments, the extra memory may be problematic.
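
If sizing the cache from the environment were pursued, one possible shape, purely as an illustration, is reading a startup knob once and clamping it. The environment variable name here is hypothetical, not an existing runtime switch:

```csharp
using System;

// Illustrative only: pick the cache length once at startup from a hypothetical knob.
internal static class NumberCacheConfig
{
    internal static readonly int CacheLength = GetConfiguredLength();

    private static int GetConfiguredLength()
    {
        string? raw = Environment.GetEnvironmentVariable("DOTNET_HypotheticalSmallNumberCacheSize");
        return int.TryParse(raw, out int length) && length >= 0
            ? Math.Min(length, 1000) // clamp so the worst-case footprint stays bounded
            : 300;                   // default matching the value proposed above
    }
}
```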