Closed justinvp closed 2 years ago
Do the additional FormattableStringFactory members require Roslyn changes to be consumed with the nice syntax?
I think the parameters in the new overloads should have exactly the same names as in the old overloads, so that if someone called string.Format()
with named parameters, their code also starts using the new overloads just by recompiling.
Do the additional FormattableStringFactory members require Roslyn changes to be consumed with the nice syntax?
I need to double-check, but I'm pretty sure Roslyn is already designed to make use of such overloads to String.Format
and FormattableStringFactory.Create
, if they exist. So, no Roslyn changes would be required.
I think the parameters in the new overloads should have exactly the same names as in the old overloads
The 0-based parameter names seem a little strange, but now that you mention keeping the naming the same as the old overloads, the format items in the format string are 0-based, so it makes sense that the parameter names match that. In that case, I'm guessing we'd want the naming of the generic type parameters to match the naming of the arguments:
So instead of this:
public static string Format<T>(string format, T arg);
public static string Format<T1, T2>(string format, T1 arg1, T2 arg2);
public static string Format<T1, T2, T3>(string format, T1 arg1, T2 arg2, T3 arg3);
This:
public static string Format<T0>(string format, T0 arg0);
public static string Format<T0, T1>(string format, T0 arg0, T1 arg1);
public static string Format<T0, T1, T2>(string format, T0 arg0, T1 arg1, T2 arg2);
Or maybe even TArgument0
, TArgument1
, and TArgument2
vs. T0
, T1
, and T2
.
We did talk about the generic overloads on FormattableStringFactory as part of the review but we didn't believe we had enough runway to get the compiler to actually use those in this product release. (cc @gafter).
As for the string.Format overloads we did end up doing some work this release to make the arg0/arg1/arg2 overloads actually avoid the params array allocation internally (yeah until recently we didn't actually even optimize that even though we had the overloads). While we did do that one thing we realized is that string.Format is extremely bad about allocations internally and we don't believe that even eliminating the params array allocation ended up improving performance as much as we would have liked it to. I'm primarily saying that because I have to wonder if a few boxing allocations is really going to have much of an impact when what is really needed is a much larger redoing of the underlying formatting code (which I'm not sure is even possible in a compatible way). Do you have any data about whether or not eliminating these boxing allocations will end up helping much? By the way this doesn't mean these aren't worth adding I'm just trying to collect some more data.
The C# and VB compilers will use appropriate overloads on FormattableStringFactory
if those exist. What we didn't have enough runway for was writing those overloads.
My experience is that the inner workings of string.Format (actually StringBuilder.AppendFormat) is quite efficient. The allocations are what you'd expect. Most of the time we get a hit in StringBuilderCache, so there's no char[] allocation. Allocations in this area usually show up in serialization and logging routines. Serialization usually arrives via TextWriter.Write (which calls string.Format). Removing the params array in 4.6 was nice. Roslyn analyzers can help locate and eliminate boxing.
It would be interesting to see if one could implement this, avoiding boxing but at the same time avoiding stamping out 3 new generic versions of StringBuilder.AppendFormat.
Roslyn analyzers can help locate and eliminate boxing.
There's already an analyser that does this. The tricky part is keeping it up to date with the changes, for instance it currently doesn't handle the recent changes that got rid of unnecessary boxing in String.Concat. Probably the same applies for the change that removed params array allocations, that has been mentioned elsewhere in this thread.
The robust way to do this would be for it to look at the IL produced, but that's going to be expensive for an Analyser (although I did give it a go and Roslyn combined with Cecil makes it nice and easy).
Do you have any data about whether or not eliminating these boxing allocations will end up helping much?
@weshaggard, I have some data in the conclusion below, but first some background...
Consider the following example:
DateTime now = DateTime.UtcNow;
Guid id = Guid.NewGuid();
decimal value = 3.50m;
string result = string.Format("{0:s} - {1:B}: The value is: {2:C2}", now, id, value);
Let's track of the allocations with the current implementation:
now
parameter is boxed.id
parameter is boxed.value
parameter is boxed.StringBuilder
is allocated to build-up the "s"
(in "{0:s}"
) format string."s"
string
is allocated.now.ToString("s", null)
is allocated.StringBuilder
is allocated to build-up the "B"
(in "{1:B}"
) format string."B"
string
is allocated.id.ToString("B", null)
is allocated.StringBuilder
is allocated to build-up the "C2"
(in "{2:C2}"
) format string."C2"
string
is allocated.value.ToString("C2", null)
is allocated.string
is allocated.13 allocations!
Note that there's also a StringBuilder
instance used for the overall resulting string that I left out of the allocation count as this allocation is mitigated by the use of the internal StringBuilderCache
.
These allocations can be reduced significantly!
String.Format
(StringBuilder.AppendFormat
) implementation that would avoid the StringBuilder
allocations for the argument format strings, using string.Substring
for the common case and falling back to a single StringBuilder
instance only when necessary (more on this below), which would reduce the allocations by another 3, down to 7 allocations.IFormattable
(even if just used internally for the built-in BCL types) could reduce this by another 6 allocations, down to just 1 allocation!Down to just 1 allocation for the resulting string!
Avoids the boxing allocations. Pretty straightforward.
StringBuilder
allocations)Right now with the current implementation, a new StringBuilder
is allocated for each argument that has a format string associated with it (e.g. the "C2"
in "{0:C2}"
). This can be trivially improved by simply reusing the StringBuilder
instance for multiple args. It involves just moving the StringBuilder
variable to the top of the loop, still lazily allocating it, clearing it when finished, and reusing it for the next argument that needs it. However, even better, in the common case when there aren't any escaped braces in the argument's format string (e.g. "{{"
and "}}"
) then a StringBuilder
isn't needed at all; the more light-weight String.Substring
can be used.
IFormattable
alternativeOptimizing String.Format
(StringBuilder.AppendFormat
) is ultimately limited by the design of IFormattable
, and to a lesser extent the less commonly used ICustomFormatter
.
public interface IFormattable
{
string ToString(string format, IFormatProvider formatProvider);
}
This design is perfectly reasonable, but there are a couple issues that prevent optimizations inside String.Format
(StringBuilder.AppendFormat
):
format
parameter is typed as string
, which means an instance of string
must be allocated for format
, as opposed to accepting a StringSpan
.string
, which means an instance of string
must be allocated for the result, as opposed to appending the resulting characters to a destination StringBuilder
.An alternative design might look something like the following:
public interface IFormattableTo // (no pun intended, heh)
{
void FormatTo(StringSpan format, IFormatProvider formatProvider, StringBuilder destination);
}
It may not be worth exposing such an interface publicly (at least for now) as it would basically be a replacement for IFormattable
and it requires the StringSpan
or Span<char>
type, which hasn't landed yet. But it could be included as internal for now and implemented on all the built-in BCL types that currently implement IFormattable
for improved String.Format
perf for these very commonly used types.
Shameless plug of my own work: https://github.com/MikePopoloski/StringFormatter Might be useful to mine for ideas.
I too needed string formatting without allocations. It'd be nice to take advantage of some of that dedicated code built into the CLR to make it faster, but it does alright on its own as pure C#.
@terrajobst, Immo, this is an API suggestion. Could this be added to the agenda for an API review? Even if there's some opposition to the String
and StringBuilder
overloads, I think the FormattableStringFactory
overloads are a no-brainer (would have been added for 4.6, but ran out of time).
Hi Justin, thanks for the explanation ans benchmarks.
how does one include the above overloads. i mean defining a new class "System" throws error - no body implementation for the overload methods or they arent defined as abstract, partial, extern etc.
I know it's been a while, but we'd love to see this over at Stack Overflow - @terrajobst can you please ping here when the API review is up next so we can listen in?
@justinvp
Could this be added to the agenda for an API review?
Generally yes, I believe with have everything we need. However, we're currently focusing on getting RTM ready, which means we're focusing our reviews on exposing APIs that already exist in .NET Framework to simplify porting as well as adding new APIs that are required to unblock other components and tools. That's why it's currently in the Future
milestone. I hope that we'll be able to move some of these to a post-RTM milestone.
We could get the maximum .ToString()
of most of core types. Eg .MaxValue.ToString.Length + 1
ApproxMaxCharLength<T>( obj : T ) : Int
{
return match( typeof( T ) ) with
{
| string -> ((string)obj).Length;
| bool -> 5; // false
| byte -> 3; // 256
| sbyte -> 4; // -128
| char -> 1; //
| uint16 -> 6; // 65535
| int16 -> 6; // -32768
| uint32 -> 11; // -2147483648
| int32 -> 10; // 4294967295
| uint64 -> 20; // 18446744073709551615
| int64 -> 20; // –9223372036854775808
| ... -> 0;
// not done the floating point ones.
}
}
An initial first approximation of the string length, could be calculated thus.
var intialSize = args.Sum( ApproxMaxCharLength ) + TextOfFormatString.Length;
I think this would be an over allocation. Note this is excluding any variation caused by the culture and additional alignment and format specifiers on the arg hole eg { x , -16 : X4 }
At which point (not done any deep analysis) would the extra processing to get a good estimate of the size of string to allocate, outweighs the processing cost of just blindly doing the building of the string?
@AdamSpeight2008 This issue is mainly about the allocations due to boxing.
The APIs look fine, however, the only way we could benefit from this work is if our implementation avoids the boxing all the way down, which would be non-trivial amount of work. Also, it's questionable without a benchmark how big the savings are, because formatting will probably cause a bunch of allocations anyway.
It might be better to leave these APIs as-is and instead use the new formatting APIs that @KrzysztofCwalina is working on.
@KrzysztofCwalina, what are your thoughts on this?
Remember that the benefits here are extreme because just by recompiling, you get them. That huge, free adoption should be factored into the effort costing. This helps all code, rather than just new code. I don't see string.Format()
or $""
going anywhere for a long time.
Perhaps I'm missing something though - doesn't $""
continue to map to string.Format
in future plans as well, or is that being re-pointed to some new API? That'd be a pretty extreme breaking change, but worth asking...
$"..."
always maps to an invocation of string.Format
(if the target type is string
).
I like the improvement in-place for the reson @NickCraver spells out. I think we should add these APIs. The only concern I would have is that it might cause Generics instantiation explosion that runtime people are really concerned about lately. @jkotas?
BTW, the new formatting APIs @terrajobst mentioned are described at https://github.com/dotnet/corefxlab/wiki/System.Text.Formatting. But I think an in-place addition will be complementary.
This should be data-driven decision. It is unclear to me whether saving extra allocations is worth it to justify the extra code in this case. String.Format will always allocate and it will always be relatively slow API. It is unclear to me whether changing it to allocate a bit less garbage at the cost of pile of extra code is going to measurable improve anything important.
The trouble is string interpolation is too good a feature not to use, even for those of use that normally try to avoid allocations at all costs... (though normally used in lesser taken paths like exception handling)
Agree. In the above @justinvp example, there 13 allocations. 9 of them are internal. It should be possible to eliminate all internal ones without changing the shape of the API, so that should be done first.
Next step: Improve current memory allocations of existing APIs as much as possible -- the result will tell us if we still need new APIs.
How would we ever eliminate the boxing here without the API additions? I agree with internal eliminations being the first step, but if we're after absolute performance, I don't see a way around the second step.
we're after absolute performance, I don't see a way around the second step.
The no-allocation formatters being developed in corefxlab are going after absolute performance.
String.Format
will always allocate the string. The string is likely short-lived garbage - it is unlikely that it is stored for long. I do not think that the difference between allocating a string vs. allocating a string and a box for each argument is going to be that significant.
(Aside regarding the original proposed API rather than the current discussion, would public static string Format<T>(string format, params T[] args);
be useful or overkill? It's not rare to have to output a bunch of integers or a bunch of dates or otherwise format several items of the same type).
@jkotas I agree on String.Format()
, but our much more common use case (localization in views, etc.) is on the StringBuilder.AppendFormat()
side of the fence, which has more to win here overall. An example of this is in ASP.NET views, especially when we're doing localization.
Either way, we need to measure. The trouble is measuring this accurately means really doing it for comparison...to see if it's worth doing. Vicious circle we have here :)
The trouble is measuring this accurately means really doing it for comparison
Right, and doing it is non-trivial amount of work because of the current formatting implementation has heavy dependency on boxing.
The new Format(Span<char> span, out int length)
family of functions is what makes the fastest implementation possible, using zero alloc techniques.
Rewrite $"prefix{x}postfix"
to
var resultBuffer = stackalloc new char[<just enough for the value types in question> plus the string constant];
var buffer = resultBuffer;
int length;
"prefix".CopyTo(buffer);
buffer = buffer.Slice("prefix".Length);
arg1.Format(buffer, out length);
buffer = buffer.Slice(length);
"postfix".CopyTo(buffer);
resultBuffer.Overlaps(buffer, out var elementOffset);
return new string(resultBuffer.Slice(0, elementOffset));
I understand that this would totally break compatibility, so this could be an extension to a future C# version, like:
$ <interceptor expression> "{x}"
The idea is to specify a storage for the resulting output. Anything that can be possibly used to store a string result would be a good fit for the interceptor
.
$ stackalloc char[] "{x}"
$ stackalloc byte[] "{x}" // for utf8
// ...
$ new StringBuilder(100) "{x}"
$ existingStringBuilder "{x}"
Another possibility to avoid boxing would be to call ToString() on each of the arguments in-place, and also provide an overload which takes a span of strings instead of array of objects.
Not sure why is this one flagged for 5.0, @jkotas @terrajobst I suppose that this one we probably want to push to future instead right?
Did we ever make progress on this? Did we remove the 9 internal allocations here? I was looking around at issues for improving string.Format
and this one keeps popping up in my searches. I'm also wondering why there isn't a method that "compiles" the template (sorta like a compiled regex):
FormattableString<string> formatter = FormattableString.Create<string>("Hello {name}");
formatter.Format("Joe");
formatter.Format("John");
formatter.Format("Paul");
@stephentoub can correct me, but AFAIK we solved a related problem in the BCL a while ago to take advantage of specialized params overloads for a fixed number of arguments to avoid allocating the array. Not sure we made any progress for avoiding allocations due to boxing of arguments. If we do this, we likely want to support FormattableString. Depending on how Roslyn binds to the factory, this might or not might not need compiler changes.
I'm also wondering why there isn't a method that "compiles" the template (sorta like a compiled regex)
I don't think the use case ever came up. For 99% of cases, callers have the format as a string and the arguments. Not saying it's wrong, but it's certainly not very convenient.
What I proposed (https://github.com/dotnet/runtime/issues/28945), and what @jaredpar's been running with as a potential C# 10 feature (https://github.com/dotnet/csharplang/issues/2302), is the idea of a target-typable custom builder, with the C# compiler generating Append calls for each string segment and each hole. For example, if you had a method:
static void Foo(MyCustomBuilder builder);
and someone called that with:
ReadOnlySpan<char> name = ...;
int age = ...;
Foo($"Hi. My name is {name}, and my age is {age}.");
the compiler would translate that into something along the lines of:
var b = MyCustomBuilder.Create();
b.Append("Hi. My name is ");
b.Append(name);
b.Append(", and my age is ");
b.Append(age);
b.Append(".");
Foo(b);
MyCustomBuilder would at a minimum expose an Append overload for object
, but it could also expose overloads for for string
, for int
, for ReadOnlySpan<char>
, for T
, and for whatever other types it wants to optimize for.
I have a proposed for a considerably faster solution to string formatting, which avoids boxing of arguments in the common case and only performs a single allocation for the final string. It's coming at around 2x faster than String.Format or string interpolation in the common case, plus since it doesn't create temporary garbage it's nicer on the GC in the long run.
See https://github.com/geeknoid/FastFormatting.
If folks are interested, I'd be happy to submit a PR and get this integrated in the system. The code that's there now is mostly feature complete, but it lacks full testing and docs. I'll address these shortcomings in the next few days.
PS: @stephentoub, you might recognize the concept from CompositeFormatString in that previous project we worked on years ago :-)
@geeknoid, what are the advantages of that over my cited proposal? From a usability perspective, I'd prefer being able to write Foo($"Hi. My name is {name}, and my age is {age}.")
rather than private static readonly StringFormatter s_nameFormat = new StringFormatter("Hi. My name is {0}, and my age is {1}."); ...; Foo(s_nameFormat.Format(name, age));
, and I don't believe the proposed StringFormatter works well with spans, which can't be used as generic arguments. The cited proposal also similarly avoids all boxing (when the builder exposes an Append<T>
or an Append method specific to the type), only allocates for the final string (if a string is even needed... a consumer could just access a span from the builder), etc., plus it not only precomputes the holes, but does so at compile time.
How do format specifiers work in your proposal @stephentoub ?
How do format specifiers work in your proposal @stephentoub ?
We'd define a pattern for Append methods, where the first argument is the thing to be formatted, if there's a second it's a format string, etc.
I've now added support for a Span-based TryFormat version which results in no allocations. This version is over 4x as fast as String.Format and induces no garbage. After 20 years, there's finally a formatting function in .NET that can compare in perf with C's sprintf :-)
Here are the benchmark results
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |---------:|---------:|---------:|----------:|----------:|---------:|-----------:|
| TestClassicStringFormat | 46.02 ms | 0.865 ms | 1.515 ms | 2909.0909 | 1181.8182 | 363.6364 | 16800057 B |
| TestInterpolation | 47.91 ms | 0.943 ms | 1.523 ms | 3000.0000 | 1250.0000 | 416.6667 | 16800523 B |
| TestStringFormatter | 25.59 ms | 0.508 ms | 0.903 ms | 2187.5000 | 906.2500 | 281.2500 | 12000025 B |
| TestStringFormatterWithSpan | 10.86 ms | 0.182 ms | 0.224 ms | - | - | - | 4 B |
@stephentoub The benefits to my proposal is that:
I think the C# compiler can be taught about this type and magically generate the static StringFormatter instances for any use of string interpolation. Thus, existing code would just suddenly get faster by recompiling. Additionally, I think there could be a Roslyn code fixer which offers to convert any use of String.Format with a static format string into an interpolated string. Those cases would also then magically benefit from improved perf.
Also I think my implementation for a non-boxing version of the ParamsArray type could be used to eliminate boxing in existing APIs like String.Format, making those a touch faster too.
@geeknoid can you share benchmark code or link it please? While this is kinda off topic but I feel that there might be a way to make faster F# printf with things you're proposing.
@En3Tho looks like it's at https://github.com/geeknoid/FastFormatting
The benefits to my proposal is that
Thanks.
I believe it is both smaller IL and faster than what you propose.
I expect it would be smaller IL. I'm skeptical it'd be meaningfully faster for creating strings. And it would almost certainly result in more asm, in particular as every unique combination of arguments involving a value type would result in generic specialization for all of the generic pieces, and that's likely to be non-trivial; in contrast, with the builder approach, you only end up worst case with a generic-specialization per type formatted rather than per combination of types formatted.
The implementation exposes a span-based version which completely eliminates all allocations.
I agree the TryFormat aspect of it is nice. With the builder approach, you can get that as well, but it involves a copy.
In both approaches, it's also hard to claim all allocations are completely eliminated, as the moment you need to utilize IFormattable, you're likely incurring string allocations for each argument being formatted, only not when the instance is able to utilize a cached string.
I also don't see how the proposed approach supports arbitrary numbers of arguments without allocation. At some point you need one more argument than can be supported by the explicit strongly-typed overloads provided, and you end up with the params object[] overload. One of the nice things about the builder approach is it supports any number of arguments strongly-typed, as each argument is formatted independently with a single Append call.
It doesn't require any significant new patterns across types
This also hints at one of its biggest flaws IMO, but maybe you have a good solution for this. I think anything new we do in this space must support formatting ReadOnlySpan<char>
as arguments. One of the nice things about the builder approach is this is easily supported just by having an Append(ReadOnlySpan<char>)
overload on the builder. I don't see how this is easily supported in your proposal, as spans can't currently be used as generic arguments. How would you propose to support span being used as one or more of the arguments?
Also I think my implementation for a non-boxing version of the ParamsArray type could be used to eliminate boxing in existing APIs like String.Format, making those a touch faster too.
With the existing overloads? I don't see how, since the boxing is inherent to the existing APIs. It's also inherent to the existing implementation, which relies heavily on being able to pass the arguments around as object. If you're suggesting new generic string.Format overloads like those proposed in the very first post, that would also require plumbing that through the whole formatting implementation; I'd be interested in understanding what that implementation would look like, and similar to my previous comments, what impact that would have on bloating asm size in a typical apps (especially for AOT compilations, Blazor, Xamarin apps, etc., where presumably existing usage would start binding to the new more-strongly-typed overloads).
Yes, you've found all the flaws of the current implementation :-). Some more thoughts:
It's only allocation-free up to 3 arguments. This could be extended to a larger number, but there's clearly an upper bound. With the way the API is designed, the first 3 arguments are captured in a strongly-typed fashion and the rest are in a params object?[]
I made a specific point of keeping the code that has multiple generics as simple as possible. There is actually very little code in that regime, the generic code calls out ASAP to non-generic functions when possible. I learned my lesson about the impact of multi-generic types when we tried to implement a 4-parameter finger tree years ago. It was enormous! Anyway, I think the amount of code generated is within reason.
Of course, if the implementation calls out to an IFormattable instance, an allocation will happen as a result. There's no getting around that given how IFormattable works. The code deals with all primitive types without boxing however. It would be good to make ISpanFormattable public so more types could implement it and improve formatting efficiency in general.
My design works, today, in any version of .NET that supports Span
Don't underestimate the value of the 2x perf boost you get with TryFormat. This is huge. We have a lot of services that do a lot of string formatting and enabling them to be 0 alloc in the common case would likely lead to measurable COGs savings.
I can't think of a way to support ReadOnlySpan
@stephentoub So what I've done in my stuff is three things:
Extend ValueStringBuilder to support Append methods for the various primitive types. These overloads support argument formatting.
Create the machinery to preparse a composite format strings to a dense form that can be interpreted faster during formatting and provide a formatting loop that leverages this.
Implement a hack to prevent the params array allocation and boxing for the first 3 formatting arguments.
Exposing the extended ValueStringBuilder as a public type would be sufficient to implement the builder pattern you describe. String interpolation would immediately benefit. This could support all the normal formatting features (width, justification, etc)
Exposing ValueStringBuilder would make it possible for developers to implement 0-alloc string concatenation by supplying a span to the constructor of ValueStringBuilder. It's a bit clumsy, but it'll be quite fast.
To let developers do 0 alloc string formatting, then you can add the StringFormatter type with its TryFormat method. Or, you could get fancy and enhance string interpolation with new syntax such as:
Span<char> target = new Span<char>(new char[256]);
target <= $"Hello {name}"
This would let the C# compiler pass the span to the ValueStringBuilder constructor and voila, slick no-alloc string formatting.
That syntax is too vague in this case. Should it return bool and chars written, should it only return chars written and -1 in bad case, or should it completely hide details and just suddenly blow up with our of memory if anything goes wrong? I think that sticking to Method calls is better here. Also, I think the problem here (as whole) is that you need a compiler support to make code as efficient as possible and the way of the builder seems better as it can be easily integrated to support any number of parameters and so on. But at the same time I can't really understand how builder will solve explicit Format calls. Like for example when you programmatically build format and then call the function. @geeknoid while I think that making a specialized builder calls is probably better in case of known constant format and known arguments, maybe your approach could help the second case?
@stephentoub I've updated the design at https://github.com/geeknoid/FastFormatting. From the README:
The StringMaker
type is a supercharged version of the classic StringBuilder
type. It
is designed for efficiently building up a string or span by appending together bits and
pieces. The implementation tries to avoid any memory allocations when it can, and if you
supply your own span to the constructor, it can operate entirely without allocating memory.
When used in this memory, totally performance is around 4x faster than using StringBuilder
.
The StringFormatter
type is built on top of StringMaker
and is designed to support the
normal .NET composite formatting model, such as you use with String.Format
. The type splits
the normal formatting step in two in order to boost performance of the formatting process.
The first step is parsing the composite format string into an efficient form. This step is done
by create a StringFormatter
instance. Once you have this instance, you can use it to format
a set of arguments by calling the Format
method, which is about 2x as fast as String.Format
.
You can also use the TryFormat
method to format directly into your own span, which is 4x faster
than String.Format
.
This new code maintains the same perf characteristics as I had previously. The code has nearly 100% code coverage and compatibility with classic String.Format semantics is explicitly tested for.
I think StringMaker
can be used directly by the C# compiler to implement efficient interpolation with a single allocation for the final string. Some magic C# syntax can also enable 0 allocation by outputting to an existing span. The StringFormatter type on its side is useful as a faster replacement of String.Format for when interpolation is not an option (for localized text for example).
How's that sound?
Latest perf tests. These are all trying to format the same set of inputs and produce the same final string:
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |----------:|----------:|----------:|----------:|----------:|---------:|-----------:|
| TestClassicStringFormat | 44.866 ms | 0.8916 ms | 1.4897 ms | 3000.0000 | 1250.0000 | 416.6667 | 16800539 B |
| TestInterpolation | 45.272 ms | 0.8490 ms | 1.5524 ms | 3000.0000 | 1250.0000 | 416.6667 | 16800515 B |
| TestStringBuilder | 49.877 ms | 0.9750 ms | 1.3016 ms | 2888.8889 | 1111.1111 | 333.3333 | 16800055 B |
| TestStringFormatter | 25.101 ms | 0.5015 ms | 0.9170 ms | 2187.5000 | 906.2500 | 281.2500 | 12000022 B |
| TestStringFormatterWithSpan | 9.574 ms | 0.1785 ms | 0.1490 ms | - | - | - | 4 B |
| TestStringMaker | 7.308 ms | 0.1203 ms | 0.1564 ms | 2859.3750 | - | - | 11999924 B |
| TestStringMakerWithSpan | 5.043 ms | 0.0988 ms | 0.1832 ms | - | - | - | 2 B |
I think StringMaker can be used directly by the C# compiler to implement efficient interpolation with a single allocation for the final string
Isn't StringMaker essentially just ValueStringBuilder, and basically the same builder approach I was proposing? The only thing that differs is I was suggesting a pattern in addition, but that's just to enable additional scenarios, namely an API that wants to take a particular builder so as to a) have raw access to its underlying buffer and b) have control over the Append methods exposed so as to have more control over the formatting. But we'd have a built-in implementation of it that the compiler could use when targeting string, such that existing string interpolation just go more efficient in its implementation without semantic changes. So, I'm not clear on fundamentally what's different here? (Note that if we do go with such an approach, such a type would likely live in the compiler services namespace, and would be discouraged for general direct use: in general we're wary of exposing structs that wrap pooled resources, even if ref structs, that would make it easy to accidentally copy them and dispose multiple times, double-freeing a buffer.)
The StringMaker type is a supercharged version of the classic StringBuilder type.
As noted above, this seems like it's just ValueStringBuilder. If from an implementation perspective there are things you've done in your implementation that you've found to be more efficient, please feel free to submit a PR to improve ValueStringBuilder's implementation / reduce its overheads.
How's that sound?
I think you're now proposing two separate things:
Am I understanding correctly?
ps I'm on a stay-cation now and will be slow to respond until the new year. Thanks, and happy holidays.
It's pretty common to pass value types to
String.Format
, unfortunately this results in unnecessary boxing allocations.Rationale
Consider the following:
Both the traditional call to
String.Format
and the use of the string interpolation language feature (which is just syntactic sugar forString.Format
) requires 3 boxing allocations.Proposed API
I'd be happy to contribute an implementation and tests.