dotnet / csharplang

The official repo for the design of the C# programming language

[Proposal]: u8 string interpolation #7072

Open stephentoub opened 1 year ago

stephentoub commented 1 year ago

u8 interpolated strings

Summary

Allow u8 strings to be used with interpolation when the target handler has an AppendLiteral(ReadOnlySpan<byte>) method.

Motivation

In .NET 6, the MemoryExtensions class added a TryWrite extension method that supports interpolating directly into a user-provided Span<char>:

Span<char> destination = ...;
bool formatted = destination.TryWrite($"Date: {DateTime.UtcNow:r}\r\n", out int charsWritten);

The literal portions of the string are directly copied, and the implementation recognizes the ISpanFormattable interface on the formatted values and prefers to use it in order to ask each value to write itself directly into the destination buffer.

In .NET 8, dotnet/runtime plans to add a similar MemoryExtensions.TryWriteUtf8 method that can interpolate into a Span<byte>:

Span<byte> destination = ...;
bool formatted = destination.TryWriteUtf8($"Date: {DateTime.UtcNow:r}\r\n", out int bytesWritten);

This implementation recognizes the new IUtf8SpanFormattable interface on the formatted values and prefers to use it in order to ask each value to write itself directly into the destination buffer.

However, as things currently stand, the string literal portions end up being passed to AppendLiteral(string) calls, and the handler needs to UTF-8 encode each one at run time, even though the data is known at compile time. This can be avoided if the compiler encodes each literal at compile time, as it does for u8 literals:

Span<byte> destination = ...;
bool formatted = destination.TryWriteUtf8($"Date: {DateTime.UtcNow:r}\r\n"u8, out int bytesWritten);

The above would be lowered exactly as it would without the u8, except each literal would result in a call to AppendLiteral(ReadOnlySpan<byte>), with u8 suffixed onto the literal:

Span<byte> destination = ...;
var handler = new MemoryExtensions.TryWriteUtf8InterpolatedStringHandler(8, 1, destination, out bool shouldAppend);
_ = shouldAppend &&
    handler.AppendLiteral("Date: "u8) &&
    handler.AppendFormatted(DateTime.UtcNow, "r") &&
    handler.AppendLiteral("\r\n"u8);
bool formatted = MemoryExtensions.TryWriteUtf8(destination, ref handler, out int bytesWritten);
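
To make the run-time cost concrete, here is a minimal sketch of the two ways a handler could append a literal. The class and method names (LiteralEncodingSketch, AppendTranscoded, AppendPreEncoded, ProduceSameBytes) are made up for illustration and are not part of dotnet/runtime; only Encoding.UTF8.GetBytes, Span<T>.CopyTo and the C# 11 u8 literal are existing APIs/features.

```csharp
using System;
using System.Text;

// Illustrative only: these helpers are assumptions, not dotnet/runtime APIs.
public static class LiteralEncodingSketch
{
    // Roughly what an AppendLiteral(string) implementation has to do today:
    // transcode UTF-16 to UTF-8 on every call.
    public static int AppendTranscoded(string literal, Span<byte> destination)
        => Encoding.UTF8.GetBytes(literal, destination);

    // Roughly what AppendLiteral(ReadOnlySpan<byte>) could do instead:
    // the bytes are already UTF-8, so this is a length check and a copy.
    public static int AppendPreEncoded(ReadOnlySpan<byte> literal, Span<byte> destination)
    {
        literal.CopyTo(destination); // throws if destination is too small
        return literal.Length;
    }

    // Both paths yield the same bytes; only the work done at run time differs.
    public static bool ProduceSameBytes()
    {
        Span<byte> a = stackalloc byte[16];
        Span<byte> b = stackalloc byte[16];
        int n1 = AppendTranscoded("Date: ", a);
        int n2 = AppendPreEncoded("Date: "u8, b);
        return a.Slice(0, n1).SequenceEqual(b.Slice(0, n2));
    }
}
```

The u8 literal is stored as encoded bytes in the assembly, so the second path does no encoding work per call.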

Detailed design

We allow the u8 suffix on interpolated strings when targeting a handler. When supplied, anywhere the implementation would have looked for an AppendLiteral(string) method on the handler in order to consider the handler valid and emit calls for the literal portions of the string, it instead looks for AppendLiteral(ReadOnlySpan<byte>).

The C# 10 proposal for interpolated string handlers states "If there are any interpolated_regular_string_character components in i: Member lookup on T with the name AppendLiteral is performed. The resulting method group is called Ml. The argument list Al is constructed with one value parameter of type string." If the u8 suffix is employed, this would be changed to be "one value parameter of type ReadOnlySpan<byte>". The arguments to that AppendLiteral method would be created as if u8 had been appended to that individual literal.
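
As a rough sketch of what a handler opting into this could look like, the type below exposes both AppendLiteral overloads. SketchUtf8Handler and SketchUtf8HandlerDemo are hypothetical names used only for illustration; they are not the handler dotnet/runtime plans to ship, and AppendFormatted is omitted for brevity. The demo calls the handler by hand, mirroring the lowered form shown earlier, since the u8 suffix on interpolated strings is only proposed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

// Hypothetical handler for illustration; not a dotnet/runtime type.
[InterpolatedStringHandler]
public ref struct SketchUtf8Handler
{
    private readonly Span<byte> _destination;
    private int _pos;

    public SketchUtf8Handler(int literalLength, int formattedCount,
                             Span<byte> destination, out bool shouldAppend)
    {
        _destination = destination;
        _pos = 0;
        // Under the proposal, literalLength would be a byte count when u8 is used.
        shouldAppend = destination.Length >= literalLength;
    }

    public int BytesWritten => _pos;

    // Shape the compiler looks for today: the literal is transcoded at run time.
    public bool AppendLiteral(string value)
    {
        if (Encoding.UTF8.GetByteCount(value) > _destination.Length - _pos) return false;
        _pos += Encoding.UTF8.GetBytes(value, _destination.Slice(_pos));
        return true;
    }

    // Shape the compiler would look for with the u8 suffix: a plain copy.
    public bool AppendLiteral(ReadOnlySpan<byte> value)
    {
        if (!value.TryCopyTo(_destination.Slice(_pos))) return false;
        _pos += value.Length;
        return true;
    }
}

public static class SketchUtf8HandlerDemo
{
    // Hand-written equivalent of what $"Date: \r\n"u8 might lower to.
    // Returns the number of bytes written, or -1 if the buffer is too small.
    public static int WriteHeader(Span<byte> destination)
    {
        var handler = new SketchUtf8Handler(8, 0, destination, out bool shouldAppend);
        bool ok = shouldAppend
            && handler.AppendLiteral("Date: "u8)
            && handler.AppendLiteral("\r\n"u8);
        return ok ? handler.BytesWritten : -1;
    }
}
```

Because overload resolution picks AppendLiteral by argument type, a handler can offer both overloads and serve interpolated strings with or without the suffix.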

With a string literal, the handler's ctor is passed the length of the literal portion in chars. With a u8 literal, the length would be in bytes (the sum of the lengths of all of the ReadOnlySpan<byte>s).
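
The difference only shows up for non-ASCII literals. A small sketch (LiteralLengthSketch and Measure are illustrative names, not part of any proposal) compares the two counts for a given literal:

```csharp
using System;
using System.Text;

// Illustrative helper: compares the char count passed to the constructor today
// with the byte count that would be passed for a u8 interpolated string.
public static class LiteralLengthSketch
{
    public static (int Chars, int Bytes) Measure(string literal)
        => (literal.Length, Encoding.UTF8.GetByteCount(literal));
}

// LiteralLengthSketch.Measure("Date: \r\n")          -> (8, 8)   ASCII: counts agree
// LiteralLengthSketch.Measure("Temp\u00E9rature: ")  -> (13, 14) U+00E9 takes two bytes in UTF-8
```

For the example in the Motivation section the value stays 8 either way, because "Date: " and "\r\n" are pure ASCII.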

Drawbacks

Alternatives

Unresolved questions

Design meetings

333fred commented 1 year ago

Probably worth explicitly calling out that this would only work for u8 strings that are being converted to a handler type, as there aren't currently plans to create a default handler type for them and that would require another spec change (what method is called to realize the byte array).

stephentoub commented 1 year ago

> Probably worth explicitly calling out that this would only work for u8 strings that are being converted to a handler type, as there aren't currently plans to create a default handler type for them and that would require another spec change (what method is called to realize the byte array).

Right. And we won't be adding AppendLiteral(ReadOnlySpan<byte>) to the DefaultInterpolatedStringHandler. (As an implementation detail, I imagine if we did it could "just work", albeit less efficiently than if you hadn't used u8.)

333fred commented 1 year ago

> As an implementation detail, I imagine if we did it could "just work", albeit less efficiently than if you hadn't used u8.

It wouldn't just work since the result of the u8 string is not System.String, but that's all you can get out of the handler. That's the explicit callout I'm looking for.

stephentoub commented 1 year ago

> that's all you can get out of the handler

I know.

I think we're talking past each other. But it doesn't matter as without a new AppendLiteral overload on the default handler, it won't be relevant anyway.