Cysharp / Utf8StringInterpolation

Successor of ZString; UTF8 based zero allocation high-performance String Interpolation and StringBuilder.
MIT License

Add option for zero-terminated string #18

Closed. ZimM-LostPolygon closed this issue 1 month ago

ZimM-LostPolygon commented 1 month ago

First of all, thank you for the fantastic library! The idea and the performance are great.

I'm having an issue when utilizing this library to generate UTF8 strings that are consumed by native C methods expecting a zero-terminated string. Utf8StringWriter doesn't add a zero terminator to the end, and there's no place to add it manually when using Utf8StringWriter as a function argument.

Here's an example of the code I'm trying to achieve.

private static unsafe void ProcessData(
    ref Utf8StringWriter<ArrayBufferWriter<byte>> label,
    ref Utf8StringWriter<ArrayBufferWriter<byte>> data
) {
    // Flush so that the bytes written so far are visible in the underlying buffer writer.
    label.Flush();
    ReadOnlySpan<byte> labelSpan = label.GetBufferWriter().WrittenSpan;

    data.Flush();
    ReadOnlySpan<byte> dataSpan = data.GetBufferWriter().WrittenSpan;

    // Problem: neither span ends with a 0 byte, but the native side expects zero-terminated strings.
    fixed (byte* labelBytes = &labelSpan.GetPinnableReference())
        fixed (byte* dataBytes = &dataSpan.GetPinnableReference())
            NativeMethods.ProcessData(labelBytes, dataBytes);
}

neuecc commented 1 month ago

AppendUtf8 writes raw bytes, so you can write it like this.

label.AppendUtf8([0]);
label.Flush();

ZimM-LostPolygon commented 1 month ago

That does work, but I'm a bit concerned about performance. Calling AppendUtf8 is a lot of work just to add a null terminator: it may need to resize the buffer, copy a span to the destination span, update the destination span, and so on, all to add a 0 at the very end. This could be alleviated if the buffer always reserved one extra byte, guaranteeing there's space for a zero terminator at the end. Then I could just write a 0 at the last position of the final buffer and that's it.
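
For reference, here is a rough sketch of what "write a 0 at the end" looks like at the buffer-writer level, using only the BCL. It assumes the writer has already been flushed and is not used again afterwards. AppendNullTerminator is a hypothetical helper, not part of Utf8StringInterpolation, and a plain ArrayBufferWriter<byte> stands in for the writer's underlying buffer.

```csharp
using System;
using System.Buffers;

// Hypothetical helper (not a library API): appends a single 0 byte
// through the plain IBufferWriter<byte> contract.
void AppendNullTerminator(IBufferWriter<byte> writer)
{
    Span<byte> tail = writer.GetSpan(1); // asks for at least one free byte; may grow the buffer
    tail[0] = 0;
    writer.Advance(1);                   // commit the single byte
}

var buffer = new ArrayBufferWriter<byte>();
buffer.Write("label"u8);                 // stand-in for the bytes a flushed writer produced
AppendNullTerminator(buffer);

ReadOnlySpan<byte> written = buffer.WrittenSpan;
Console.WriteLine(written.Length);       // 6: five payload bytes plus the terminator
Console.WriteLine(written[^1]);          // 0
```

Whether this is measurably cheaper than AppendUtf8 has not been benchmarked here; the sketch only shows that the terminator can be appended with one GetSpan/Advance pair.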

neuecc commented 1 month ago

The initial buffer size of Utf8StringWriter is not exact:

var initialSize = literalLength + (formattedCount * GuessedLengthPerHole);

The length each hole will actually need is unknown, so this is just an estimate; it might be larger or smaller. Practically speaking, adding 1 is therefore unlikely to make any difference, so we can consider that there are no performance concerns.
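
To make the estimate concrete, here is a small worked example of the formula quoted above. The value 11 for GuessedLengthPerHole is an assumption borrowed from .NET's DefaultInterpolatedStringHandler; the thread does not state the constant this library uses. EstimateInitialSize is a hypothetical helper written only for illustration.

```csharp
using System;
using System.Text;

// Assumed value (the one .NET's DefaultInterpolatedStringHandler uses);
// the library's actual constant is not given in this thread.
const int GuessedLengthPerHole = 11;

// Hypothetical helper mirroring the formula quoted in the comment above.
int EstimateInitialSize(int literalLength, int formattedCount) =>
    literalLength + (formattedCount * GuessedLengthPerHole);

int id = 42;
string name = "Alice";

// $"id={id}, name={name}" has 10 literal characters ("id=" and ", name=") and 2 holes.
int estimate = EstimateInitialSize(literalLength: 10, formattedCount: 2);
int actual = Encoding.UTF8.GetByteCount($"id={id}, name={name}");

Console.WriteLine(estimate);    // 32
Console.WriteLine(actual);      // 17
Console.WriteLine(actual + 1);  // 18, including a zero terminator
```

Under these assumptions the estimate already overshoots the real length by a wide margin, so one extra byte for the terminator would usually fit without a resize. Longer hole values can just as easily make the estimate too small.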