dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

`System.Text.Ascii.Trim`/`TrimStart`/`TrimEnd` methods include `'\0'` character/byte in trimming #104201

Open assumenothing opened 2 weeks ago

assumenothing commented 2 weeks ago

Description

The public System.Text.Ascii.Trim method (as well as TrimStart and TrimEnd) also trims characters/bytes with a value of zero (character literal '\0'), even though '\0' is not normally considered a white space character.

Reproduction Steps

string testString = "\0string\0";
Range trimRange = System.Text.Ascii.Trim(testString);
Console.WriteLine($"Trim Range = {trimRange}"); // results in the range [1..7], which trims the \0 chars

Expected behavior

It is expected that a string starting or ending with '\0' characters should not be trimmed (to match behavior of other similar APIs like String.Trim and System.MemoryExtensions.Trim).
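For comparison, here is a minimal sketch of how the two APIs named above treat the same input. The class and method names are hypothetical and exist only for illustration. Both calls rely on char.IsWhiteSpace, which returns false for '\0'.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class NulTrimComparison
{
    // Length after String.Trim(); '\0' is not white space, so it is kept.
    public static int StringTrimLength(string s) => s.Trim().Length;

    // Length after MemoryExtensions.Trim(ReadOnlySpan<char>); same rule applies.
    public static int SpanTrimLength(string s) => s.AsSpan().Trim().Length;
}
```

Calling either method with "\0string\0" leaves all 8 characters in place, whereas surrounding spaces would be removed.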

Actual behavior

The returned range excludes leading and trailing '\0' characters, meaning they are trimmed along with the white space.

This is likely due to a mistake in the implementation, which assumes that an element value less than or equal to 0x20 will not become negative when one is subtracted from it. Alternatively, the algorithm may have been derived from one originally designed for a C-like language, where strings are assumed to be terminated with '\0' and '\0' is assumed never to appear within them.

// Problem is that these statements result in identical values when
// used with the implementation's white space TrimMask test:
Console.WriteLine($"1U << (0x00 - 1) = 0x{1U << (0x00 - 1):x}"); // NUL ASCII code
Console.WriteLine($"1U << (0x20 - 1) = 0x{1U << (0x20 - 1):x}"); // Space ASCII code
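The collision can be reproduced with a small stand-alone sketch of a bit-mask white space test. This is not copied from the runtime source. The class name, method names, and the exact set of trimmed code points (TAB, LF, VT, FF, CR, SPACE) are assumptions made for illustration. The key point is that C# masks a 32-bit shift count to its low 5 bits, so a count of -1 behaves like a count of 31.

```csharp
using System;

// Hypothetical sketch of a bit-mask white space test; not the runtime's code.
public static class TrimMaskSketch
{
    // One bit per trimmed code point, stored at bit position (value - 1).
    // The set of code points below is an assumption for this sketch.
    public const uint TrimMask =
          (1u << (0x09 - 1))  // TAB
        | (1u << (0x0A - 1))  // LF
        | (1u << (0x0B - 1))  // VT
        | (1u << (0x0C - 1))  // FF
        | (1u << (0x0D - 1))  // CR
        | (1u << (0x20 - 1)); // SPACE

    // Test in the style described in the report: for value == 0 the shift
    // count is -1, which is masked to 31 and so hits the SPACE bit.
    public static bool IsTrimmedByMask(uint value) =>
        value <= 0x20 && (TrimMask & (1u << ((int)value - 1))) != 0;

    // One possible guard (an assumption, not a proposed patch):
    // reject zero before shifting.
    public static bool IsTrimmedWithZeroGuard(uint value) =>
        value != 0 && IsTrimmedByMask(value);
}
```

With this sketch, IsTrimmedByMask(0x00) and IsTrimmedByMask(0x20) both return true, while IsTrimmedWithZeroGuard(0x00) returns false.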

Regression?

No response

Known Workarounds

Avoid using the Ascii.Trim, Ascii.TrimStart, and Ascii.TrimEnd methods if '\0' characters should not be trimmed.
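If a range-returning trim is still needed, one option is a small hand-written replacement. The sketch below is a hypothetical helper, not part of any library, and it assumes the intended white space set is SPACE plus the control characters 0x09 through 0x0D.

```csharp
using System;

// Hypothetical workaround helper; names and white space set are assumptions.
public static class AsciiTrimWorkaround
{
    // SPACE, or TAB/LF/VT/FF/CR (0x09..0x0D). '\0' is deliberately excluded.
    private static bool IsAsciiWhiteSpace(char c) =>
        c == ' ' || (c >= '\t' && c <= '\r');

    // Returns the range of 'value' that remains after trimming ASCII
    // white space from both ends, leaving '\0' characters untouched.
    public static Range TrimKeepingNul(ReadOnlySpan<char> value)
    {
        int start = 0;
        while (start < value.Length && IsAsciiWhiteSpace(value[start]))
        {
            start++;
        }

        int end = value.Length;
        while (end > start && IsAsciiWhiteSpace(value[end - 1]))
        {
            end--;
        }

        return start..end;
    }
}
```

For the string from the reproduction steps, TrimKeepingNul returns the full range 0..8, so the '\0' characters are preserved.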

Configuration

.NET 8.0.6 x64 on Windows 10 (Console Application)

Other information

The most obvious solution here with the smallest impact is to simply add documentation indicating that the character/byte value zero will be included in the trimming. Otherwise the fix would involve making a breaking change to the API.

stephentoub commented 2 weeks ago

cc: @adamsitnik

dotnet-policy-service[bot] commented 2 weeks ago

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.