dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

String.StartsWith( "\0" ) always returns true. #50521

Closed mkb137 closed 3 years ago

mkb137 commented 3 years ago

Description

Expected:

    "x".StartsWith( "\0" ) == false

Actual:

    "x".StartsWith( "\0" ) == true

e.g.:

    Console.WriteLine( $"x starts with nulls? {"x".StartsWith("\0", StringComparison.InvariantCulture)}" );

Also with multiple nulls, e.g.:

    Console.WriteLine( $"x starts with nulls? {"x".StartsWith("\0\0\0", StringComparison.InvariantCulture)}" );

Configuration

Basic console app, .NET 5.0.

Problem doesn't exist on .NET Core 3.1.

Regression?

Code:

    using System;

    namespace ConsoleApp1 {
        class Program {
            static void Main( string[] args ) {
                Console.WriteLine( $"x starts with nulls? {"x".StartsWith("\0", StringComparison.InvariantCulture)}" );
                Console.WriteLine( $"x starts with nulls? {"x".StartsWith("\0\0\0", StringComparison.InvariantCulture)}" );
                Console.WriteLine( $"\\0 starts with nulls? {"\0".StartsWith("\0", StringComparison.InvariantCulture)}" );
            }
        }
    }

.NET 5.0:

    <Project Sdk="Microsoft.NET.Sdk">
        <PropertyGroup>
            <OutputType>Exe</OutputType>
            <TargetFrameworks>net5.0</TargetFrameworks>
        </PropertyGroup>
    </Project>

output:

    x starts with nulls? True
    x starts with nulls? True
    \0 starts with nulls? True

.NET Core 3.1:

    <Project Sdk="Microsoft.NET.Sdk">
        <PropertyGroup>
            <OutputType>Exe</OutputType>
            <TargetFrameworks>netcoreapp3.1</TargetFrameworks>
        </PropertyGroup>
    </Project>

output:

    x starts with nulls? False
    x starts with nulls? False
    \0 starts with nulls? True

Other information

dotnet-issue-labeler[bot] commented 3 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 3 years ago

Tagging subscribers to this area: @tarekgh, @safern. See info in area-owners.md if you want to be subscribed.

Author: mkb137
Assignees: -
Labels: `area-System.Globalization`, `untriaged`
Milestone: -

tarekgh commented 3 years ago

@mkb137 you are calling String.StartsWith either without a StringComparison option or with StringComparison.InvariantCulture. Both request a linguistic (i.e. culture-aware) operation. When the operation is done linguistically, \0 is an ignorable character: think of it as if it did not exist in the string at all. You can look at the Unicode collation chart of ignored characters https://www.unicode.org/charts/collation/chart_Ignored.html to see that null is ignorable when comparing.
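A minimal sketch of what "ignorable" means in practice, assuming .NET 5 or later and C# 9 top-level statements: the same two strings, "a\0b" and "ab", are compared once culture-aware and once ordinally through `CompareInfo.Compare`. The culture-aware result depends on which globalization library the runtime loaded, so the sketch prints it rather than asserting a value.

```csharp
using System;
using System.Globalization;

CompareInfo invariant = CultureInfo.InvariantCulture.CompareInfo;

// Culture-aware comparison: if U+0000 is treated as ignorable (as under ICU),
// "a\0b" is expected to compare equal to "ab", i.e. a sign of 0.
int linguistic = Math.Sign(invariant.Compare("a\0b", "ab", CompareOptions.None));

// Ordinal comparison: UTF-16 code units are compared one by one.
// At index 1, '\0' (0x0000) sorts before 'b' (0x0062), so the sign is -1.
int ordinal = Math.Sign(invariant.Compare("a\0b", "ab", CompareOptions.Ordinal));

Console.WriteLine($"linguistic sign: {linguistic}");
Console.WriteLine($"ordinal sign:    {ordinal}");
```

Only the ordinal line is platform-independent; the linguistic line is what changes between ICU and NLS.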

To get your desired behavior, use "x".StartsWith("\0", StringComparison.Ordinal), which performs the operation in a non-linguistic (ordinal) way.
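A short sketch contrasting the culture-sensitive and ordinal overloads of `String.StartsWith`, assuming .NET 5 or later with C# 9 top-level statements. The culture-sensitive results are printed, not assumed, because they depend on the globalization library; the ordinal results do not.

```csharp
using System;

// Culture-sensitive overloads: results depend on the globalization library
// (reported in this issue as True under ICU on .NET 5, False under NLS on 3.1).
bool currentCulture = "x".StartsWith("\0"); // no StringComparison: current culture
bool invariantCulture = "x".StartsWith("\0", StringComparison.InvariantCulture);

// Ordinal overloads: code units are compared directly, so "x" never starts with '\0'.
bool ordinal = "x".StartsWith("\0", StringComparison.Ordinal);
bool ordinalChar = "x".StartsWith('\0'); // the char overload compares ordinally

Console.WriteLine($"CurrentCulture:   {currentCulture}");
Console.WriteLine($"InvariantCulture: {invariantCulture}");
Console.WriteLine($"Ordinal:          {ordinal}");
Console.WriteLine($"char overload:    {ordinalChar}");
```

For checks on raw data such as null padding, the ordinal overloads are the ones whose answer does not change across platforms.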

In .NET 5.0 on Windows we switched to the ICU library for globalization, which works according to the Unicode Standard. That is why you are seeing a difference between .NET Core 3.x and .NET 5.0. If you run 3.x on Linux, you should see exactly the same behavior as in .NET 5.0: on Linux we have used ICU since .NET Core 1.0, and this behavior has been there since then.
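To see which library a given process is using, the sketch below adapts a heuristic that appears in Microsoft's globalization/ICU documentation: it compares the invariant culture's `SortVersion.SortId` against `SortVersion.FullVersion`. Treat it as a best-effort check, not an official API; the function name `LooksLikeIcu` is hypothetical, and the sketch assumes .NET 5 or later with C# 9 top-level statements.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: a heuristic, not an official "is ICU" API.
static bool LooksLikeIcu()
{
    SortVersion sortVersion = CultureInfo.InvariantCulture.CompareInfo.Version;
    byte[] bytes = sortVersion.SortId.ToByteArray();
    // Rebuild a little-endian int from the first four bytes of the sort ID.
    int version = bytes[3] << 24 | bytes[2] << 16 | bytes[1] << 8 | bytes[0];
    // Under ICU the two values are expected to match and be non-zero.
    return version != 0 && version == sortVersion.FullVersion;
}

bool icu = LooksLikeIcu();
bool startsWithNull = "x".StartsWith("\0", StringComparison.InvariantCulture);

Console.WriteLine($"ICU in use (heuristic): {icu}");
Console.WriteLine($"\"x\" starts with \\0 (InvariantCulture): {startsWithNull}");
```

Printing both lines side by side makes it easier to tie a surprising StartsWith result to the globalization mode of the machine it ran on.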

Last, if you are using .NET 5.0 on Windows and want to switch back to the older behavior (what you used to get in 3.x), please follow the instructions in https://docs.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu#use-nls-instead-of-icu. That doc is worth reading in general.

I am closing the issue, but feel free to ask any questions you think we can help with. Thanks for your report.