dotnet / docs

This repository contains .NET Documentation.
https://learn.microsoft.com/dotnet
Creative Commons Attribution 4.0 International
4.29k stars 5.92k forks source link

CompareOptions enum documentation is incorrect for .NET versions which use ICU #41052

Open daverayment opened 6 months ago

daverayment commented 6 months ago

Describe the issue or suggestion

This feedback is for the CompareOptions documentation here. This is the options enumeration which can be passed to String.Compare to influence how two strings are compared.

Since adopting ICU as the globalization API in .NET 5, the StringSort option has been the default for CompareOptions, i.e. both it and None result in the exact same output. This is not mentioned on the CompareOptions page.

Accordingly, the documentation for CompareOptions should be updated for .NET 5 and later to:

  1. Note the difference between the two Unicode APIs.
  2. Note that None is now an alias for StringSort (or the other way around!).
  3. Update the example code to correct the reported output for the None option, which is currently incorrect for these more recent versions of .NET. Alternatively, it may be removed entirely, and just noted to be an alias for StringSort.
  4. Detail how the user may use NLS to restore the None option from early .NET Framework releases. These steps are already detailed here, so linking to that article may be sufficient.

The example could do with a rewrite at the same time, as it is very old and has several issues:

  1. It is inconsistently formatted, with strangely-spaced code like Array.Sort( myArr, myComp ); and foreach ( String myStr in myArr ). (Not to mention use of String instead of the more common string alias etc.)
  2. The example itself 'buries the lede' with half of the code being spent setting up an unneeded MyStringComparer.
  3. The code does not follow safe formatting conventions and has multiple instances of single-line blocks not being delimited by braces, e.g.:
foreach ( String myStr in myArr )
         Console.WriteLine( myStr );
  1. The opening curly brace is on the same line for functions and class declarations, again not following current style conventions.
  2. Only one of the many CompareOption enum values is actually covered. Several of the others are quite subtle in their effects and would benefit from worked examples.
  3. I don't think this has been edited since early .NET Framework days, so many modern syntactic patterns are missing, which means the code looks quite archaic.
  4. Variable names are poorly chosen. myArr and myComp do not provide contextual information about their intended use.

I think the current C# sample code could be rewritten to look something like the following:

using System;
using System.Globalization;
using System.Collections.Generic;

public class CompareOptionsSample
{
    public static void Main()
    {
        var wordList = new List<string> { "cant", "bill's", "coop", "cannot", "billet", "can't", "con", "bills", "co-op" };

        Console.WriteLine("Before sorting:");
        foreach (string word in wordList)
        {
            Console.WriteLine(word);
        }

        Console.WriteLine("\nAfter sorting with CompareOptions.StringSort or CompareOptions.None:");
        var comparer = StringComparer.Create(CultureInfo.InvariantCulture, CompareOptions.StringSort);
        wordList.Sort(comparer);
        foreach (string word in wordList)
        {
            Console.WriteLine(word);
        }
    }
}

/*
This code produces the following output:

Before sorting:
cant
bill's
coop
cannot
billet
can't
con
bills
co-op

After sorting with CompareOptions.StringSort or CompareOptions.None:
bill's
billet
bills
can't
cannot
cant
co-op
con
coop
*/

Additional context may be found in this previous issue.

tarekgh commented 6 months ago

@daverayment are you interested in submitting a PR to address this issue?

daverayment commented 6 months ago

Hi @tarekgh . This would be my first docs contribution, but I will give it a try! Would you or someone else be able to help me with the following?

Your status says you're on vacation, so I don't expect a reply soon. I hope you enjoy your break! 🌴

tarekgh commented 6 months ago

@daverayment thanks for willing to contribute to the docs. I can help you through that.

The adoption of ICU was in .NET 5, so the page will need to display different content when .NET 5 or later is chosen from the Version drop-down. Is there a guide to creating this sort of 'branching' content? I've read through the contribution docs but couldn't see anything about this.

You don't need to have different content for different versions. Usually we add something to the remark section or even add some extra sentence to the description of StringSort in the page. The Example can add extra comment at the expected output showing the expectation when running on .NET 5.0 or later.

I will likely need a hand with the VB and F# examples.

The page only have C# sample. If want to edit any other pages, I expect the change will be in the comments there to show expected output on .NET 5.0 and later.

Currently, the example code only covers the StringSort option. I may need some help with creating examples to illustrate the other options.

That is fine if you want to cover other cases but most important is we need to fix StringSort docs.

Don't forget we need to add something to https://learn.microsoft.com/en-us/dotnet/core/extensions/globalization-icu too.

Note, the first doc exists in the https://github.com/dotnet/dotnet-api-docs repo and https://learn.microsoft.com/en-us/dotnet/core/extensions/globalization-icu exists in the https://github.com/dotnet/docs/blob/main/docs/core repo. So, most likely you want to submit two PRs. Thanks for your help with that!

daverayment commented 5 months ago

Hi @tarekgh.

Sorry for the delay on this. Could I get your feedback on the proposed changes below before I commit anything, please?

I have created a new example for the CompareOptions enum page, which illustrates each of the options:

using System;
using System.Globalization;

public class CompareOptionsExample
{
    public static void Main()
    {
        // Uppercase and lowercase characters are equivalent (according to the culture rules) when IgnoreCase is used.
        TestStringEquality("ONE two", "one TWO", "Case sensitivity", CompareOptions.IgnoreCase);

        // Punctuation is ignored with the IgnoreSymbols option.
        TestStringEquality("hello world", "hello, world!", "Punctuation", CompareOptions.IgnoreSymbols);

        // Whitespace and mathematical symbols are also ignored with IgnoreSymbols.
        TestStringEquality("3 + 5 = 8", "358", "Whitespace and mathematical symbols", CompareOptions.IgnoreSymbols);

        // Caution: currency symbols and thousands separators are ignored with IgnoreSymbols.
        // Parse strings containing numbers/currency and compare them numerically instead.
        TestStringEquality("Total $15,000", "Total: £150.00", "Currency symbols, decimals and thousands separators", CompareOptions.IgnoreSymbols);

        // Full width characters are common in East Asian languages. Use the IgnoreWidth
        // option to treat full- and half-width characters as equal.
        TestStringEquality("abc,-", "abc,-", "Half width and full width characters", CompareOptions.IgnoreWidth);

        // The same string in Hiragana and Katakana is equal when IgnoreKanaType is used.
        TestStringEquality("ありがとう", "アリガトウ", "Hiragana and Katakana strings", CompareOptions.IgnoreKanaType);

        // When comparing with the IgnoreNonSpace option, characters like diacritical marks are ignored.
        TestStringEquality("café", "cafe", "Diacritical marks", CompareOptions.IgnoreNonSpace);

        // Ligature characters and their non-ligature forms compare equal with the IgnoreNonSpace option.
        // Note: prior to .NET 5, ligature characters were equal to their expanded forms by default.
        TestStringEquality("straße œuvre cæsar", "strasse oeuvre caesar", "Ligature characters", CompareOptions.IgnoreNonSpace);
    }

    private static void TestStringEquality(string str1, string str2, string description, CompareOptions options)
    {
        Console.WriteLine(Environment.NewLine + description + ":");
        // First test with the default CompareOptions then with the provided options
        TestStringEquality(str1, str2, CompareOptions.None);
        TestStringEquality(str1, str2, options);
    }

    private static void TestStringEquality(string str1, string str2, CompareOptions options)
    {
        Console.Write(string.Format("  When using CompareOptions.{0}, \"{1}\" and \"{2}\" are ", options, str1, str2));
        if (string.Compare(str1, str2, CultureInfo.InvariantCulture, options) != 0)
        {
            Console.Write("not ");
        }
        Console.WriteLine("equal.");
    }
}

/*
In .NET 5 and later, the output will be the following:

Case sensitivity:
  When using CompareOptions.None, "ONE two" and "one TWO" are not equal.
  When using CompareOptions.IgnoreCase, "ONE two" and "one TWO" are equal.

Punctuation:
  When using CompareOptions.None, "hello world" and "hello, world!" are not equal.
  When using CompareOptions.IgnoreSymbols, "hello world" and "hello, world!" are equal.

Whitespace and mathematical symbols:
  When using CompareOptions.None, "3 + 5 = 8" and "358" are not equal.
  When using CompareOptions.IgnoreSymbols, "3 + 5 = 8" and "358" are equal.

Currency symbols, decimals and thousands separators:
  When using CompareOptions.None, "Total $15,000" and "Total: £150.00" are not equal.
  When using CompareOptions.IgnoreSymbols, "Total $15,000" and "Total: £150.00" are equal.

Half width and full width characters:
  When using CompareOptions.None, "abc,-" and "abc,-" are not equal.
  When using CompareOptions.IgnoreWidth, "abc,-" and "abc,-" are equal.

Hiragana and Katakana strings:
  When using CompareOptions.None, "ありがとう" and "アリガトウ" are not equal.
  When using CompareOptions.IgnoreKanaType, "ありがとう" and "アリガトウ" are equal.

Diacritical marks:
  When using CompareOptions.None, "café" and "cafe" are not equal.
  When using CompareOptions.IgnoreNonSpace, "café" and "cafe" are equal.

Ligature characters:
  When using CompareOptions.None, "straße œuvre cæsar" and "strasse oeuvre caesar" are not equal.
  When using CompareOptions.IgnoreNonSpace, "straße œuvre cæsar" and "strasse oeuvre caesar" are equal.
*/

I've also updated the StringSort example to correct the issues I raised in my original post:

using System;
using System.Collections.Generic;
using System.Globalization;

public class StringSort
{
    public static void Main()
    {
        var wordList = new List<string>
        {
            "cant", "bill's", "coop", "cannot", "billet", "can't", "con", "bills", "co-op"
        };

        Console.WriteLine("Before sorting:");
        foreach (string word in wordList)
        {
            Console.WriteLine(word);
        }

        Console.WriteLine(Environment.NewLine + "After sorting with CompareOptions.None:");
        SortAndDisplay(wordList, CompareOptions.None);

        Console.WriteLine(Environment.NewLine + "After sorting with CompareOptions.StringSort:");
        SortAndDisplay(wordList, CompareOptions.StringSort);
    }

    // Sort the list of words with the supplied CompareOptions.
    private static void SortAndDisplay(List<string> unsorted, CompareOptions options)
    {
        // Create a copy of the original list to sort.
        var words = new List<string>(unsorted);
        // Define the CompareInfo to use to compare strings.
        var comparer = CultureInfo.InvariantCulture.CompareInfo;

        // Sort the copy with the supplied CompareOptions then display.
        words.Sort((str1, str2) => comparer.Compare(str1, str2, options));
        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
    }
}

/*
CompareOptions.None and CompareOptions.StringSort provide identical ordering by default
in .NET 5 and later, but in prior versions, the output will be the following:

Before sorting:
cant
bill's
coop
cannot
billet
can't
con
bills
co-op

After sorting with CompareOptions.None:
billet
bills
bill's
cannot
cant
can't
con
coop
co-op

After sorting with CompareOptions.StringSort:
bill's
billet
bills
can't
cannot
cant
co-op
con
coop
*/

(Both examples should compile in .NET Framework 4 and .NET 5 and later.)

To be honest, I wonder whether the StringSort example is actually required now; in .NET 5 and later, it requires opting in to using NLS, and I think it may be a distraction on the page. What do you think?

I've put some work into editing the CompareOptions page text, too, to make things a little more approachable and descriptive (see CompareOptions.zip), but could I ask if there is a way of checking how my edits to the XML will look when rendered? Editing the raw XML without being able to check is a little scary!

Note: I haven't integrated the new example into the XML yet. Would this just require adding a new block, referring to where I will place the new code file(s)?

Likewise, I haven't yet made the edits to the Globalization ICU page, but that hopefully shouldn't require as much time!

tarekgh commented 5 months ago

I have created a new example for the CompareOptions enum page, which illustrates each of the options:

Sounds good. We want to see if there is any difference in the output comparing .NET to .NET Framework and capture the difference in the writing if possible.

Console.Write(string.Format(" When using CompareOptions.{0}, \"{1}\" and \"{2}\" are ", options, str1, str2)); Console.WriteLine(Environment.NewLine + description + ":");

Try to use the interpolated strings instead.

Console.Write(string.Format($"  When using CompareOptions.{options}, \"{str1}\" and \"{str2}\" are "));

Console.WriteLine($"{Environment.NewLine}{description}:");

To be honest, I wonder whether the StringSort example is actually required now; in .NET 5 and later, it requires opting in to using NLS, and I think it may be a distraction on the page. What do you think?

There are still many .NET Framework uses which such sample still relevant and useful. We need to ensure the doc is clear describing the behavior when running on .NET or .NET Framework.

but could I ask if there is a way of checking how my edits to the XML will look when rendered?

When you submit a PR, it will build it for you and can see the change visually. Don't worry if you get something wrong in the first submission, you always can update the PR with fixes before we merge it.

Note: I haven't integrated the new example into the XML yet. Would this just require adding a new block, referring to where I will place the new code file(s)?

You can look at how the examples are embedded inside the documentation. Here is example how it is done:

https://github.com/dotnet/dotnet-api-docs/blob/d6d38e405484148855192144a74024519c11b731/xml/System.Globalization/CompareOptions.xml#L73

Likewise, I haven't yet made the edits to the Globalization ICU page, but that hopefully shouldn't require as much time!

sounds good. Thanks!