dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Update .NET 7 Unicode data to version 14.0.0 #44423

Closed: GrabYourPitchforks closed this issue 2 years ago

GrabYourPitchforks commented 3 years ago

The Unicode Standard version 14.0.0 is tentatively scheduled for September 2021. As usual, because the .NET runtime carries its own copy of Unicode-derived data, we should update our data files to match version 14.0.0 once it is released.
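To make "runtime-carried Unicode data" concrete, here is a small sketch that queries CharUnicodeInfo. The lookups are answered from tables compiled into the runtime, so their results follow whichever Unicode version that runtime was built against. U+0870 (ARABIC LETTER ALEF WITH ATTACHED FATHA) is used as the probe because it is new in Unicode 14.0; the expected categories in the comments are an assumption about how older and newer data sets classify it, not captured output.

```csharp
using System;
using System.Globalization;

// Long-standing characters: these answers do not change between Unicode versions.
Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory('A'));
Console.WriteLine(CharUnicodeInfo.GetNumericValue('5'));

// U+0870 was added in Unicode 14.0. A runtime carrying Unicode 13.0 data is
// expected to report OtherNotAssigned; one carrying 14.0 data, OtherLetter.
UnicodeCategory newLetter = CharUnicodeInfo.GetUnicodeCategory(0x0870);
Console.WriteLine($"U+0870: {newLetter}");
```

Running the same program on two runtime versions is a quick way to see whether the data update actually landed.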

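This will affect the following APIs:

* System.Globalization.StringInfo
* System.Globalization.CharUnicodeInfo
* System.Text.Encodings.Web.*
* System.Text.Json.* (since it depends on System.Text.Encodings.Web)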

For instructions on how to update the runtime-carried Unicode data files, consult the GenUnicodeProp docs and the STEW docs. Also update the UnicodeUcdVersion data throughout our .csproj files (see samples).

See https://github.com/dotnet/runtime/issues/2378 for the changes we made for Unicode 13.0.0 in .NET 5.

We should also keep an eye out for any changes to UAX#29 that might be part of the Unicode 14.0.0 wave. Our tools will automatically pick up any changes to a code point's Grapheme_Cluster_Break property, but if the algorithm in Sec. 3.1.1 changes as part of Unicode 14.0.0 then we may need to update the logic in TextSegmentationUtility.cs.
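For context on what UAX#29 governs here: StringInfo segments strings into extended grapheme clusters, so a user-perceived character can span several UTF-16 code units. The sketch below assumes a runtime on .NET 5 or later (where StringInfo follows UAX#29) and shows two cases whose segmentation comes from the rules in Sec. 3.1.1: a combining mark (rule GB9) and an emoji ZWJ sequence (rule GB11).

```csharp
using System;
using System.Globalization;

// "e" + U+0301 COMBINING ACUTE ACCENT: two UTF-16 code units, one cluster (GB9).
string accented = "e\u0301";

// MAN + ZWJ + WOMAN + ZWJ + GIRL: an emoji ZWJ sequence kept together (GB11).
string family = "\U0001F468\u200D\U0001F469\u200D\U0001F467";

foreach (string s in new[] { accented, family })
{
    var info = new StringInfo(s);
    Console.WriteLine($"{s.Length} UTF-16 code units, {info.LengthInTextElements} text element(s)");
}

// Walk the text elements explicitly using the same segmentation.
TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator("a" + accented + family);
while (enumerator.MoveNext())
{
    Console.WriteLine($"element at index {enumerator.ElementIndex}: {enumerator.GetTextElement().Length} code unit(s)");
}
```

If a future UAX#29 revision changed rules like these, the hand-written logic in TextSegmentationUtility.cs is where that change would have to be mirrored; property-data changes alone are picked up by the tools.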

ghost commented 3 years ago

Tagging subscribers to this area: @tarekgh, @safern, @krwq See info in area-owners.md if you want to be subscribed.



tarekgh commented 3 years ago

@GrabYourPitchforks just checking in: are you planning to do this soon?

GrabYourPitchforks commented 3 years ago

Moving this to 7.0 so that the dates line up correctly.

GrabYourPitchforks commented 3 years ago

Now that we're within a month of Unicode 14.0's release, I gave https://unicode.org/versions/Unicode14.0.0/ another look. A new block, Arabic Extended-B, is being added to the BMP. Our ingestion tools will automatically create a new API to support this block, so I opened https://github.com/dotnet/runtime/issues/57609 to track the API review process for it.
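The new block matters for System.Text.Encodings.Web because callers opt blocks into an encoder's allow list via UnicodeRange. The sketch below builds the Arabic Extended-B range (U+0870..U+089F) by hand with UnicodeRange.Create, on the assumption that the dedicated property tracked in #57609 (presumably something along the lines of UnicodeRanges.ArabicExtendedB) may not exist on the runtime you compile against. The encoder output is printed rather than asserted, since it depends on the runtime's carried data.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Arabic Extended-B, new in Unicode 14.0, spans U+0870..U+089F (48 code points).
UnicodeRange arabicExtendedB = UnicodeRange.Create('\u0870', '\u089F');
Console.WriteLine($"first=U+{arabicExtendedB.FirstCodePoint:X4}, length={arabicExtendedB.Length}");

// Allow Basic Latin plus the new block. The encoder is expected to keep
// escaping code points its own Unicode tables treat as unassigned, so whether
// U+0870 passes through unescaped depends on the data the runtime carries.
JavaScriptEncoder encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, arabicExtendedB);
Console.WriteLine(encoder.Encode("abc\u0870"));
```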

We're still waiting for the PDFs to be published so we can check for any changes to Sec. 5.8 (which controls string.ReplaceLineEndings). So far UAX#29 (which controls StringInfo) still looks fine for us.
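Sec. 5.8 of the Unicode Standard ("Newline Guidelines") is the source for the set of sequences string.ReplaceLineEndings (available since .NET 6) treats as newlines: CR, LF, CRLF, NEL (U+0085), LS (U+2028), FF (U+000C), and PS (U+2029). A minimal sketch of that normalization:

```csharp
using System;

// Mixed newline sequences, including non-ASCII ones from Sec. 5.8.
string input = "one\r\ntwo\nthree\rfour\u0085five\u2028six";

// Every recognized sequence (CRLF counts as one) becomes the replacement text.
string normalized = input.ReplaceLineEndings("\n");
Console.WriteLine(normalized.Replace("\n", "\\n"));

// The parameterless overload substitutes Environment.NewLine instead.
string platform = input.ReplaceLineEndings();
Console.WriteLine(platform.Split(Environment.NewLine).Length);
```

If Sec. 5.8 ever added or removed a newline function, this recognized set is what would need to change.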

tarekgh commented 2 years ago

@GrabYourPitchforks what is remaining to do here?