dotnet / dotnet-api-docs

.NET API reference documentation (.NET 5+, .NET Core, .NET Framework)
https://docs.microsoft.com/dotnet/api/

Regex examples should use "\z" not "$" for end-of-string #5422

Open twylite opened 3 years ago

twylite commented 3 years ago

The documentation for Regex.IsMatch (in Regex.xml) includes the example pattern ^[a-zA-Z0-9]\d{2}[a-zA-Z0-9](-\d{3}){2}[A-Za-z0-9]$ and describes the trailing "$" as "End the match at the end of the line". This is not entirely accurate. The article Anchors in Regular Expressions says that "The $ anchor specifies that the preceding pattern must occur at the end of the input string, or before \n at the end of the input string", and testing confirms this behavior. As a result, the example pattern accepts the part number "1298-673-4192\n" (with a trailing newline), which is a subtle validation bug.

Proposed fix: Use the anchor "\z" instead of "$".
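A minimal sketch of the difference, assuming the default RegexOptions (no Multiline). The class and method names here are hypothetical and only serve the illustration; the two patterns are the documented one and the same pattern with "\z" substituted for "$".

```csharp
using System;
using System.Text.RegularExpressions;

public static class PartNumberCheck
{
    // Pattern as it appears in the Regex.IsMatch documentation, ending in "$".
    private const string DollarPattern =
        @"^[a-zA-Z0-9]\d{2}[a-zA-Z0-9](-\d{3}){2}[A-Za-z0-9]$";

    // Same pattern with the proposed "\z" anchor (very end of the input only).
    private const string SlashZPattern =
        @"^[a-zA-Z0-9]\d{2}[a-zA-Z0-9](-\d{3}){2}[A-Za-z0-9]\z";

    public static bool MatchesWithDollar(string input) =>
        Regex.IsMatch(input, DollarPattern);

    public static bool MatchesWithSlashZ(string input) =>
        Regex.IsMatch(input, SlashZPattern);

    public static void Main()
    {
        string clean = "1298-673-4192";
        string withNewline = "1298-673-4192\n";

        Console.WriteLine(MatchesWithDollar(clean));        // True
        Console.WriteLine(MatchesWithDollar(withNewline));  // True: "$" also matches before a final \n
        Console.WriteLine(MatchesWithSlashZ(clean));        // True
        Console.WriteLine(MatchesWithSlashZ(withNewline));  // False: "\z" requires the true end of input
    }
}
```

Only the anchor changes between the two patterns, so the fix to the documented example would be a one-character-class edit rather than a restructuring of the expression.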

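This fix is relevant because in some other regex engines, such as JavaScript (without the m flag) and Go's regexp package, "$" behaves like .NET's "\z" and matches only at the very end of the input. The Regex.IsMatch documentation should draw attention to the correct anchor, to help developers avoid validation bugs.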

antonfirsov commented 3 years ago

Tagging area owners @pgovind @tannergooding

jzabroski commented 2 years ago

@twylite I actually put together a spec to improve this, but got sidetracked a good bit with long haul COVID.

https://github.com/dotnet/runtime/issues/25598

In the issue, Dan Moseley actually does a great job spec'ing out how to think about this.