chr4ss1 opened this issue 9 years ago
@juliusfriedman Both samples seem to already produce the correct output. If you're reporting a bug in the existing API, please clearly provide repro code, including the output you observe and the output you expect.
@tarekgh
I left a comment (#14386 (comment)) for @terrajobst because I have a concern about the naming we chose, so I hope we can resolve that before this change is merged.
Is this issue tagged in such a way that it will be resolved?
@terrajobst could you please comment on https://github.com/dotnet/runtime/issues/14386#issuecomment-639640585? Sorry to raise that after the fact, but I hope we can resolve this before we proceed with the implementation.
@jnm2 let's wait for @terrajobst's response, and then we can change the issue tagging if anything changes. Thanks for following up.
Now that https://github.com/dotnet/runtime/issues/27935 is in, it's fairly straightforward to implement these routines. For example, here's string.RemoveStart (as an extension method):
using System;
using System.Globalization;

public static class StringExtensions // container class required for extension methods; name is illustrative
{
    public static string RemoveStart(this string @this, string value, StringComparison comparison = StringComparison.Ordinal)
    {
        if (@this is null) throw new ArgumentNullException(nameof(@this));
        if (value is null) throw new ArgumentNullException(nameof(value));

        CultureInfo cultureInfo;
        CompareOptions compareOptions;
        switch (comparison)
        {
            case StringComparison.Ordinal:
                cultureInfo = CultureInfo.InvariantCulture; compareOptions = CompareOptions.Ordinal; break;
            case StringComparison.OrdinalIgnoreCase:
                cultureInfo = CultureInfo.InvariantCulture; compareOptions = CompareOptions.OrdinalIgnoreCase; break;
            case StringComparison.InvariantCulture:
                cultureInfo = CultureInfo.InvariantCulture; compareOptions = CompareOptions.None; break;
            case StringComparison.InvariantCultureIgnoreCase:
                cultureInfo = CultureInfo.InvariantCulture; compareOptions = CompareOptions.IgnoreCase; break;
            case StringComparison.CurrentCulture:
                cultureInfo = CultureInfo.CurrentCulture; compareOptions = CompareOptions.None; break;
            case StringComparison.CurrentCultureIgnoreCase:
                cultureInfo = CultureInfo.CurrentCulture; compareOptions = CompareOptions.IgnoreCase; break;
            default:
                throw new ArgumentException("Unsupported StringComparison value.", nameof(comparison));
        }

        // The span-based IsPrefix overload reports how many chars of @this matched,
        // which can differ from value.Length under linguistic comparisons.
        if (!cultureInfo.CompareInfo.IsPrefix(@this, value, compareOptions, out int matchLength))
        {
            return @this; // nothing being removed
        }
        return @this.Substring(matchLength); // chars 0..matchLength being removed
    }
}
In practice, if this gets added to the string class, we'd be able to take advantage of existing helper methods to get the correct CompareInfo and CompareOptions values from a given StringComparison argument.
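For symmetry, a RemoveEnd counterpart can be sketched the same way on top of CompareInfo.IsSuffix. Like IsPrefix, its matchLength out parameter (one of the overloads added in .NET 5) reports how many trailing chars actually matched. This is an illustration only. RemoveEnd and the ToCompareArgs mapping helper are hypothetical names, written as local functions so the snippet runs as a top-level program, and the mapping simply mirrors the switch in RemoveStart.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: maps a StringComparison onto the CompareInfo/CompareOptions
// pair that CompareInfo.IsPrefix/IsSuffix expect (same table as RemoveStart's switch).
static (CompareInfo, CompareOptions) ToCompareArgs(StringComparison comparison) => comparison switch
{
    StringComparison.Ordinal => (CultureInfo.InvariantCulture.CompareInfo, CompareOptions.Ordinal),
    StringComparison.OrdinalIgnoreCase => (CultureInfo.InvariantCulture.CompareInfo, CompareOptions.OrdinalIgnoreCase),
    StringComparison.InvariantCulture => (CultureInfo.InvariantCulture.CompareInfo, CompareOptions.None),
    StringComparison.InvariantCultureIgnoreCase => (CultureInfo.InvariantCulture.CompareInfo, CompareOptions.IgnoreCase),
    StringComparison.CurrentCulture => (CultureInfo.CurrentCulture.CompareInfo, CompareOptions.None),
    StringComparison.CurrentCultureIgnoreCase => (CultureInfo.CurrentCulture.CompareInfo, CompareOptions.IgnoreCase),
    _ => throw new ArgumentException("Unsupported StringComparison value.", nameof(comparison)),
};

// Hypothetical RemoveEnd: strips one occurrence of value from the end of source, if present.
static string RemoveEnd(string source, string value, StringComparison comparison = StringComparison.Ordinal)
{
    if (source is null) throw new ArgumentNullException(nameof(source));
    if (value is null) throw new ArgumentNullException(nameof(value));

    var (compareInfo, options) = ToCompareArgs(comparison);

    // matchLength is the number of chars at the end of source that matched,
    // which may differ from value.Length under linguistic comparisons.
    if (!compareInfo.IsSuffix(source, value, options, out int matchLength))
    {
        return source; // nothing being removed
    }
    return source.Substring(0, source.Length - matchLength);
}

string a = RemoveEnd("MyTestTest", "Test");                                      // "MyTest"
string b = RemoveEnd("MyTESTtest", "TEST", StringComparison.OrdinalIgnoreCase); // "MyTEST"
string c = RemoveEnd("Hello", "xyz");                                            // "Hello" (no match)
Console.WriteLine($"{a} {b} {c}");
```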
Needed this in yet another project. How nervous should I be that this will miss .NET 5?
@jnm2 sorry we missed this in 5.0, but I moved the milestone to 6.0, so at least we'll address it in the next release.
The good news is that you can easily work around this. Fortunately, @GrabYourPitchforks got https://github.com/dotnet/runtime/issues/27935 in, which can help you write an extension method as a workaround until we get this in 6.0.
We are revisiting this proposal for two reasons:
Here is the new proposal we can look at and discuss.
namespace System
{
public partial class String
{
public string TrimStart(string value, bool ignoreCase = false);
public string TrimEnd(string value, bool ignoreCase = false);
}
public static partial class MemoryExtensions
{
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, bool ignoreCase = false);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, bool ignoreCase = false);
}
}
namespace System.Globalization
{
public class CompareInfo
{
public string TrimPrefix(string source, string prefix, CompareOptions options = CompareOptions.None);
public string TrimSuffix(string source, string suffix, CompareOptions options = CompareOptions.None);
public ReadOnlySpan<char> TrimPrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None);
public ReadOnlySpan<char> TrimSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None);
}
}
Tarek's latest proposal should also make implementation easier, as we can ditch the giant switch statement in TrimStart and TrimEnd and keep all the complex globalization logic contained within CompareInfo.
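As a rough illustration of keeping that logic inside CompareInfo, here is a sketch of TrimPrefix/TrimSuffix behavior built only from the existing IsPrefix/IsSuffix overloads that report matchLength. The functions below are hypothetical local functions that take the CompareInfo as a parameter, whereas the proposal would make them members of CompareInfo. No StringComparison switch is needed because the caller passes CompareOptions directly.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the proposed CompareInfo.TrimPrefix: removes one leading
// occurrence of prefix (as matched by compareInfo) and returns the remainder.
static ReadOnlySpan<char> TrimPrefix(CompareInfo compareInfo, ReadOnlySpan<char> source,
    ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None)
{
    return compareInfo.IsPrefix(source, prefix, options, out int matchLength)
        ? source.Slice(matchLength)
        : source;
}

// Hypothetical stand-in for the proposed CompareInfo.TrimSuffix.
static ReadOnlySpan<char> TrimSuffix(CompareInfo compareInfo, ReadOnlySpan<char> source,
    ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None)
{
    return compareInfo.IsSuffix(source, suffix, options, out int matchLength)
        ? source.Slice(0, source.Length - matchLength)
        : source;
}

CompareInfo invariant = CultureInfo.InvariantCulture.CompareInfo;
string trimmedStart = TrimPrefix(invariant, "Hello there!", "Hello", CompareOptions.Ordinal).ToString();
string trimmedEnd = TrimSuffix(invariant, "report.TXT", ".txt", CompareOptions.OrdinalIgnoreCase).ToString();
Console.WriteLine(trimmedStart); // " there!"
Console.WriteLine(trimmedEnd);   // "report"
```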
The downside is that string.StartsWith and string.TrimStart would have different behavior, but a proper solution for this really falls under the jurisdiction of https://github.com/dotnet/designs/pull/207.
@GSPP do you have more input why you don't like the proposal?
While I'm not @GSPP, I can say that this is just adding confusion to the String class. It will be harder for newcomers to find and reason about CompareInfo, and honestly, I've been in .NET for about 4 years and I have never used it, because most useful String overloads take a StringComparison parameter.
Also, I don't really like the fact that String methods are all over the place: some take StringComparison, some don't. Again, it feels very unnatural for a newcomer. Contains and StartsWith take it, but TrimStart doesn't. It feels unnatural to me too, like an overload zoo that is not consistent.
While I can understand that you actually want to clear up this confusion, I believe it is impossible with the current String class, and I feel it's better to keep it as newcomer-friendly as possible. Yes, it forces you to pass an explicit parameter, and it might not be that friendly, but at least it would be consistent.
I believe it's only possible with new classes, maybe specific ones like Utf16String or Utf8String, but those are out of the question, I guess.
@En3Tho yeah, this is my thinking. This is adding fragmentation to the API surface and discoverability is not good. I have personally very rarely touched these globalization classes. It was rarely necessary and they seemed a bit arcane to me.
Handling a CompareInfo class, one of the more arcane bits of the framework, just to trim a string seems very heavy weight. It's a learnability issue. Try teaching a 7 day C# programmer what a CompareInfo is...
Even placing char algorithms in MemoryExtensions has been, IMHO, not the right choice. Text processing is not "memory processing". Memory means raw bytes and not (possibly case-insensitive) text processing.
Of course, now that MemoryExtensions is already part of the framework, I guess there is no choice but to carry forward that decision for consistency.
Now, string is widely considered to be a bit weird with respect to culture sensitivity. This is a .NET 1.0 mistake. But still, new functionality for strings belongs there. Or, normally, it belongs on a StringExtensions class.
There's also a usability issue with the ignoreCase parameter. Which of the six StringComparison modes does this correspond to? I have a hunch but it's really hard to figure out. I would now need to study this proposal to find out.
To say concretely what I would do: I would make all new APIs require a StringComparison parameter. This should have been the .NET 1.0 way and we can do it now. I find from my practical experience that I never know what an API without that argument is going to do. Just provide that parameter. It's not that bad.
It will be harder for newcomers to find and reason about CompareInfo, and honestly,
While I can understand that you actually want to clear up this confusion, I believe it is impossible with the current String class, and I feel it's better to keep it as newcomer-friendly as possible. Yes, it forces you to pass an explicit parameter, and it might not be that friendly, but at least it would be consistent.
We are seeing newcomers mostly interested in the ordinal behavior and not the linguistic behavior. Currently, newcomers get confused when they use the string class and get linguistic behavior they don't understand, and that is the main source of confusion. Switching the default behavior to ordinal can also create confusion. I would recommend you review https://github.com/dotnet/designs/pull/207 and give your input there. That proposal is trying to address the issues you are raising here. If you have better ideas to add there, we welcome them.
I've been in .NET for about 4 years and I have never used it, because most useful String overloads take a StringComparison parameter.
Which options did you use to pass with StringComparison? I am curious to learn more about your scenario.
Also, I don't really like the fact that String methods are all over the place: some take StringComparison, some don't. Again, it feels very unnatural for a newcomer. Contains and StartsWith take it, but TrimStart doesn't. It feels unnatural to me too, like an overload zoo that is not consistent.
We already have such cases and that is what we are trying to address to reduce that moving forward.
this is my thinking. This is adding fragmentation to the API surface and discoverability is not good. I have personally very rarely touched these globalization classes. It was rarely necessary and they seemed a bit arcane to me. Handling a CompareInfo class, one of the more arcane bits of the framework, just to trim a string seems very heavy weight. It's a learnability issue. Try teaching a 7 day C# programmer what a CompareInfo is...
Mostly, only a few sectors of users need the linguistic functionality (e.g. UI developers). From our experience, most users expect the ordinal behavior. That is why we are trying not to inject more linguistic functionality into the string class. I don't really see it as a big deal for anyone who wants the linguistic functionality to use CompareInfo instead. Most such users are already using CultureInfo and other Globalization classes. I am curious to learn more about your scenarios where you use linguistic functionality.
There's also a usability issue with the ignoreCase parameter. Which of the six StringComparison modes does this correspond to? I have a hunch but it's really hard to figure out. I would now need to study this proposal to find out.
In long term if we try to hide the linguistic functionality from the string class, this issue wouldn't be a concern.
I want to clarify here: I am not trying to reject any feedback. I am trying to collect more data to decide on the future direction in general. If you haven't added your comments to the proposal https://github.com/dotnet/designs/pull/207, I would encourage you to take some time to review it and add your feedback there.
In long term if we try to hide the linguistic functionality from the string class, this issue wouldn't be a concern.
I think this would be a grave mistake. This is not how real-world development works... People don't want to deal with complex APIs simply to perform linguistic string processing. I really do not intend any disrespect, but this strong focus on ordinal APIs appears to be a bit "ivory tower" to me. (A little joking: It appears to be the Java way... 😏).
I find the StringComparison approach to be entirely adequate for real-world development scenarios. I personally do it that way. I always specify it and that solves the problem. The code looks nice.
I just made a comment (https://github.com/dotnet/runtime/issues/43956#issuecomment-896622767) that is pertinent to this discussion as well.
@GSPP Thanks for the feedback. We often see devs use StringComparison in an attempt to do the right thing as well. The problem that we've seen in practice is that devs will very often use StringComparison.InvariantCulture[IgnoreCase], which is almost never the correct comparer for any reasonable scenario. Across the code we've scanned on GitHub and other forums, the number of times we saw InvariantCulture used compared to Ordinal was alarming and indicated to us that this is a significant pit of failure for developers. That's one of the reasons we're very hesitant to add any such overloads to new APIs on string, regardless of the ultimate fate of the existing API surface.
FWIW, I speculated a bit on why I think devs are drawn to InvariantCulture[IgnoreCase]. If that reasoning proves correct, then we can use that information to figure out a pit-of-success design for APIs like those proposed here.
Hashed this out a bit with @tarekgh. We think the below might be a viable proposal for .NET 7.
namespace System
{
public static partial class MemoryExtensions
{
public static ReadOnlySpan<T> TrimIfStartsWith<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value);
public static ReadOnlySpan<char> TrimIfStartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static Span<T> TrimIfStartsWith<T>(this Span<T> span, ReadOnlySpan<T> value);
public static Span<char> TrimIfStartsWith(this Span<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static ReadOnlySpan<T> TrimIfEndsWith<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value);
public static ReadOnlySpan<char> TrimIfEndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static Span<T> TrimIfEndsWith<T>(this Span<T> span, ReadOnlySpan<T> value);
public static Span<char> TrimIfEndsWith(this Span<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static ReadOnlySpan<T> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, T startValue, T endValue);
public static ReadOnlySpan<T> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue);
public static ReadOnlySpan<char> TrimIfSurroundedBy(this ReadOnlySpan<char> span, ReadOnlySpan<char> startValue, ReadOnlySpan<char> endValue, StringComparison comparisonType);
public static Span<T> TrimIfSurroundedBy<T>(this Span<T> span, T startValue, T endValue);
public static Span<T> TrimIfSurroundedBy<T>(this Span<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue);
public static Span<char> TrimIfSurroundedBy(this Span<char> span, ReadOnlySpan<char> startValue, ReadOnlySpan<char> endValue, StringComparison comparisonType);
}
}
// APIs on CompareInfo are optional.
// They can be implemented via existing IsPrefix / IsSuffix methods.
namespace System.Globalization
{
public class CompareInfo
{
public string TrimPrefix(string source, string prefix, CompareOptions options = CompareOptions.None);
public string TrimSuffix(string source, string suffix, CompareOptions options = CompareOptions.None);
public ReadOnlySpan<char> TrimPrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None);
public ReadOnlySpan<char> TrimSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None);
}
}
It differs from the earlier proposal in that it's focused on spans for the time being, which allows callers to use the generic versions on more data types.
ReadOnlySpan<char> input1 = "Hello there!";
input1 = input1.TrimIfStartsWith("Hello");
Console.WriteLine(input1); // " there!"
ReadOnlySpan<char> input2 = "Hi there!";
input2 = input2.TrimIfStartsWith("Hello");
Console.WriteLine(input2); // "Hi there!"
// using new "u8" literal syntax!
ReadOnlySpan<byte> input3 = "https://example.com/"u8;
input3 = input3.TrimIfStartsWith("https://"u8);
Console.WriteLine(Encoding.UTF8.GetString(input3)); // "example.com/"
// Removing surrounding parentheses.
ReadOnlySpan<char> input4 = "(Hello world!)";
input4 = input4.TrimIfSurroundedBy('(', ')');
Console.WriteLine(input4); // "Hello world!" (no parens)
The behavior for the generic method is Ordinal. For the T = char methods, you can specify a StringComparison parameter, matching the pattern of most other T = char methods on MemoryExtensions.
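Here is a minimal sketch of the generic, ordinal-only behavior. It uses hypothetical local functions rather than the proposed extension methods, which do not exist. It assumes T : IEquatable<T>, the same kind of constraint the existing MemoryExtensions.StartsWith<T> and EndsWith<T> carry. Because matching is element-wise, the number of trimmed elements is always value.Length.

```csharp
using System;
using System.Text;

// Hypothetical: removes one leading occurrence of value using element-wise (ordinal) equality.
static ReadOnlySpan<T> TrimIfStartsWith<T>(ReadOnlySpan<T> span, ReadOnlySpan<T> value)
    where T : IEquatable<T>
    => span.StartsWith(value) ? span.Slice(value.Length) : span;

// Hypothetical: removes one trailing occurrence of value using element-wise (ordinal) equality.
static ReadOnlySpan<T> TrimIfEndsWith<T>(ReadOnlySpan<T> span, ReadOnlySpan<T> value)
    where T : IEquatable<T>
    => span.EndsWith(value) ? span.Slice(0, span.Length - value.Length) : span;

// char data: string literals convert implicitly to ReadOnlySpan<char>.
string text = TrimIfStartsWith<char>("Hello there!", "Hello").ToString();    // " there!"
string untouched = TrimIfStartsWith<char>("Hi there!", "Hello").ToString();  // "Hi there!"

// byte data: the same functions work on UTF-8 bytes.
ReadOnlySpan<byte> url = Encoding.UTF8.GetBytes("https://example.com/");
ReadOnlySpan<byte> withoutScheme = TrimIfStartsWith<byte>(url, Encoding.UTF8.GetBytes("https://"));
string host = Encoding.UTF8.GetString(TrimIfEndsWith<byte>(withoutScheme, new byte[] { (byte)'/' })); // "example.com"

Console.WriteLine($"[{text}] [{untouched}] [{host}]");
```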
We already have methods called MemoryExtensions.TrimStart and MemoryExtensions.TrimEnd, which both implement "trim any" semantics. That is, TrimStart(..., "xyz") trims any of the characters x, y, and z any number of times from the start of the string. This is not the same behavior we'd have for the methods being proposed here, which is why it seemed ill-advised to make them overloads.
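The difference is easy to see with the existing span API. The first call below uses the existing trim-any overload MemoryExtensions.TrimStart(ReadOnlySpan<char>, ReadOnlySpan<char>). The second line only imitates the proposed remove-the-prefix-once semantics with StartsWith and Slice.

```csharp
using System;

ReadOnlySpan<char> input = "xyzzyHello";

// Existing API: removes any run of 'x', 'y', or 'z' characters from the start.
string trimAny = input.TrimStart("xyz").ToString();

// Proposed semantics, imitated by hand: remove the exact prefix "xyz" at most once.
string trimPrefixOnce = (input.StartsWith("xyz") ? input.Slice("xyz".Length) : input).ToString();

Console.WriteLine(trimAny);        // "Hello"
Console.WriteLine(trimPrefixOnce); // "zyHello"
```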
To work around this, we propose the method names TrimIfStartsWith and TrimIfEndsWith. This makes it clear that the logical equivalent of a StartsWith / EndsWith check is taking place, and if that call returns a match, then the trim takes place.
The methods hanging off of CompareInfo are named TrimPrefix and TrimSuffix to match the other *Prefix and *Suffix methods on the type. We didn't want to bring these names forward onto MemoryExtensions directly since we don't commonly use these terms outside the CompareInfo class.
The CompareInfo class would accept both string and span inputs, matching the pattern of the other methods on the type.
There are no methods hanging directly off of string. We can add those later, once we've hashed out how we want to expose culture information on the string type and what the default options would be. In the meantime, if you need the equivalent of a string-returning string.TrimIfStartsWith(value) method, you can use CultureInfo.InvariantCulture.CompareInfo.TrimPrefix(source, value, CompareOptions.Ordinal).
The extension methods themselves would utilize the new CompareInfo.IsPrefix and CompareInfo.IsSuffix overloads we added in .NET 5.
The TrimIfSurroundedBy method is a bit of a special case since it wraps several operations together. It performs an IsPrefix check against the starting value and trims, then it performs an IsSuffix check against the ending value and trims. If both the prefix and the suffix checks succeed, the final trimmed span is returned. If either check fails, this is considered "no match" and the original span is returned. The reason for trimming between the IsPrefix and IsSuffix checks, including for culture-aware comparisons, is to avoid the situation where the start and end values might overlap within the target span.
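Those steps can be sketched for the ordinal, element-wise case as follows. TrimIfSurroundedBy here is a hypothetical local function that assumes T : IEquatable<T>. It is not the proposed API. The lone-quote case shows why the suffix check runs only on what remains after the prefix is trimmed.

```csharp
using System;

// Hypothetical: trims startValue and endValue only if both are present;
// otherwise returns the original span unchanged ("no match").
static ReadOnlySpan<T> TrimIfSurroundedBy<T>(ReadOnlySpan<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue)
    where T : IEquatable<T>
{
    if (!span.StartsWith(startValue))
    {
        return span;
    }

    // Check the suffix only against what remains after the prefix is removed,
    // so the start and end values cannot overlap.
    ReadOnlySpan<T> rest = span.Slice(startValue.Length);
    if (!rest.EndsWith(endValue))
    {
        return span;
    }
    return rest.Slice(0, rest.Length - endValue.Length);
}

string a = TrimIfSurroundedBy<char>("(Hello world!)", "(", ")").ToString(); // "Hello world!"
string b = TrimIfSurroundedBy<char>("\"", "\"", "\"").ToString();           // "\"" (a lone quote is not surrounded)
string c = TrimIfSurroundedBy<char>("\"\"", "\"", "\"").ToString();         // "" (two quotes surround nothing)
string d = TrimIfSurroundedBy<char>("(Hello", "(", ")").ToString();         // "(Hello" (no closing paren)
Console.WriteLine($"[{a}] [{b}] [{c}] [{d}]");
```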
I've seen a few cases where callers want to know if any prefix / suffix was matched before trimming, since they'll want to perform some extra operation. For example:
ReadOnlySpan<char> originalInput = GetSomeInput();
var trimmedInput = originalInput.TrimIfSurroundedBy('\"', '\"');
DoSomeOperation();
if (wasOriginallySurroundedByQuotes)
ReintroduceQuotesToOutput();
For these cases, rather than complicate the APIs proposed here with out bool wasTrimmed arguments, I'd suggest the caller just check whether the original span length differs from the trimmed span length. If the lengths are different, a trimming operation was performed. (n.b. This check is incorrect for linguistic trimming, which might report zero-length matches, but I suspect linguistic trimming will be very much an edge case scenario.)
In MemoryExtensions, all methods are Ordinal by default unless the caller specifies a custom StringComparison. This includes APIs like StartsWith and EndsWith.
However, the APIs string.StartsWith and string.EndsWith use CurrentCulture by default, meaning the two lines below can produce different results:
bool startsWith1 = myString.StartsWith("Hello");
bool startsWith2 = myString.AsSpan().StartsWith("Hello");
If we do end up adding these methods directly to string, we should be mindful of the possibility for confusion when people call these APIs. It would be confusing for developers if string.StartsWith returns false (because it's using a CurrentCulture comparison by default) but string.TrimIfStartsWith strips data anyway (because it's using Ordinal comparison by default).
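Until the defaults are settled, callers can avoid that mismatch by passing the StringComparison explicitly everywhere. That way the string and span overloads cannot disagree, and the trim agrees with the check. Here is a small sketch (the input string and the choice of Ordinal are arbitrary):

```csharp
using System;

string myString = "Hello, world";

// Passing the comparison explicitly removes any dependency on each API's default.
bool startsWith1 = myString.StartsWith("Hello", StringComparison.Ordinal);
bool startsWith2 = myString.AsSpan().StartsWith("Hello", StringComparison.Ordinal);

// Trimming based on the same explicit Ordinal check keeps "check" and "trim" consistent;
// for Ordinal, the matched length is exactly "Hello".Length.
string trimmed = startsWith1 ? myString.Substring("Hello".Length) : myString;

Console.WriteLine($"{startsWith1} {startsWith2} \"{trimmed}\""); // True True ", world"
```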
This is the second proposal for the method; these two questions are not connected in any way.
Updated proposal (new)
Edit May 16, 2022 by @GrabYourPitchforks. See https://github.com/dotnet/runtime/issues/14386#issuecomment-1118140438 for further discussion.
Updated Proposal (old)
It is useful to have methods that trim a specified prefix or suffix from a string. This is somewhat simple for developers to write themselves, but it seems to be a common enough request that it would be useful to support directly, especially with regard to robustness and performance.
Here are some references gleaned from the original issue. http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter
Usage
There are only 2 overloads for each method, as shown in the following examples:
Details, Decisions
(Some of these items come from @tarekgh's feedback above.)
Naming aligns with existing patterns: StartsWith, EndsWith, TrimStart, TrimEnd.
Decision: namespace System.
Decision: No bool repeat overloads. The call site can call this recursively (at the risk of more allocations). Looking at the linked StackOverflow questions, it seems folks want the non-repeating behavior, i.e. "SIdId".RemoveSuffix("Id") should return "SId", not "S".
We don't want to introduce overloads for TrimEnd and TrimStart that have a non-repeating behavior, because it's inconsistent with the existing ones. At the same time, we feel the bool option is overkill and not very readable from the call site.
Questions
Should these be instance methods on String (as opposed to extension methods)?
Should we add corresponding methods to TextInfo (or whatever they are called in globalization)?
If a null value for prefix or suffix is provided, should we throw or just return this?
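Here is a sketch that makes the non-repeating decision concrete, using two hypothetical ordinal helpers. RemoveSuffix reuses the name from the example above and removes the suffix once. RemoveSuffixRepeatedly is an invented name for the loop a caller could write (here iterative rather than recursive) if they really want the repeating behavior.

```csharp
using System;

// Hypothetical single-shot removal: the behavior the decision above favors.
static string RemoveSuffix(string source, string suffix)
    => source.EndsWith(suffix, StringComparison.Ordinal)
        ? source.Substring(0, source.Length - suffix.Length)
        : source;

// Hypothetical repeating variant written by the caller. An empty suffix is
// rejected up front because it would otherwise loop forever.
static string RemoveSuffixRepeatedly(string source, string suffix)
{
    if (suffix.Length == 0) return source;
    string current = source;
    while (current.EndsWith(suffix, StringComparison.Ordinal))
    {
        current = current.Substring(0, current.Length - suffix.Length);
    }
    return current;
}

string once = RemoveSuffix("SIdId", "Id");               // "SId"
string repeated = RemoveSuffixRepeatedly("SIdId", "Id"); // "S"
Console.WriteLine($"{once} {repeated}");
```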
Old Proposal
The proposal is to add new extension methods / overloads for string trimming.
As you know, right now, it's only possible to trim individual characters from a string, but I would like to trim suffixes & prefixes.
An example usage:
I feel like they should've been there. It seems kind of weird to me, as I've implemented these myself a few times in the past, and I'm sure there are quite a few people who miss this and would find it useful:
http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter
Now, the following statement is not true, but if it were, it would describe how I am feeling: "ooh hey, we offer you a string replace method which can replace individual chars, but not whole strings. It's not too hard to craft your own one, give it a try!"
The following applies to TrimEnd/TrimStart, but the overloads would be 1:1, so I will discuss only TrimEnd.
First overload:
public string TrimEnd(string suffix)
Behaviour: trim the suffix, case-sensitive, and only once. The comparison is done the same way as it would be for string.Replace.
"MyTestTest".TrimEnd("Test") == "MyTest" == true
Second overload:
public string TrimEnd(string suffix, StringComparison comparison)
Works like the first one, but allows you to explicitly specify how to compare.
"MyTESTtest".TrimEnd("test", StringComparison.InvariantCultureIgnoreCase) == "MyTEST" == true
Third overload(s):
I am not sure if these are needed, but I wanted to throw this out here anyway:
This proposal has nothing to do with string.Trim(), as it would be ambiguous.
"tetet".Trim("tet") == ???
Namespace: System Type: System.String Assembly: System.Runtime.dll
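To see the string.Trim ambiguity raised above, suppose "tetet".Trim("tet") meant "remove the prefix once and the suffix once". The answer would then depend on which end goes first, because the two occurrences of "tet" overlap. The helpers below are hypothetical ordinal illustrations, not proposed APIs.

```csharp
using System;

// Hypothetical ordinal helpers that each remove at most one occurrence.
static string TrimPrefixOnce(string s, string value)
    => s.StartsWith(value, StringComparison.Ordinal) ? s.Substring(value.Length) : s;

static string TrimSuffixOnce(string s, string value)
    => s.EndsWith(value, StringComparison.Ordinal) ? s.Substring(0, s.Length - value.Length) : s;

// Prefix first: "tetet" -> "et", which no longer ends with "tet".
string prefixFirst = TrimSuffixOnce(TrimPrefixOnce("tetet", "tet"), "tet"); // "et"
// Suffix first: "tetet" -> "te", which no longer starts with "tet".
string suffixFirst = TrimPrefixOnce(TrimSuffixOnce("tetet", "tet"), "tet"); // "te"

Console.WriteLine($"{prefixFirst} vs {suffixFirst}");
```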
I'd be willing to work on this :3