Open chr4ss1 opened 9 years ago
I think this would be a good addition. If we could make these to be simply overloads of the existing methods, maybe it would be better to have these directly on System.String. The only problem is that if we add them directly to System.String, we need to version the System.Runtime.dll contract. Either way, I think it would be a great topic to discuss at the next API review.
Chris, could you please add the following details to this proposal: namespace, type, assembly for these APIs. Once you have it, I will mark it as ready for review.
Would also be interesting to understand if we need IgoreCase variants and if they are ordinal or linguistic comparisons.
I've updated the post to clarify & put the details.
Thanks! I marked it as ready for API review.
We reviewed this issue today. We don't believe it's quite ready yet. Please see the notes for more details.
changing state "ready for api review" => "needs more info" per @terrajobst remarks above.
@KrzysztofCwalina, @svick
Might we also take the time to address scenarios with character escaping such as the following example:
"\a\f\b\t\a\f\b a b v".TrimStart(' ', '\a', '\f', '\t')
=> "\b\t\a\f\b a b v"
@terrajobst the link above doesn't work. What was the work needed here?
link above updated, feedback copy-pasted below by @tarekgh (thanks!)
bool
option is overkill and not very readable from the call site.@joperezr please talk to me if you are going to do any work here
We need formal API proposal (taking into account the above feedback)
@ChrisEelmaa do you have interest in updating the proposal here?
@danmosemsft yes, let me catch up with the feedback
@ChrisEelmaa just came across this PR, wondered whether you still had interest.
@danmosemsft I've learned not to promise anything that I can't keep, been quite busy lately, but I'll definitely do it at one point unless someone else has done it :P
@ChrisEelmaa sounds good! let us know in case you do want to take it so that we can assign the issue and provide any help if needed.
I would be happy to take this issue on, but it seems to have stalled. Has there ever been a decision on naming, etc?
I've been doing some digging just to get my head around things. I think I've a handle on everything apart from the unit tests. I know I can't just go and submit a PR today but my thinking is that I'd work off CoreClr (mscorlib/System.Private.CoreLib.sln) and this would then get automatically merged across to CoreFx. However, CoreClr has it's own set of String tests as does CoreFx. What happens here?
First, we need to update the proposal and have it be approved before any implementation work starts. That is what @ChrisEelmaa meant about going on the provided feedback and updating it. Once that is done and assuming that the Api is approved, then the norm is to add the API to coreclr and send out a PR and then add the tests on a separate PR to corefx. We will then review the coreclr PR and make sure that with your corefx one you have enough test coverage so that we can feel comfortable merging the implementation. After that, we would merge both PRs.
We need to address the design feedback first https://github.com/dotnet/apireviews/blob/267a7c25e418335b8aaf45c466d90425a9a944e4/2015/04-28-misc/README.md#1244-add-overloads-to-string-trimming and then update the proposal with the latest and run it one more time with the design committee.
@joperezr, @tarekgh
Thanks for clarification on the tests. Understood in terms finalising the design, I'll pick up once this has been done.
[Picked up from up-for-grabs
]
It is useful to have methods that trim a specified prefix or suffix from a string. This is somewhat simple for a developer to write themselves, but it seems to be a common enough request that it would be useful to support directly, especially in regards to robustness and performance.
Here are some references gleaned from the original issue. http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter
There are only 2 overloads for each method, as shown in the following examples:
// Default overload
"http://foo.com".RemoveStart("http://").RemoveEnd(".com") == "foo"
// StringComparison
"http://foo.com".RemoveStart("HTTP://", StringComparison.OrdinalIgnoreCase) == "foo.com"
namespace System
{
public partial class String
{
/// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="prefix">The prefix to remove.</param>
public string RemoveStart(string value);
/// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The prefix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemoveStart(string value, StringComparison comparisonType);
/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
public string RemoveEnd(string value);
/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemoveEnd(string value, StringComparison comparisonType);
}
// Per @danmosemsft's suggestion
public static partial class MemoryExtensions
{
// Implementation specializes T for byte and char
public static ReadOnlySpan<T> RemoveStart(this ReadOnlySpan<T> span, ReadOnlySpan<T> value)
where T : IEquatable<T>;
public static ReadOnlySpan<T> RemoveEnd(this ReadOnlySpan<T> span, ReadOnlySpan<T> value)
where T : IEquatable<T>;
// .Fast
public static ReadOnlySpan<char> RemoveStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static ReadOnlySpan<char> RemoveEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
}
}
(Some of these items from @tarekgh's feedback above)
StartsWith
, EndsWith
, TrimStart
, TrimEnd
namespace System
bool repeat
overloads. The callsite can call this recursively (at the risk of more allocations).
"SIdId".RemoveSuffix("Id")
should return "SId"
, not "S"
.TrimEnd
and TrimStart
that have a non-repeating behavior because it's inconsistent with the existing ones.null
value for prefix
or suffix
is provided, should we throw
or just return this
?This is a relatively simple addition, POC code below:
/// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="suffix">The prefix to remove.</param>
public string RemovePrefix(string prefix)
=> RemovePrefix(prefix, StringComparison.Ordinal); // Or maybe its own body, depending on perf
/// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The prefix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemovePrefix(string prefix, StringComparison comparisonType)
{
if (prefix == null)
throw new ArgumentNullException(nameof(prefix));
// Alternative to the above guard
if (suffix == null)
return this;
if (prefix.Length == 0)
return this;
int len = this.Length - prefix.Length;
if (len < 0)
return this;
// main body elided - @tarekgh will advise implementation
}
/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
public string RemoveSuffix(string suffix)
=> RemoveSuffix(suffix, StringComparison.Ordinal); // Or maybe its own body, depending on perf
/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemoveSuffix(string suffix, StringComparison comparisonType)
{
if (suffix == null)
throw new ArgumentNullException(nameof(suffix));
// Alternative to the above guard
if (suffix == null)
return this;
if (suffix.Length == 0)
return this;
int len = this.Length - suffix.Length;
if (len < 0)
return this;
// main body elided - @tarekgh will advise implementation
}
Copying @danmosemsft's feedback over from dupe issue:
It's unfortunate this allocates a temporary string: "http://foo.com".TrimPrefix("http://").TrimSuffix(".com") but perhaps it's not common to do both. Since this original issue was discussed, Span
has become available. Should these be added to the extension methods: https://github.com/dotnet/corefx/blob/master/src/System.Memory/ref/System.Memory.cs#L85
Span has become available. Should these be added to the extension methods
[Edited]
OK, added such overloads to MemoryExtensions
, including the Span...StringComparison
methods.
Updated spec:
Changed naming to align with existing patterns StartsWith
, EndsWith
, TrimStart
, TrimEnd
(now RemoveStart
, RemoveEnd
. Was RemovePrefix
, RemoveSuffix
)
Any feedback on the api design proposal above?
@danmosemsft please advise how we can move this into review status
@joperezr should be able to help
@grant-d
The API proposal looks good to me which I think we can proceed to review it with the design committee.
I want to point your implementation for the methods which takes StringComparison comparisonType is wrong. Linguistic comparison is interesting but any way I can help you when you'll start to implement it and will tell you how you can do that in such cases. I'll move the proposal up (without your implementation sample) and we can move forward.
One question, why it is preferred RemoveStart/RemoveEnd over RemovePrefix/RemoveSuffix? actually I like RemovePrefix and RemoveSuffix more.
For reference, Go language uses TrimPrefix and TrimeSuffix which seams ok names too.
I think we can proceed... I can help you when you'll start to implement it and will tell you how you can do that in such cases
Thanks @tarekgh, appreciate the help
why it is preferred RemoveStart/RemoveEnd over RemovePrefix/RemoveSuffix
For the sake of consistency, I was trying to align with existing naming conventions on string
methods; StartsWith
, EndsWith
, TrimStart
, TrimEnd
...without your implementation
I have elided the questionable POC code
For the sake of consistency, I was trying to align with existing naming conventions on string methods; StartsWith, EndsWith, TrimStart, TrimEnd
We also have IsPrefix and IsSuffix :-) I am not feeling strongly about it but I just want to have all options listed in the proposal so in the design review we can pick the best one
I had this issue in the past and implemented both RemovePrefix
and TrimPrefix
. Here's what the names mean in my mind:
"a1a1x".TrimPrefix("a1") == "x"
"a1a1x".RemovePrefix("a1") == "a1x"
This is a subtle semantic difference which is often important to get right to avoid bugs.
I have worked on a fairly diverse set of applications and have needed both methods many times. I think both variants should be in the framework. If it is decided to only have one variant then I suggest that the proper name is chosen based on my argument.
@GSPP, I wonder if consumers will intuitively understand the difference in naming. What about having a default param bool recurse = false
on the method?
I agree with @grant-d RemovePrefix and TrimPrefix looks same to me and nothing indicate if one is recursive and the other is not. passing a bool for recursive may look better and explicit too. now I think we need to pick a name from the following list:
RemovePrefix TrimPrefix RemoveStart TrimStart
I am inclining to the first one (RemovePrefix).
If the parameter design is chosen the parameter should not have a default. A default does not make sense here because this value must always be consciously decided. The default will be wrong in about half of the cases.
I'd call that parameter removeAllInstances
or removeAllOccurrences
. There is no harm in having a long but descriptive name. I also liked the proposal in the opening post trimRepeatedly
.
A default does not make sense here because this value must always be consciously decided
I am not sure I agree with that. if we have the default and anyone don't like the default behavior, they can override that. The benefit of the default parameter is don't have to write extra things if the default is what you want.
Is there anything a community member can help with? This is one I declare as an extension method frequently and I'd be glad to help if there's anything I can do.
I believe the remaining open questions were regarding, do we need overloads or use the default parameter and avoid one extra overload. naming, what is preferred? using Prefix/Suffix or Start/End. I believe these can be addressed in the design review. I'll move forward with the proposal and mark it as ready for review.
CC @GrabYourPitchforks if he has any input too.
We made minor adjustments:
namespace System
{
public partial class String
{
public string RemoveStart(string value, StringComparison comparisonType);
public string RemoveEnd(string value, StringComparison comparisonType);
}
public static partial class MemoryExtensions
{
public static ReadOnlySpan<char> RemoveStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static ReadOnlySpan<char> RemoveEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
}
}
namespace System.Globalization
{
public class CompareInfo
{
public Range RemovePrefix(string source, string prefix, CompareOptions options = CompareOptions.None);
public Range RemovePrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None);
public Range RemoveSuffix(string source, string suffix, CompareOptions options = CompareOptions.None);
public Range RemoveSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None);
}
}
Sorry to jump late on that, I have a question regarding the decisions:
Why we didn't try to make the String and CompareInfo methods be similar (i.e. both return the same results)? most of the APIs in CompareInfo and String are similar. I am not sure the usability of the CompareInfo API will be good anyway because, from the signature, users will need to call RemovePrefix/Suffix and then manually truncate the string/Span as the next step. why we didn't offer to do both steps in the API?
@tarekgh if you want, you can watch the video (it's linked from my comment). It's the very first issue that we're discussing.
Why we didn't try to make the String and CompareInfo methods be similar (i.e. both return the same results)?
For both String
and CompareInfo
we tried to be consistent with how each type already works. String
uses start/end terminology while CompareInfo
uses prefix/suffix. Same with StringComparison
vs CompareOptions
.
Regarding return types: it doesn't make sense to offer span APIs on String
. And as soon as spans are involved it gets more tricky, see below.
I am not sure the usability of the CompareInfo API will be good anyway because, from the signature, users will need to call RemovePrefix/Suffix and then manually truncate the string/Span as the next step. why we didn't offer to do both steps in the API?
We wanted to avoid having to add overloads for the entire span family (span, readonlyspan, memory, readonlymemory). If the API already slices then the caller has no easy to replicate the behavior. Returning a range makes that simpler.
Thanks @terrajobst
We wanted to avoid having to add overloads for the entire span family (span, readonlyspan, memory, readonlymemory). If the API already slices then the caller has no easy to replicate the behavior. Returning a range makes that simpler.
If this is the case then the name of the API is wrong I guess. the name doesn't reflect the operation it is doing.
I see that the milestone changed. Since the API is approved, is there a chance for a community member (🖐) to implement this in time for 5.0?
Is this a reasonable implementation for System.String and the same for ROS\
public string RemoveStart(string value, StringComparison comparisonType)
{
return StartsWith(value, comparisonType)
? this[value.Length..^0]
: this;
}
public string RemoveEnd(string value, StringComparison comparisonType)
{
return EndsWith(value, comparisonType)
? this[0..^value.Length]
: this;
}
Yes there is time. @tarekgh is @jnm2 good to go?
@jim2 The suggested implementation will work with the Ordinal operation only but will be wrong for the rest. The reason is the string we are searching for can have different length than the string we find. In general I don't mind to start the implementation. thanks for offering to help with that.
I left a comment https://github.com/dotnet/runtime/issues/14386#issuecomment-639640585 to @terrajobst as I have a concern about the naming we chose. so I hope we can resolve that before we have this change merged.
I am happy with this API so long as we call out the edge cases:
"\a\f\b\t\a\f\b a b v".TrimStart(' ', '\a', '\f', '\t') => "\b\t\a\f\b a b v"
@juliusfriedman The API string.TrimStart(params char[])
already exists and does what you want. Unless I'm misunderstanding your concern?
"\a\f\b\t\a\f\b a b v".TrimStart(' ', '\a', '\f', '\b', '\t') == "a b v"
Array.ConvertAll("\a\f\b\t\a\f\b a b v".TrimStart(' ', '\a', '\f').ToCharArray(), char.ToString)
==
08 07 0C 08 a b v
Updated proposal (new)
Edit May 16, 2022 by @GrabYourPitchforks. See https://github.com/dotnet/runtime/issues/14386#issuecomment-1118140438 for further discussion.
Updated Proposal (old)
It is useful to have methods that trim a specified prefix or suffix from a string. This is somewhat simple for a developer to write themselves, but it seems to be a common enough request that it would be useful to support directly, especially in regards to robustness and performance.
Here are some references gleaned from the original issue. http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter
Usage
There are only 2 overloads for each method, as shown in the following examples:
Details, Decisions
(Some of these items from @tarekgh's feedback above)
Naming aligns with existing patterns StartsWith, EndsWith, TrimStart, TrimEnd Decision: namespace System Decision: No bool repeat overloads. The callsite can call this recursively (at the risk of more allocations). Looking that linked StackOverflow questions it seems folks want the non-repeating behavior, i.e. "SIdId".RemoveSuffix("Id") should return "SId", not "S". We don't want to introduce overloads for TrimEnd and TrimStart that have a non-repeating behavior because it's inconsistent with the existing ones. At the same time, we feel the bool option is overkill and not very readable from the call site. Questions Should be instance methods on String (as opposed to extension methods)? Should we add corresponding methods to TextInfo (or whatever they care called in globalization)? If a null value for prefix or suffix is provided, should we throw or just return this?
Old Proposal
The proposal is to add new extension methods / overloads for string trimming,
As you know, right now, it's only possible to trim individual characters from a string, but I would like to trim suffixes & prefixes.
An example usage:
I feel like they should've be there. It seems kind of weird to me, as I've implemented these by myself in the past few times, and am sure there are quite few people who miss this & would find it useful:
http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter
Now the following statement is not true, but if it would be, it would describe how I am feeling: "ooh hey, we offer you string replace method which can replace individual chars, but not the whole strings. It's not too hard to craft your own one, give it a try!"
The following applies to
TrimEnd
/TrimStart
, but the overloads would be 1:1, so I will discuss onlyTrimEnd
.First overload:
public string TrimEnd(string suffix)
Behaviour: trim the suffix, case-sensitive, and only once. The comparision is done the same way, as it would be for
string.Replace
."MyTestTest".TrimEnd("Test") == "MyTest" == true
Second overload:
public string TrimEnd(string suffix, StringComparison comparison)
Works as the first one, but allows you to explictly tell how to compare.
"MyTESTtest".TrimEnd("test", StringComparison.InvariantCultureIgnoreCase) == "MyTEST" == true
Third overload(s):
I am not sure if these are needed, but I wanted to throw this out here anyway:
This proposal has nothing to do with string.Trim(), as it would be ambigious.
"tetet".Trim("tet") == ???
Namespace: System Type: System.String Assembly: System.Runtime.dll
I'd be willing to work on this :3