Open Joe4evr opened 7 months ago
`Func<char, bool>` predicates would make sense also:
public string RemoveChars(Func<char, bool> predicate);
public string KeepChars(Func<char, bool> predicate);
This would allow you to do
someString = someString.KeepChars(char.IsLetterOrDigit);
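A minimal sketch of how the predicate-based pair above could behave, written as extension methods so it compiles today. The class name `StringFilterExtensions` and the buffer-based implementation are assumptions for illustration only, not the proposed BCL implementation.

```csharp
using System;

// Hypothetical helper type; these methods do not exist on System.String.
public static class StringFilterExtensions
{
    // Returns a string containing only the chars for which predicate is true.
    public static string KeepChars(this string source, Func<char, bool> predicate)
    {
        ArgumentNullException.ThrowIfNull(source);
        ArgumentNullException.ThrowIfNull(predicate);

        // At most source.Length chars can be kept, so this buffer is large enough.
        char[] buffer = new char[source.Length];
        int written = 0;
        foreach (char c in source)
        {
            if (predicate(c))
            {
                buffer[written++] = c;
            }
        }

        // If nothing was filtered out, hand back the original instance.
        return written == source.Length ? source : new string(buffer, 0, written);
    }

    // Returns a string with every char for which predicate is true removed.
    public static string RemoveChars(this string source, Func<char, bool> predicate)
    {
        ArgumentNullException.ThrowIfNull(predicate);
        return source.KeepChars(c => !predicate(c));
    }
}
```

With this in scope, `"a-b_c 1".KeepChars(char.IsLetterOrDigit)` yields `"abc1"` with one temporary buffer and one result string, regardless of how many distinct characters are filtered.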
- Currently, the "easy" option for this is to call `str.Replace` a bunch of times, each with a 1-length string to be replaced with `String.Empty`, but this allocates `n-1` intermediate strings.
Would it be possible to optimize the original `Replace` method, instead of introducing new APIs?
The original methods are optimized, it's just that because `string` must be immutable the result of that call has to be a complete string. Which is where the problem is - you end up with a copy for each "step". The only way around is to create a new method/overload that's equivalent to the signature proposed here either way.
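To make the "copy for each step" point concrete, here is a small illustration using only the existing `String.Replace(string, string)` overload. The variable names and the input are made up for the example.

```csharp
using System;

string input = "a-b_c";

// Each call must return a complete, immutable string of its own.
string step1 = input.Replace("-", string.Empty); // "ab_c"
string step2 = step1.Replace("_", string.Empty); // "abc"

// step1 only exists as a stepping stone to step2.
Console.WriteLine(step2);
```

Removing `n` distinct characters this way produces `n` strings, of which only the last one is wanted.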
Sorry, I didn't express myself well there. I wanted "without changing the API" to mean "without introducing new method names" on this one. For example, keeping the "Replace" name, but providing overloads to perform multiple replacements at once since the issue appears to be that each call today is limited to a single replacement.
Background and motivation
There are times when you get some string and want to normalize/sanitize it in a way that certain characters are removed (or conversely, the result string consists only of some set of characters).
- Currently, the "easy" option for this is to call `str.Replace` a bunch of times, each with a 1-length string to be replaced with `String.Empty`, but this allocates `n-1` intermediate strings.
Additionally/Alternatively, a more secure method would be one where the user specifies an allow-list of chars. This would prevent the hassle of trying to exclude everything except the (usually smaller) set you want.
- Currently, the "easy" option for this is `new string(str.Where(c => set.Contains(c)).ToArray())`, but performing LINQ on strings is kinda ew. And the closure and array allocations also don't help.
The current alternative for both of these is hand-writing an appropriate loop, but that is more involved and potentially error-prone (especially if you wanted to have these operations vectorized).
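For comparison, a hand-written allow-list loop might look roughly like the sketch below. It assumes .NET 8 or later for `SearchValues<char>` and the span `IndexOfAny`/`IndexOfAnyExcept` overloads that accept it. The names `AllowListSketch` and `KeepOnly` are hypothetical and are not the API being proposed.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper; shows the kind of loop users write by hand today.
public static class AllowListSketch
{
    // Returns source with every char not contained in allowed removed.
    public static string KeepOnly(string source, SearchValues<char> allowed)
    {
        ReadOnlySpan<char> remaining = source;

        // Fast path: every char is allowed, so no new string is needed.
        if (remaining.IndexOfAnyExcept(allowed) < 0)
        {
            return source;
        }

        var builder = new StringBuilder(source.Length);
        while (true)
        {
            // Copy the leading run of allowed chars.
            int firstDisallowed = remaining.IndexOfAnyExcept(allowed);
            if (firstDisallowed < 0)
            {
                builder.Append(remaining);
                break;
            }
            builder.Append(remaining.Slice(0, firstDisallowed));
            remaining = remaining.Slice(firstDisallowed);

            // Skip the run of disallowed chars that follows.
            int nextAllowed = remaining.IndexOfAny(allowed);
            if (nextAllowed < 0)
            {
                break;
            }
            remaining = remaining.Slice(nextAllowed);
        }

        return builder.ToString();
    }
}
```

Calling `AllowListSketch.KeepOnly("a-b_c!", SearchValues.Create("abc"))` returns `"abc"`. Even this short version needs two alternating searches and careful slicing, which is the kind of bookkeeping a built-in method would hide.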
cc: @MihaZupan had some insight about this on Discord
API Proposal
`params` is maybe not strictly needed, but I thought it would have some value in ease-of-use.
If the `ROS<char>` overloads stay `params`, the additional overloads would have to take in the culture in front of it. And that would lead to taking in the culture in front of the `SearchValues` in those overloads for consistency.
API Usage
Alternative Designs
No response
Risks
A slight increase in `String`'s method table? They could be defined as extension methods if that's a genuine concern.
Additional notes:
I can imagine that some parts higher up the stack that also deal with strings could create their own extension "overloads" taking in their own complex type for convenience. This would be up to the discretion of the relevant area owner, but for a concrete example:
`System.Text.Encodings.Web` could add such an overload so that users can piggyback off of the `UnicodeRange` type if their project already references it anyway.