Open baronfel opened 7 years ago
This is a glaring gap in the language, and workarounds are not nice. These functions are used all the time and should be standard.
A common problem when translating these functions to an F#-friendly version is that they have many overloads with different options, typically whether the comparison is case-sensitive, which culture to use, and so on. The next question is what the default should be: if we use the current culture, the functions will not be referentially transparent.
Nested modules could be used to organize the functions to get around some of the overload issues, although iirc the F# style guide recommends against them.
    [<RequireQualifiedAccess>]
    module String =
        open System
        let compare str1 str2 = String.Compare(str1,str2)
        module Ordinal =
            let compare str1 str2 = String.Compare(str1,str2,StringComparison.Ordinal)
        module OrdinalIgnoreCase =
            let compare str1 str2 = String.Compare(str1,str2,StringComparison.OrdinalIgnoreCase)
        module InvariantCulture =
            let compare str1 str2 = String.Compare(str1,str2,StringComparison.InvariantCulture)
I think the sensible approach is to use static extension methods for all of String's instance methods with overloads, and a curried module function for the instance and static methods without overloads.
You could add the simplest/best method overload as a curried function. I can't see harm in that though perhaps many would argue that convenience is a very bad price for less transparent naming. While String.ordinal would have obscure semantics, String.cultureInvariant would not: so I don't think useful defaults is any very harmful loss of transparency.
Or you could also uniformly translate methods with a selector into curried functions with the selector as the first parameter and a D.U. as the selector type. You could combine the two:
    String.compare: string -> string -> bool
    String.compareWithOpt: String.Comparison -> string -> string -> bool
Extra parameters could be put into the D.U. or wrapped in Option.
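To make the selector idea concrete, here is a minimal sketch. The names Comparison, StringSketch, compareWith and the choice of Ordinal as the default are all hypothetical, picked only for illustration; nothing like this exists in FSharp.Core. Note that the sketch returns int, as String.Compare does, rather than the bool written in the signatures above.

```fsharp
open System

/// Hypothetical selector type (the comment above calls it String.Comparison).
type Comparison =
    | Ordinal
    | OrdinalIgnoreCase
    | InvariantCulture
    | InvariantCultureIgnoreCase

module StringSketch =
    // Map each D.U. case onto the corresponding BCL enum value.
    let private toBcl comparison =
        match comparison with
        | Ordinal -> StringComparison.Ordinal
        | OrdinalIgnoreCase -> StringComparison.OrdinalIgnoreCase
        | InvariantCulture -> StringComparison.InvariantCulture
        | InvariantCultureIgnoreCase -> StringComparison.InvariantCultureIgnoreCase

    /// Selector comes first so that it can be partially applied.
    let compareWith (comparison: Comparison) (str1: string) (str2: string) : int =
        String.Compare(str1, str2, toBcl comparison)

    /// The "simplest overload" as a curried function.
    /// Defaulting to Ordinal is an assumption of this sketch.
    let compare (str1: string) (str2: string) : int =
        compareWith Ordinal str1 str2

// Partial application gives a reusable case-insensitive comparer.
let ignoreCase = StringSketch.compareWith OrdinalIgnoreCase
```

With the selector first, `ignoreCase "ABC" "abc"` evaluates to 0, while `StringSketch.compare "ABC" "abc"` does not.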
Merits:
- All these things would encourage transparency by not propagating nulls.
- Autocomplete inspection of String would give you everything uniformly.
- Novice F# users not familiar with .NET would see a more usable language. Maybe this is irrelevant, since novice users not familiar with .NET would never use F#?
- People used to .NET could go on using .NET methods.
I guess the array/list thing has no nice resolution except to keep .NET practices. Personally I'd love to have a String.split function that used lists throughout for uniformity.
My biggest reason for wanting everything, in some form, as curried functions is that then you can learn the language without friction. Those who are already very familiar with .NET methods don't have this issue.
> My biggest reason for wanting everything, in some form, as curried functions is that then you can learn the language without friction. Those who are already very familiar with .NET methods don't have this issue.
There are hundreds of .NET APIs. Would the same apply to DateTime, for example? TimeSpan? Guid? Why just String? And why would this be addressed in FSharp.Core instead of some additional library (FSharpx-like?)
I think there's a good case for more string functions in the core library. Next to each candidate function below I've added the percentage of applications that use the corresponding System.String method/property, according to ApiPort telemetry:
empty : string - 47.3%
isEmpty : string -> bool - 50.1%
isWhitespace : string -> bool - 21.4%
replace : string -> string -> string -> string - 32.1%
startsWith/endsWith : string -> bool - 25.4%
split : seq<char> -> string -> seq<string> - 31.1%
toUpper/toLower(Invariant) : string -> string - 12.3%/22.7%
trim : string -> string - 29.4%
trimStart/trimEnd : string -> string - 12.5%/15.5%
That's fairly significant usage which currently has no "fluent F#" options outside of FSharpx.
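For readers who want to see what such wrappers buy you, here is a rough sketch of a few functions from the list. The module name StringEx is hypothetical (chosen so it does not clash with the existing String module), and mapping isEmpty and isWhitespace to String.IsNullOrEmpty and String.IsNullOrWhiteSpace is an assumption. The comparison-based functions (startsWith/endsWith) are left out because their default culture is a separate question.

```fsharp
open System

/// Hypothetical module; thin curried wrappers over System.String members.
module StringEx =
    let isEmpty (str: string) : bool = String.IsNullOrEmpty str

    let isWhitespace (str: string) : bool = String.IsNullOrWhiteSpace str

    /// Subject string comes last so the function pipes cleanly.
    let replace (oldValue: string) (newValue: string) (str: string) : string =
        str.Replace(oldValue, newValue)

    /// Splits on any of the given characters, like String.Split(char[]).
    let split (separators: seq<char>) (str: string) : seq<string> =
        let seps : char[] = Seq.toArray separators
        str.Split(seps) |> Seq.ofArray

    let toUpperInvariant (str: string) : string = str.ToUpperInvariant()

    let trim (str: string) : string = str.Trim()

// The wrappers compose with |> and partial application.
let words =
    "  Hello, World;F#  "
    |> StringEx.trim
    |> StringEx.replace "World" "there"
    |> StringEx.split [ ','; ';' ]
    |> Seq.map StringEx.trim
    |> Seq.toList
```

Here `words` is `["Hello"; "there"; "F#"]`. The same pipeline written against the instance methods needs a lambda at every step.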
I won't go into individual member usage for DateTime, Guid, and TimeSpan, but the overall usage of those types is 47.3%, 35.2%, and 36.9%, respectively. Compare this with 83.5% for String.
> There are hundreds of .NET APIs. Would the same apply to DateTime, for example? TimeSpan? Guid? Why just String? And why would this be addressed in FSharp.Core instead of some additional library (FSharpx-like?)
I'd like more stuff added to core, but there are tradeoffs here and String is the most glaring lack. String is more a core datatype than DateTime and deficiencies in String are particularly obvious to those learning the language from start.
The fact that join is available but not split is a particular anomaly.
@cartermp Good stats, thanks.
I'm OK with the specific list above, especially since it is determined by data rather than ad hoc design.
However it's a very slippery slope. At some early point F# developers just need to learn how to call .NET APIs. The more you delay it, the more you build up the expectation that you can do more and more without doing that, and the more APIs you end up re-creating.
Strings also have the whole huge issue with language culture. Historically we've only ever put culture-invariant operations in FSharp.Core, and nothing that relies on CurrentCulture. All of the above look invariant, correct?
I think split might be the most awkward one, since typical usage is with char and not seq<char>, but I suppose the "overloads" here are another discussion point - it would be awkward to have an F# function for one commonly-used overload, but no F# function for another commonly-used overload. I think that if we consider culture and the various overloads, it increases in size by quite a lot, but I find that to be acceptable.
This point:
> At some early point F# developers just need to learn how to call .NET APIs. The more you delay it, the more you build up the expectation that you can do more and more without doing that, and the more APIs you end up re-creating.
is a salient one. I think strings are so ubiquitous that they can warrant a bit of an exception, but generally speaking I agree that we don't want FSharp.Core to be a wrapper for .NET.
@cartermp Just to confirm that I like your list and would be happy to see a set of additions along those lines, based on that methodology, subject to RFC etc.
@cartermp Another thing to consider with split is: what's the correct name for that functionality?
I mean, there are many ways to split a string. There is actually a whole Haskell library dedicated to handling all the different ways of splitting a string, and here's a port I did.
I think the pairing of name and functionality in the .NET function is a bit unfortunate: it splits on any separator, and that's not obvious from the very generic name.
That functionality corresponds to splitOneOf, but I would call it something like splitOnAny.
If you ask me what a function named split should do, I would say that it splits when it finds the sequence of elements specified in the first parameter. This is called splitOn in that library.
Should we stick to the poor naming decision made in the .NET framework many years ago?
We can think of a better name and signature for this function and also consider that a similar function might be added later working on lists, arrays or seqs and we would like them to be coherent with the existing one for strings.
@gmpl I don't think we should be in the business of changing the functionality, at least not for this particular issue. The intention here is to just have nice F# wrappers over common .NET utilities which are awkward due to the lack of partial application.
@cartermp I'm fine with the functionality, but not with the name. It's too generic for that very specific functionality.
split splitting on any of its separators isn't as precisely named as it could be, but I don't find it surprising that if I pass in multiple characters, it will split on any of those. I passed in multiple characters, after all.
I think the larger question is this: Should we be in the business of being more precise than .NET, or should we simply offer a functional approach that's in the spirit of .NET? I believe that the latter is the best approach, particularly given the corpus of material out there on .NET APIs and their behaviors. The dynamic in this case - offering a nice wrapper around .NET APIs rather than an alternative to .NET APIs - is why I have that opinion. I'm curious about what others think, though.
There are two different motivations here. One is for functional .NET wrappers so that those familiar with .NET can transition to a smoother functional experience. The other is for "small but adequate core functionality" within a wholly functional world.
The two would lead to different functions, and care would be needed to differentiate the two if they coexist. But at the moment we have neither.
My own view in this case is that with Split, mimicking .NET nomenclature and use is quite awkward, and a simple functional split (with different names to differentiate from .NET perhaps) would be a better choice. This would be a different wrapper using the same implementation. But I don't feel this strongly and maybe it would go against practice elsewhere.
There are very many different ways in which one might reasonably define a simple functional split. That need not prevent one from being chosen, since any of them would be better than none.
@cartermp I don't feel like we are going in the direction of respecting every single name from old .NET APIs. F# has its own names: for instance we have (luckily) List.map, even String.map, instead of Select.
You take the split name because for strings and chars it feels natural, but then on lists someone adds a split function that does something else.
Later there are user voices claiming to unify the names but it's too late to do that without breaking changes.
Regarding the larger question, being more precise than .NET is not going against the spirit of .NET, since even in the framework, API style and names change as it evolves.
Here's a different view, which unifies all cases.
I'm thinking that actually the .NET Split method is generic, because by using the overload that takes an array of strings we can get both functionalities at the same time.
Each string is a sequence of chars, so that's the splitOn functionality, but we can specify many of them, so that's the splitOneOf functionality. Additionally there is the StringSplitOptions parameter.
But then, moving to F# we don't want to use overloads. Then as I see it now there are two reasonable alternatives:
- Two functions, splitOn and splitOneOf, or whichever names we decide but not the generic name split, each one with the specific functionality.
- A single split function which takes an array (or a list, which is more F#-ish) of strings, and has both functionalities at the same time. In its signature the result type is the same as the separator parameter type.
The disadvantage of the latter, as @cartermp already noted, is that you end up specifying a singleton array most of the time, but considering the generic functionality maybe it is a good trade-off.
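To compare the two alternatives side by side, here is a small sketch built on the existing String.Split overloads. The module name SplitSketch and the decision to return lists are assumptions for illustration only; the function names follow the ones used in this discussion.

```fsharp
open System

/// Hypothetical module contrasting the two alternatives.
module SplitSketch =
    /// Alternative 1a: split wherever the whole separator string occurs.
    let splitOn (separator: string) (str: string) : string list =
        str.Split([| separator |], StringSplitOptions.None) |> List.ofArray

    /// Alternative 1b: split on any one of the separator characters.
    let splitOneOf (separators: seq<char>) (str: string) : string list =
        let seps : char[] = Seq.toArray separators
        str.Split(seps, StringSplitOptions.None) |> List.ofArray

    /// Alternative 2: one generic function taking a list of string separators.
    /// A single-element list behaves like splitOn; single-character strings
    /// behave like splitOneOf.
    let split (separators: string list) (str: string) : string list =
        str.Split(List.toArray separators, StringSplitOptions.None) |> List.ofArray

let onSequence = SplitSketch.splitOn ", " "a, b,c"            // ["a"; "b,c"]
let onAnyChar  = SplitSketch.splitOneOf [ ','; ' ' ] "a, b,c" // ["a"; ""; "b"; "c"]
let generic    = SplitSketch.split [ "--"; ";" ] "a--b;c"     // ["a"; "b"; "c"]
```

The singleton-list cost of the second alternative shows up as `SplitSketch.split [ ", " ] "a, b,c"`, which gives the same result as `splitOn ", "`.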
Let's get this moving along with an RFC :) https://github.com/fsharp/fslang-design/pull/186
The RFC is being discussed at https://github.com/fsharp/fslang-design/issues/187, where I am currently arguing for defaulting to StringComparison.Ordinal in all functions where that is relevant (equals, startsWith/endsWith, indexOf, and all other string functions that need to compare strings or substrings). If anyone disagrees with that choice, please pop over to that issue and argue against my reasoning. I believe that there are good reasons for making StringComparison.Ordinal the default, but I'd hate to see a bad default chosen because I missed a better reason against Ordinal. So if anyone cares about this choice and hasn't already looked at the discussion in https://github.com/fsharp/fslang-design/issues/187, then come over there and have your say.
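As a rough illustration of what an Ordinal default would look like in practice, here is a sketch. The module name OrdinalSketch and the exact signatures are assumptions made for this example, not what the RFC specifies.

```fsharp
open System

/// Hypothetical module: every comparison is pinned to StringComparison.Ordinal,
/// so results do not depend on the current culture.
module OrdinalSketch =
    let equals (a: string) (b: string) : bool =
        String.Equals(a, b, StringComparison.Ordinal)

    let startsWith (prefix: string) (str: string) : bool =
        str.StartsWith(prefix, StringComparison.Ordinal)

    let endsWith (suffix: string) (str: string) : bool =
        str.EndsWith(suffix, StringComparison.Ordinal)

    /// Returns -1 when the value is not found, like String.IndexOf.
    let indexOf (value: string) (str: string) : int =
        str.IndexOf(value, StringComparison.Ordinal)

// Partial application yields simple predicates for filtering.
let fsFiles =
    [ "Program.fs"; "README.md"; "Lib.FS" ]
    |> List.filter (OrdinalSketch.endsWith ".fs")
```

With an ordinal default `fsFiles` is `["Program.fs"]`; "Lib.FS" is excluded because ordinal comparison is case-sensitive, which is one of the trade-offs worth weighing in the linked discussion.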
Just to note that the current status of discussion is captured here: https://github.com/fsharp/fslang-design/discussions/187#discussioncomment-1225149
I do think this small number of functions should be added.
Submitted by TheInnerLight on 5/30/2016 12:00:00 AM
53 votes on UserVoice prior to migration
The Core.String module does not provide nearly enough features at present. Too often we have to revert to using the standard .NET string class, which both hinders tidy piping and stops us taking advantage of curried args / partial application. I suggest that at least the following functions be added to the string module:
Obviously all of this can easily be achieved by writing simple wrappers to the methods in the .NET string class but if F# is going to have a String module, it ought to be a fully featured one.
Original UserVoice Submission Archived Uservoice Comments