Open baronfel opened 8 years ago
I would like to renew this issue. Seems like it would be a low effort implementation, right? I think this would be a very useful feature to have. For example, I have a map of sparse dates to prices, and I want to get the price of something in effect on some date, it would be so simple with this proposal. I get that the reason F# uses binary maps instead of hash maps is because binary maps make it possible to add items and return a new map which mostly uses the same memory as the input map. But given that we have binary maps, we should take advantage of the other benefits enabled by this structure, such as this proposal.
A closely related feature I would also like to see is
Map.SplitAt: Map<'key, 'value> -> 'key -> Map<'key, 'value> * Map<'key, 'value>
which splits a map into 2 maps at the point where the key is (or would be if it doesn't exist). This can currently be accomplished using Map.partition
but that inefficiently iterates through every element, applying a predicate.
Marking as approved in principle
That doesn't mean it's a priority - indeed if you want this I strongly encourage you to contribute and RFC and an implementation - the sooner you do the sooner it will happen :-)
I'm down to work on this. This would be super useful to have in my current project, and it'll be a fun diversion.
For names, how about Map.findNearest
and Map.findNearestBack
, in the spirit of Seq.find
and Seq.findBack
?
Edit: I'll also do the try
versions as well.
Edit 2: On second thought, nearest
might suggest that 3
is closer to 5
than 10
even though we mean 10
.
I think the BinarySearch name is well-established, Array class in dotnet has it as well.
About the try/nearest semantics:
The are multiple result situations which would IMO be clearest modelled as a dedicated struct DU for the various choices. (exact match, and 3 types of not found - smaller then min, larger then max, between two elements)
The built-one for dotnet Array does that by encoding this information into a single integer - but that would not work for Map/Set anyway, because you cannot use an integral index. https://learn.microsoft.com/en-us/dotnet/api/system.array.binarysearch?view=net-8.0
I'm down to work on this. This would be super useful to have in my current project, and it'll be a fun diversion.
For names, how about
Map.findNearest
andMap.findNearestBack
, in the spirit ofSeq.find
andSeq.findBack
?Edit: I'll also do the
try
versions as well.Edit 2: On second thought,
nearest
might suggest that3
is closer to5
than10
even though we mean10
.
Re your second edit, maybe consider the Map.splitAt
suggestion I made earlier. It returns 2 collections (possibly linked lists of key value pairs with the left side ascending and the right side descending, with the nearest elements at the head of each list). Up to the user to pick which ever element is desired.
Both points well taken.
If we use a single BinarySearch
function as opposed to two separate ones, maybe we can return a tuple of three options: one for the closest match below, one for an exact match, and one for the match above?
Both points well taken.
If we use a single
BinarySearch
function as opposed to two separate ones, maybe we can return a tuple of three options: one for the closest match below, one for an exact match, and one for the match above?
I have a subjective feeling that while that would work fine, it does not explicity force the consuming code into the correct pattern matching on the result. Hence more a fan of explicitly modelled choices for legal states of the response.
(tuple of 3 options means 8 possible states - but not every one of them would be valid)
Both points well taken. If we use a single
BinarySearch
function as opposed to two separate ones, maybe we can return a tuple of three options: one for the closest match below, one for an exact match, and one for the match above?I have a subjective feeling that while that would work fine, it does not explicity force the consuming code into the correct pattern matching on the result. Hence more a fan of explicitly modelled choices for legal states of the response.
(tuple of 3 options means 8 possible states - but not every one of them would be valid)
Ah, I understand what you mean now: if there's an exact match, anything higher or lower shouldn't matter, so we can eliminate 4 entire patterns.
I think it might depend a bit on the usage. For example, if the keys are non-continuous IDs, as in blog entry IDs or something, and we want to be able to traverse the tree statelessly (i.e. without an iterator), it would be useful to have possible higher and lower items even if an exact match is also found.
@travis-leith 's splitAt
suggestion would probably suit that use case better as it would be more general, though I'm still going over a couple things in my head:
Map * Option * Map
would be more efficient than creating a List * Option * List
, though there'd probably be some balancing that needs to be done, at which point, it might end up being at least an O(n) operation anyway?Seq
always feels a little bit precarious (and uglier with pattern matching), but maybe a Seq * Option * Seq
would make sense instead as a compromise...Btw we have all of these combinators in FSharp.Data.Adaptive in case you need implementations or inspiration: https://github.com/fsprojects/FSharp.Data.Adaptive/blob/master/src/FSharp.Data.Adaptive/Datastructures/MapExt.fs#L1424
This does seem tricky functionality to get a clean interface too: so many different ways.
How about:
searchLargestBelow: 'Key -> 'Key option
searchSmallestAbove
: 'Key -> 'Key option`Note that if the search key exists these two functions will not return it - but then searching is what you would do when the key does not exist, and checking for existence is easy.
Together, these serve for stateless iteration and flexible searching. searchNearest
can be got from these two functions, where 'Key has subtraction defined on it. I don't think we can define searchNearest
in general because IComparer
does not define subtraction.
In terms of performance all are log n. There are use cases for returning upper and lower bounds together, but also maybe more use cases for returning just one of them, so I don't think these are necessarily worse for performance than a function with a more complex return result.
Unfortunately the (common?) use case where wanted is wanted is the nearest key is not very well served by this - that requires 3 log n searches: one to detect is the key in the Set (or Map) then possibly two more for the two bounds. Making the bounds include equality allows better searchNearest
but then means these functions cannot be used to do faster iteration.
or as above:
search: 'Key -> 'Key option * 'Key option * 'Key option
Returns the key if there is an exact match AND the zero, one, or two non-equal bounds.
Neither of these seems great to me, and I cannot easily choose between them?
Submitted by Anthony Lloyd on 3/25/2016 12:00:00 AM
3 votes on UserVoice prior to migration
Often in interpolation the nearest collection items to a given value are needed. Map and Set both have a data structure that could be binary searched returning the two nearest neighbours optionally. The function should only be called something like 'nearest' so the implementation isn't exposed. At the moment you would have to replicate the collections to be able to do this.
Original UserVoice Submission Archived Uservoice Comments