Add (binary) search function to Map and Set

baronfel commented 8 years ago

Submitted by Anthony Lloyd on 3/25/2016 12:00:00 AM
3 votes on UserVoice prior to migration

Often in interpolation the nearest collection items to a given value are needed. Map and Set both have a data structure that could be binary searched returning the two nearest neighbours optionally. The function should only be called something like 'nearest' so the implementation isn't exposed. At the moment you would have to replicate the collections to be able to do this.

Original UserVoice Submission Archived Uservoice Comments

travis-leith commented 3 years ago

I would like to renew this issue. Seems like it would be a low effort implementation, right? I think this would be a very useful feature to have. For example, I have a map of sparse dates to prices, and I want to get the price of something in effect on some date, it would be so simple with this proposal. I get that the reason F# uses binary maps instead of hash maps is because binary maps make it possible to add items and return a new map which mostly uses the same memory as the input map. But given that we have binary maps, we should take advantage of the other benefits enabled by this structure, such as this proposal.

A closely related feature I would also like to see is Map.SplitAt: Map<'key, 'value> -> 'key -> Map<'key, 'value> * Map<'key, 'value> which splits a map into 2 maps at the point where the key is (or would be if it doesn't exist). This can currently be accomplished using Map.partition but that inefficiently iterates through every element, applying a predicate.

dsyme commented 3 years ago

Marking as approved in principle

That doesn't mean it's a priority - indeed if you want this I strongly encourage you to contribute and RFC and an implementation - the sooner you do the sooner it will happen :-)

reinux commented 1 year ago

I'm down to work on this. This would be super useful to have in my current project, and it'll be a fun diversion.

For names, how about Map.findNearest and Map.findNearestBack, in the spirit of Seq.find and Seq.findBack?

Edit: I'll also do the try versions as well.

Edit 2: On second thought, nearest might suggest that 3 is closer to 5 than 10 even though we mean 10.

T-Gro commented 1 year ago

I think the BinarySearch name is well-established, Array class in dotnet has it as well.

About the try/nearest semantics:

The are multiple result situations which would IMO be clearest modelled as a dedicated struct DU for the various choices. (exact match, and 3 types of not found - smaller then min, larger then max, between two elements)

The built-one for dotnet Array does that by encoding this information into a single integer - but that would not work for Map/Set anyway, because you cannot use an integral index. https://learn.microsoft.com/en-us/dotnet/api/system.array.binarysearch?view=net-8.0

travis-leith commented 1 year ago

I'm down to work on this. This would be super useful to have in my current project, and it'll be a fun diversion.

For names, how about Map.findNearest and Map.findNearestBack, in the spirit of Seq.find and Seq.findBack?

Edit: I'll also do the try versions as well.

Edit 2: On second thought, nearest might suggest that 3 is closer to 5 than 10 even though we mean 10.

Re your second edit, maybe consider the Map.splitAt suggestion I made earlier. It returns 2 collections (possibly linked lists of key value pairs with the left side ascending and the right side descending, with the nearest elements at the head of each list). Up to the user to pick which ever element is desired.

reinux commented 1 year ago

Both points well taken.

If we use a single BinarySearch function as opposed to two separate ones, maybe we can return a tuple of three options: one for the closest match below, one for an exact match, and one for the match above?

T-Gro commented 1 year ago

Both points well taken.

If we use a single BinarySearch function as opposed to two separate ones, maybe we can return a tuple of three options: one for the closest match below, one for an exact match, and one for the match above?

I have a subjective feeling that while that would work fine, it does not explicity force the consuming code into the correct pattern matching on the result. Hence more a fan of explicitly modelled choices for legal states of the response.

(tuple of 3 options means 8 possible states - but not every one of them would be valid)

reinux commented 1 year ago

Both points well taken. If we use a single BinarySearch function as opposed to two separate ones, maybe we can return a tuple of three options: one for the closest match below, one for an exact match, and one for the match above?

I have a subjective feeling that while that would work fine, it does not explicity force the consuming code into the correct pattern matching on the result. Hence more a fan of explicitly modelled choices for legal states of the response.

(tuple of 3 options means 8 possible states - but not every one of them would be valid)

Ah, I understand what you mean now: if there's an exact match, anything higher or lower shouldn't matter, so we can eliminate 4 entire patterns.

I think it might depend a bit on the usage. For example, if the keys are non-continuous IDs, as in blog entry IDs or something, and we want to be able to traverse the tree statelessly (i.e. without an iterator), it would be useful to have possible higher and lower items even if an exact match is also found.

@travis-leith 's splitAt suggestion would probably suit that use case better as it would be more general, though I'm still going over a couple things in my head:

I think returning a Map * Option * Map would be more efficient than creating a List * Option * List, though there'd probably be some balancing that needs to be done, at which point, it might end up being at least an O(n) operation anyway?
Seq always feels a little bit precarious (and uglier with pattern matching), but maybe a Seq * Option * Seq would make sense instead as a compromise...

krauthaufen commented 1 year ago

Btw we have all of these combinators in FSharp.Data.Adaptive in case you need implementations or inspiration: https://github.com/fsprojects/FSharp.Data.Adaptive/blob/master/src/FSharp.Data.Adaptive/Datastructures/MapExt.fs#L1424

tomcl commented 5 months ago

This does seem tricky functionality to get a clean interface too: so many different ways.

How about:

searchLargestBelow: 'Key -> 'Key option
searchSmallestAbove: 'Key -> 'Key option`

Note that if the search key exists these two functions will not return it - but then searching is what you would do when the key does not exist, and checking for existence is easy.

Together, these serve for stateless iteration and flexible searching. searchNearest can be got from these two functions, where 'Key has subtraction defined on it. I don't think we can define searchNearest in general because IComparer does not define subtraction.

In terms of performance all are log n. There are use cases for returning upper and lower bounds together, but also maybe more use cases for returning just one of them, so I don't think these are necessarily worse for performance than a function with a more complex return result.

Unfortunately the (common?) use case where wanted is wanted is the nearest key is not very well served by this - that requires 3 log n searches: one to detect is the key in the Set (or Map) then possibly two more for the two bounds. Making the bounds include equality allows better searchNearest but then means these functions cannot be used to do faster iteration.

or as above:

search: 'Key -> 'Key option * 'Key option * 'Key option

Returns the key if there is an exact match AND the zero, one, or two non-equal bounds.

Neither of these seems great to me, and I cannot easily choose between them?

fsharp / fslang-suggestions

Add (binary) search function to Map and Set #82