Open leaxoy opened 1 year ago
There's already a set proposal that is snagged on iteration. This seems related to that. This could be containers/ordered.Set and containers/ordered.Map and have overlapping methods with containers/set.
- We aren't going to add any general purpose containers until we have an idea of how to implement iterators, aka how to find all the elements in the container. Iterators are pending resolution of user-defined iteration using range over func values #56413, which itself is pending the forward/backward compatibility work that will be part of Go 1.21.
- This should perhaps be a discussion first.
@ianlancetaylor I'm agree that Iter
is a important feature to iterate over all element in container as you explained, but there also many features does't relay on Iter
, such as: map.Get/Insert/Delete/Contains/Clear/IsEmpty/First/Last
and set.Insert(Many)/Contains(All/Any)/Delete/Clear/IsEmpty
. They are useful in many scenarios.
A benefit is map's Key and set's Elem can any type, not limited to
cmp.Ordered
.
For those internal maybe relay on Iter
, like set.SuperSetOf/SubSetOf/Difference/Union/InterSection
and map.Merge/Keys/Values
, not exposing implementation details can leave room for the introduction of Iter
later.
The rest are Iter
related, like: Iter
can be ecognized by the compiler and can be used in for-loop, and iterate operations can based on this, including but not limited to ForEach/Map/Filter/Reduce/Fold/Max/Min/AllOf/AnyOf/Take/Skip/Cmp/Zip/Unzip/Chunk/GroupBy
...
At least for CPP, operations on containers can be split into several parts, Iterators/Element Access/Capacity/Modifiers/List Operations/Lookup
.. at Containers, Iterators
works with std::ranges
.
If I understand correctly, what you care about above is the Iterators
part
I am curious why an ordered map cannot be introduced or considered. As stated, there are many different implementations of it, which are increasingly being adopted. Still, they are painful to implement when using third-party libraries (especially when those involve range
iterations).
Because many languages have already adopted a similar functionality (some examples):
To provide a more concrete example, I am using libopenapi, which uses a custom orderedmap, rendering specifications requires the ordering to be in place, especially when visualizing the results. With the current lack of an ordered map solution, a developer must introduce another solution compatible with the native supporting packages (e.g., templates and range functionality).
Coming back, with the use of yield
things become a bit easier. I gave yield spin today over an iteration of an ordered map that previously required me to convert to a slice and loop over the slice and then over the map.
I still believe that adding support for an ordered map/set makes sense.
The Keys and Values methods (and a new All method) should definitely use go1.23 iterators rather than materializing a slice.
func (*Map[K, V]) Keys() iter.Seq[K]
func (*Map[K, V]) Values() iter.Seq[V]
func (*Map[K, V]) All() iter.Seq2[K, V]
Reverse too could return an iterator rather than a clone:
func (*Map[K, V]) Reverse() iter.Seq2[K, V]
Similarly, SubMap (renamed Range) could be an iterator for a portion of the key space. If you need to construct a new map from a Range (or All), combine the iterator with a InsertAll method that consumes the iterator.
func (*Map[K, V]) Range(from, to K) iter.Seq2[K, V]
func (*Map[K, V]) InsertAll(iter.Seq2[K, V]) bool
What is Entry? Just a pair? Or a reference to a part of the representation? (Java's Map.Entry took the latter choice and IIRC it was a major barrier to optimized special-purpose maps, because they still had to allocate one object per entry.) Is it necessary?
I like that all the operations return the maximum amount of information in a single access (e.g. Insert reports whether there was a previous entry), as these can really improve efficiency.
Retain should be DeleteFunc, with sense inverted.
We should try to harmonize the APIs of related collections types such as sets, lists, maps, queues, and so on. Java 1.5's collections did a good job of establishing conventions across a range of data types so that, for example, they all have an add(T) bool
method that reports whether the size of the collection changed (though the API has grown to the point where I can no longer hold in memory).
Naming suggestion: package ordered; type Map; func NewMap{,Func}
? Or package maps; type Ordered; func NewOrdered{,Func}
?
I don't like Range
since it clashes with range
in a confusing way though I'm not sure what a better name is.
InsertAll
should just be Insert
because of maps.Insert
.
:+1: to container/ordered
. maps
is for native maps. If there's a type in maps
, it should be for wrapping a native map in something with all the usual methods for other container types.
Also, SuperSetOf
should be SupersetOf
.
I feel like container/ordered
is ever so mildly confusing because there are lot ordered container types that make no sense in it, such as just a regular linked list, though I see the appeal of ordered.Map[K, V]
.
@DeedleFake I would expect inserting into an ordered.List
to perform the insertion at an index that maintains sort order. So an ordered.Set
that allows duplicates.
The Keys and Values methods (and a new All method) should definitely use go1.23 iterators rather than materializing a slice.
func (*Map[K, V]) Keys() iter.Seq[K] func (*Map[K, V]) Values() iter.Seq[V] func (*Map[K, V]) All() iter.Seq2[K, V]
Reverse too could return an iterator rather than a clone:
func (*Map[K, V]) Reverse() iter.Seq2[K, V]
Similarly, SubMap (renamed Range) could be an iterator for a portion of the key space. If you need to construct a new map from a Range (or All), combine the iterator with a InsertAll method that consumes the iterator.
func (*Map[K, V]) Range(from, to K) iter.Seq2[K, V] func (*Map[K, V]) InsertAll(iter.Seq2[K, V]) bool
What is Entry? Just a pair? Or a reference to a part of the representation? (Java's Map.Entry took the latter choice and IIRC it was a major barrier to optimized special-purpose maps, because they still had to allocate one object per entry.) Is it necessary?
I like that all the operations return the maximum amount of information in a single access (e.g. Insert reports whether there was a previous entry), as these can really improve efficiency.
Retain should be DeleteFunc, with sense inverted.
We should try to harmonize the APIs of related collections types such as sets, lists, maps, queues, and so on. Java 1.5's collections did a good job of establishing conventions across a range of data types so that, for example, they all have an
add(T) bool
method that reports whether the size of the collection changed (though the API has grown to the point where I can no longer hold in memory).Naming suggestion:
package ordered; type Map; func NewMap{,Func}
? Orpackage maps; type Ordered; func NewOrdered{,Func}
?
@adonovan
Agreed, when this proposal was made, iterators had not yet been implemented. Now that they have been implemented, it makes sense to use iterator syntax.
The existence of Entry is a personal preference. Since seq2 has never been seen in other languages, tuple is used instead, but since go uses seq2, there is no better reason not to use it here.
Short names are indeed better than long names, changed in original proposal.
In my personal implementation it has indeed been the short name for some time
Although I still think seq2 is a less successful abstraction. ^_^
I don't like
Range
since it clashes withrange
in a confusing way though I'm not sure what a better name is.
InsertAll
should just beInsert
because ofmaps.Insert
.👍 to
container/ordered
.maps
is for native maps. If there's a type inmaps
, it should be for wrapping a native map in something with all the usual methods for other container types.Also,
SuperSetOf
should beSupersetOf
.
@jimmyfrasche
Since the native map supports direct insertion of a key-value pair at the syntactic level, but this does not work here. If you insert a sequence using insert, then I cannot think of any other suitable naming for inserting the key-value.
How about use InsertSeq
or InsertIter
instead.
InsertSeq()
fits well with slices.AppendSeq()
.
I've been looking into this topic a bit and I'd like to share some of that work. This comment is about the leading ordered map implementation, github.com/google/btree. I think it's relevant to talk about, because if it's great, or maybe even good enough, then it can satisfy the ecosystem's need for an ordered map and there's no compelling reason to add one to the standard library. Python seems to have taken a similar route: there is no ordered map in the Python standard library, but there is at least one high-quality third-party implementation.
The strengths of google/btree (as I'll call it throughout) are:
Its weaknesses are:
Item
type that includes both the key and value, instead of the more typical separate key and value types. That means that to look up a key, for instance, you have to pass an entire item with the key portion set.Last but not least, google/btree uses copy-on-write. One way to think about that is that cloning, instead of copying all at once, happens incrementally. Each clone copies a few of the shared tree nodes during each modification, until both are fully copied and no nodes are shared. Of course, if those modifications never happen, no copying happens either.
Copy-on-write provides a kind of iteration guarantee. Clone a btree, then iterate over the clone while confining all modifications to the original. The iteration will observe a snapshot of the original's items. Furthermore, the iteration is concurrency-safe (though not the individual keys and values), because the tree nodes are never written—they are always copied first.
Is copy-on-write a strength or a weakness? I would say both. It makes some operations, like iteration and keeping a snapshot of a tree, quite inexpensive. But its performance can be surprising if you are not aware of what's happening inside the implementation. For example, even after the iteration over a clone is finished and the clone becomes garbage, the original will still be copied each time a "shared" node is modified; the implementation has no good way to tell that the nodes are no longer shared.
google/btree is arguably good enough to serve as the ecosystem's ordered map of choice. On the other hand, its quirks and lack of ranged delete leave room to argue that there is a place for a more idiomatic implementation.
In this second comment (the first is https://github.com/golang/go/issues/60630#issuecomment-2440277569), I'd like to suggest a modern, idiomatic API for ordered maps. The github.com/jba/omap package is based on Russ Cox's rsc.io/omap but adds a few common operations and an improved range API.
In comparison with github.com/google/btree, this implemention:
If we were to adopt an ordered map implementation, I think it would be a good starting point.
@jba
It does support range-over-func, actually, just not explicitly: https://go.dev/play/p/pc574L-uHyf
@jba Most agree, But why is there a MapFunc type? Can it write to Map[K, V any]
, then use two different constructor variants
func NewMap[K, V any](cmp func(K, K) int) *Map[K, V]
func NewOrderedMap[K Ordered, V any]() *Map[K, V] {
return NewMap[K, V](cmp.Compare[K])
}
why is there a MapFunc type
Map
may be much faster.
@jba, could Map
be an interface?
If any method wants to take another map as an argument, the current two-struct approach is going to make it awkward to implement.
func (m *Map[K, V]) Merge(o *Map[K, V]) {}
Maybe an argument can be make that Map
doesn't need such methods, but if Set
and SetFunc
are implemented on top of them, strong arguments can be made that they should have methods like Union
, Join
etc. It would ideal if Map
and Set
could use the same API style.
@hgl Yes, we could define the interface and have two constructor functions, one for the cmp.Ordered
case and one for the comparison function. That would halve the size of the API.
As for set operations, I think it's better to define them in terms of sorted iteration sequences, as in #70140.
@jba I wonder if you can confirm that github.com/jba/omap is strictly superior to github.com/google/btree?
If not, what are the things that jba/omap
need to improve over google/btree
?
P.S. I am a Go user needing an OrderedMap for backing sparse matrices. I am not trying to troll by pitting libraries against each other, just trying to figure out what is the best library at the moment in the absence of a standard library implementation. I also suggest if we decide to leave OrderedMap outside of the standard library, perhaps we could have a Wiki page or blog post to list the recommended/canonical third-party implementations for common data structures. Currently, information on gonuts is too old, and google search results give very confusing picture, with the top search results containing extremely buggy poorly written code. It was only until I stumbled upon this issue did I find a correct path forward for OrderedMaps.
I am not trying to troll by pitting libraries against each other
On the contrary, I think it's important to compare these packages.
I don't think there is a clear winner. In https://github.com/golang/go/issues/60630#issuecomment-2440277569 I talk in detail about the strengths and weaknesses of google/btree. The main weaknesses are the clumsy API, the lack of range delete, and the lack of iteration guarantees without doing a possibly expensive clone.
Google/btree is probably faster because jba/omap is unoptimized. For example, jba/omap allocates all new nodes on the heap instead of using a sync.Pool. There are other implementation tricks too that I haven't bothered with because the main purpose of jba/omap is to serve as a reference implementation for this proposal.
So I think it depends on your particular use case. From my point of view, I'd prefer you use jba/omap to help shake out bugs and serious performance issues, and perhaps to contribute improvements. I'd say if you're working on a personal or student project, it would be a more forward-looking choice. If you need something for serious production work, then certainly google/btree has a lot more mileage on it.
perhaps we could have a Wiki page or blog post to list the recommended/canonical third-party implementations for common data structures
The Go team talks about this from time to time, but the problems are that it's a lot of work, and also we want to remain impartial.
@jba Thanks for your helpful comparison!
It looks like omap
is preferred except for performance critical cases.
I will be using it, and would report bugs or new ideas that may be useful.
The Go team talks about this from time to time, but the problems are that it's a lot of work, and also we want to remain impartial.
Understood, and thanks to the entire Go team for this great work all these years!
There are at least several thousand related implementations OrderedMap/OrderedSet, although we have generics, there are no related containers in the standard library, so people have to implement their own sorting containers.
So I propose add
ordered.Map/Set
intostd
, maybecontainer
package is a suitable place.The main API are:
Appendix A
rb-tree
Appendix B
If we land sum type in go, those return a bool flag would be instead by
Option
orMaybe
, it's intuitive and more suitable. Sum type is more suitable than Product Type here.Appendix C
Since
Iter
should be a separate issue, so this proposal would not includeIter
api. So moveIter
API to end of section.