golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.92k stars 17.52k forks source link

proposal: container: add tree-based OrderedMap/OrderedSet #60630

Open leaxoy opened 1 year ago

leaxoy commented 1 year ago

There are at least several thousand related implementations OrderedMap/OrderedSet, although we have generics, there are no related containers in the standard library, so people have to implement their own sorting containers.

So I propose add ordered.Map/Set into std, maybe container package is a suitable place.

The main API are:


type Map[K, V any] struct{ /* private fields */ }

// Constructor function
func NewMapFunc[K, V any](cmp func(K, K) int) *Map[K, V] {}
func NewMap[K cmp.Ordered, V any]() *Map[K, V] {}

// Method set
// Insert new pair and return previous value if present.
func (m *Map[K, V]) Insert(k K, v V) (V, bool) {}
// Get get value associated with k, else return empty and false.
func (m *Map[K, V]) Get(k K) (V, bool) {}
// Contains judge k in OrderedMap ?
func (m *Map[K, V]) Contains(k K) bool {}
// Delete entry by key and return stored value if present.
func (m *Map[K, V]) Delete(k K) (V, bool) {}
// Keys return all keys in ascending order.
func (m *Map[K, V]) Keys() iter.Seq[K] {}
// Values return all values in ascending order by key.
func (m *Map[K, V]) Values() iter.Seq[V] {}
func (m *Map[K, V]) Entries() iter.Seq2[K, V] {}
// Clear clear the whole map.
func (m *Map[K, V]) Clear() {}
// Retain will keep those entry which satisfy the given predicate function.
func (m *Map[K, V]) Retain(f func(K, V) bool) {}
// Len return number of entries of map
func (m *Map[K, V]) Len() int {}
// First return the minimum entry.
func (m *Map[K, V]) First() (Entry[K, V], bool) {}
// Last return the maximum entry.
func (m *Map[K, V]) Last() (Entry[K, V], bool) {}
// Reverse reverse order of all entries.
func (m *Map[K, V]) Reverse() iter.Seq2[K, V] {}
func (m *Map[K, V]) Reversed() *Map[K, V] {}
// Merge with another OrderedMap, if key exists, replace it
func (m *Map[K, V]) Merge(o *Map[K, V]) {}
// SubMap copy entries between start and end to a new Map.
func (m *Map[K, V]) SubMap(start K, end K) *Map[K, V] {}

// Wrapper of Map with unit value.
type Set[T any] struct{ inner *Map[T, struct{}] }

// Constrcutor function
func NewSetFunc[T any](cmp func(T, T) int) *Set[T] {}
func NewSet[T cmp.Ordered]() *Set[T] {}

// Method set
func (s *Set[T]) Contains(v T) bool {}
func (s *Set[T]) ContainsAll(seq iter.Seq[T]) bool {}
func (s *Set[T]) ContainsAny(seq iter.Seq[T]) bool {}
func (s *Set[T]) Add(v T) {}
func (s *Set[T]) AddSeq(seq iter.Seq[T]) {}
func (s *Set[T]) Delete(elem T) {}
func (s *Set[T]) DeleteSeq(seq iter.Seq[T]) {}
func (s *Set[T]) First() (T, bool) {} // first element or none
func (s *Set[T]) Last() (T, bool) {} // last element or none
func (s *Set[T]) All() iter.Seq[T] {}
// or
func (s *Set[T]) Values() iter.Seq[T] {} // all elements iterator
func (s *Set[T]) Reverse() iter.Seq[T] {} // reverse order iterator
func (s *Set[T]) Reversed() *Set[T] {} // reverse order then write into another set
func (s *Set[T]) Len() int {} 
func (s *Set[T]) Retain(f func(T) bool) {}
func (s *Set[T]) SupersetOf(o *Set[T]) bool {}
func (s *Set[T]) SubsetOf(o *Set[T]) bool {}
func (s *Set[T]) InterSection(o *Set[T]) *Set[T] {}
func (s *Set[T]) UnionInPlace(o *Set[T]) {}
func (s *Set[T]) Union(o *Set[T]) *Set[T] {}
func (s *Set[T]) Difference(o *Set[T]) *Set[T] {}
func (s *Set[T]) SymmetricDifference(o *Set[T]) *Set[T] {}
func (s *Set[T]) SubSet(start T, end T) *Set[T] {}

### Appendix A
1. In C++, there are [ordered_set/ordered_set](https://en.cppreference.com/w/cpp/container) based on `rb-tree`
2. In Java, there are [SortedMap](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/SortedMap.html) and [SortedSet](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/SortedSet.html).
3. In Rust, there are [BTreeMap](https://doc.rust-lang.org/std/collections/struct.BTreeMap.html#) and [BTreeSet](https://doc.rust-lang.org/std/collections/struct.BTreeSet.html#)

### Appendix B
If we land [sum type](https://en.wikipedia.org/wiki/Tagged_union) in go, those return a bool flag would be instead by `Option` or `Maybe`, it's intuitive and more suitable. 
Sum type is more suitable than [Product Type](https://en.wikipedia.org/wiki/Product_type) here.

### Appendix C
Since `Iter` should be a separate issue, so this proposal would not include `Iter` api. So move `Iter` API to end of section.
ianlancetaylor commented 1 year ago
  1. We aren't going to add any general purpose containers until we have an idea of how to implement iterators, aka how to find all the elements in the container. Iterators are pending resolution of #56413, which itself is pending the forward/backward compatibility work that will be part of Go 1.21.
  2. This should perhaps be a discussion first.
earthboundkid commented 1 year ago

There's already a set proposal that is snagged on iteration. This seems related to that. This could be containers/ordered.Set and containers/ordered.Map and have overlapping methods with containers/set.

leaxoy commented 1 year ago
  1. We aren't going to add any general purpose containers until we have an idea of how to implement iterators, aka how to find all the elements in the container. Iterators are pending resolution of user-defined iteration using range over func values #56413, which itself is pending the forward/backward compatibility work that will be part of Go 1.21.
  2. This should perhaps be a discussion first.

@ianlancetaylor I'm agree that Iter is a important feature to iterate over all element in container as you explained, but there also many features does't relay on Iter, such as: map.Get/Insert/Delete/Contains/Clear/IsEmpty/First/Last and set.Insert(Many)/Contains(All/Any)/Delete/Clear/IsEmpty. They are useful in many scenarios.

A benefit is map's Key and set's Elem can any type, not limited to cmp.Ordered.

For those internal maybe relay on Iter, like set.SuperSetOf/SubSetOf/Difference/Union/InterSection and map.Merge/Keys/Values, not exposing implementation details can leave room for the introduction of Iter later.

The rest are Iter related, like: Iter can be ecognized by the compiler and can be used in for-loop, and iterate operations can based on this, including but not limited to ForEach/Map/Filter/Reduce/Fold/Max/Min/AllOf/AnyOf/Take/Skip/Cmp/Zip/Unzip/Chunk/GroupBy ...

At least for CPP, operations on containers can be split into several parts, Iterators/Element Access/Capacity/Modifiers/List Operations/Lookup.. at Containers, Iterators works with std::ranges.

If I understand correctly, what you care about above is the Iterators part

petkostas commented 1 month ago

I am curious why an ordered map cannot be introduced or considered. As stated, there are many different implementations of it, which are increasingly being adopted. Still, they are painful to implement when using third-party libraries (especially when those involve range iterations). Because many languages have already adopted a similar functionality (some examples):

To provide a more concrete example, I am using libopenapi, which uses a custom orderedmap, rendering specifications requires the ordering to be in place, especially when visualizing the results. With the current lack of an ordered map solution, a developer must introduce another solution compatible with the native supporting packages (e.g., templates and range functionality).

petkostas commented 1 month ago

Coming back, with the use of yield things become a bit easier. I gave yield spin today over an iteration of an ordered map that previously required me to convert to a slice and loop over the slice and then over the map. I still believe that adding support for an ordered map/set makes sense.

adonovan commented 2 weeks ago

The Keys and Values methods (and a new All method) should definitely use go1.23 iterators rather than materializing a slice.

func (*Map[K, V]) Keys() iter.Seq[K]
func (*Map[K, V]) Values() iter.Seq[V]
func (*Map[K, V]) All() iter.Seq2[K, V]

Reverse too could return an iterator rather than a clone:

func (*Map[K, V]) Reverse() iter.Seq2[K, V]

Similarly, SubMap (renamed Range) could be an iterator for a portion of the key space. If you need to construct a new map from a Range (or All), combine the iterator with a InsertAll method that consumes the iterator.

func (*Map[K, V]) Range(from, to K) iter.Seq2[K, V]
func (*Map[K, V]) InsertAll(iter.Seq2[K, V]) bool

What is Entry? Just a pair? Or a reference to a part of the representation? (Java's Map.Entry took the latter choice and IIRC it was a major barrier to optimized special-purpose maps, because they still had to allocate one object per entry.) Is it necessary?

I like that all the operations return the maximum amount of information in a single access (e.g. Insert reports whether there was a previous entry), as these can really improve efficiency.

Retain should be DeleteFunc, with sense inverted.

We should try to harmonize the APIs of related collections types such as sets, lists, maps, queues, and so on. Java 1.5's collections did a good job of establishing conventions across a range of data types so that, for example, they all have an add(T) bool method that reports whether the size of the collection changed (though the API has grown to the point where I can no longer hold in memory).

Naming suggestion: package ordered; type Map; func NewMap{,Func}? Or package maps; type Ordered; func NewOrdered{,Func}?

jimmyfrasche commented 2 weeks ago

I don't like Range since it clashes with range in a confusing way though I'm not sure what a better name is.

InsertAll should just be Insert because of maps.Insert.

:+1: to container/ordered. maps is for native maps. If there's a type in maps, it should be for wrapping a native map in something with all the usual methods for other container types.

Also, SuperSetOf should be SupersetOf.

DeedleFake commented 2 weeks ago

I feel like container/ordered is ever so mildly confusing because there are lot ordered container types that make no sense in it, such as just a regular linked list, though I see the appeal of ordered.Map[K, V].

jimmyfrasche commented 2 weeks ago

@DeedleFake I would expect inserting into an ordered.List to perform the insertion at an index that maintains sort order. So an ordered.Set that allows duplicates.

leaxoy commented 1 week ago

The Keys and Values methods (and a new All method) should definitely use go1.23 iterators rather than materializing a slice.

func (*Map[K, V]) Keys() iter.Seq[K]
func (*Map[K, V]) Values() iter.Seq[V]
func (*Map[K, V]) All() iter.Seq2[K, V]

Reverse too could return an iterator rather than a clone:

func (*Map[K, V]) Reverse() iter.Seq2[K, V]

Similarly, SubMap (renamed Range) could be an iterator for a portion of the key space. If you need to construct a new map from a Range (or All), combine the iterator with a InsertAll method that consumes the iterator.

func (*Map[K, V]) Range(from, to K) iter.Seq2[K, V]
func (*Map[K, V]) InsertAll(iter.Seq2[K, V]) bool

What is Entry? Just a pair? Or a reference to a part of the representation? (Java's Map.Entry took the latter choice and IIRC it was a major barrier to optimized special-purpose maps, because they still had to allocate one object per entry.) Is it necessary?

I like that all the operations return the maximum amount of information in a single access (e.g. Insert reports whether there was a previous entry), as these can really improve efficiency.

Retain should be DeleteFunc, with sense inverted.

We should try to harmonize the APIs of related collections types such as sets, lists, maps, queues, and so on. Java 1.5's collections did a good job of establishing conventions across a range of data types so that, for example, they all have an add(T) bool method that reports whether the size of the collection changed (though the API has grown to the point where I can no longer hold in memory).

Naming suggestion: package ordered; type Map; func NewMap{,Func}? Or package maps; type Ordered; func NewOrdered{,Func}?

@adonovan

Agreed, when this proposal was made, iterators had not yet been implemented. Now that they have been implemented, it makes sense to use iterator syntax.

The existence of Entry is a personal preference. Since seq2 has never been seen in other languages, tuple is used instead, but since go uses seq2, there is no better reason not to use it here.

Short names are indeed better than long names, changed in original proposal.

In my personal implementation it has indeed been the short name for some time

Although I still think seq2 is a less successful abstraction. ^_^

leaxoy commented 1 week ago

I don't like Range since it clashes with range in a confusing way though I'm not sure what a better name is.

InsertAll should just be Insert because of maps.Insert.

👍 to container/ordered. maps is for native maps. If there's a type in maps, it should be for wrapping a native map in something with all the usual methods for other container types.

Also, SuperSetOf should be SupersetOf.

@jimmyfrasche

Since the native map supports direct insertion of a key-value pair at the syntactic level, but this does not work here. If you insert a sequence using insert, then I cannot think of any other suitable naming for inserting the key-value.

How about use InsertSeq or InsertIter instead.

DeedleFake commented 1 week ago

InsertSeq() fits well with slices.AppendSeq().