Closed rapiz1 closed 3 years ago
Why not name them OSet
and OMap
? I prefer concise naming conventions.
Why not name them
OSet
andOMap
? I prefer concise naming conventions.
Interesting. But I haven't seen these names before so I'm a little worried about that's too subtle. Can you provide some examples from other languages that use these names? Maybe I'm just being naive.
Why not name them OSet and OMap? I prefer concise naming conventions.
I feel as though, historically, we've usually had more regrets about choosing too-terse names than too-descriptive ones. Note that with a descriptive name, a user can always use a type alias to call it something shorter.
Then can a type alias be provided with it? I.E., in the file including the definition of the type, add an official type alias.
It could be, though we usually try to avoid introducing aliases so that people don't have to learn "What's the difference between x and y... oh, nothing?" (whereas if a user adds the alias to their code, it's self-documenting).
Both my comments here should just be interpreted as opinions, not requirements.
@bradcray suggested (under https://github.com/chapel-lang/chapel/issues/15950) making these just an implementation of set
and map
. This idea follows from a similar one about list
https://github.com/chapel-lang/chapel/issues/15913
@Rapiz1 was opposed to the idea and what he said here resonated for me:
OSet/OMap is designed to utilize Treap, SkipList, or some ordered structures(e.g. Red-Black Tree) we may add later, while Map utilizes HashTable. I think the differences like O(1) vs O(lgN), ordered structures vs unordered structures, are significant enough to split up Set and OSet. In addition, possibly we have far more ordered structures(various trees) than unordered structures. It makes more sense to group them together.
This tells me that the ordered-ness of the collection can be provided by different data structures with different characteristics. As such, I see OrderedSet in the same class as List
and Treap in the same class as buffer
and multibuffer
.
In other words if we want to parametrize set
that determines the ordered-ness, than how do we pick the underlying structure? Do we choose between hashtable
, treap
, and skiplist
when we are creating a set
? Then the ordered-ness would be implicit and I don't like that.
I think in general it is a good practice to use the same collection interface that can choose between different data structures, but in this case it would seem too contrived for me, with my current understanding. To me, we can have Set
and OrderedSet
just like we have Block
and Cyclic
for example.
Assuming we go this way: I am against OSet
and OMap
and having such aliases. OrderedSet
and OrderedMap
are really good I think. Not too wordy, tells about what they do clearly etc.
I agree with OrderedSet
and OrderedMap
as names for the type. I wouldn't expect to have to type out the name of the type very often in code using the objects, so I prefer the more descriptive names. And as a reader of someone else's code, the longer names would be more self-documenting.
I don't see the point of making an OrderedSet
be a variant of set
like with List
. If you ever want an ordered set, it's because you're using it in a way that depends on it maintaining the order. Therefore swapping the param argument to get an unordered set won't "just work" with your code. And if you have code using an unordered set, why would you swap to a more expensive ordered set if you're not going to rely on the order? So the benefit of being able to easily swap one for the other doesn't seem like it'd ever actually be a benefit. It's not just a matter of untidiness that one implements more functions than the other -- they have different interfaces so aren't interchangeable.
@bradcray @e-kayrakli @cassella @krishnadey30 I also want to further propose to put orderedSet and orderedMap in one module named Ordered (or something else). They share the underlying implementation and also the hierarchy to chose between implementation. For example, the enum they use to choose implementation should be the same.
@bradcray @e-kayrakli @cassella @krishnadey30 I also want to further propose to put orderedSet and orderedMap in one module named Ordered (or something else). They share the underlying implementation and also the hierarchy to chose between implementation. For example, the enum they use to choose implementation should be the same.
I am not so sure whether that decomposition is the right approach. If you do that, use Ordered
and import Ordered
would bring in both data structures to the scope, whereas only one might be needed. What about a separate module OrderedCommon.chpl
or something (not fond of the name, myself) that includes all the underlying implementation. Then OrderedMap
and OrderedSet
modules doing a private use OrderedCommon
and providing the interface around the common implementation?
My main worry is the precision of use
and import
statements, and if I am missing a way to workaround that, let me know.
If you do that, use Ordered and import Ordered would bring in both data structures to the scope, whereas only one might be needed.
Fair enough. In that case, I prefer to remain the same as before.
I think you can put both modules OrderedMap
and OrderedSet
into one file. Then as long as that file is in the module search path, users can just use OrderedMap
or import OrderedSet
, and the compiler can find it. (At least it does when I add an extra module to Set.chpl
). Then that file could include a private module OrderedCommon
and the other two modules can private use
it as Engin suggests.
I think you can put both modules OrderedMap and OrderedSet into one file. Then as long as that file is in the module search path, users can just use OrderedMap or import OrderedSet, and the compiler can find it.
I don't think that's right. Specifically, if a file Filename.chpl contains a module M.chpl, use M;
I don't believe the module will be found unless something else does a use Filename;
or import Filename;
(which is what I suspect is happening in your Set example, Paul).
I tried to make Treap a submodule of OrderedSet in separate files but came across https://github.com/chapel-lang/chapel/issues/16094
lowerBound(x: eltType, ref result: eltType): bool
same as std::lower_bound(), return false if no occurence.upperBound(x: eltType, ref result: eltType): bool
same as std::upper_bound(), return false if no occurence.predecessor(x: eltType, ref result: eltType): bool
return the predecessor of one element, return false if it's the first one.successor(x: eltType, ref result: eltType): bool
return the successor of one element, return false if it's the last one.I wrote ref result: eltType
because I wanted to support types that can't be default initializable, which are actually non-nilable classes, by avoiding returning result: eltType
, which should be nil when no hit. But I realize that it makes more sense to return a tuple (found: bool, result: eltType)
and just give out a compiler error when trying to call these methods on non-nilable classes.
Any thoughts? @bradcray @e-kayrakli @cassella @krishnadey30
But I realize that it makes more sense to return a tuple (found: bool, result: eltType) and just give out a compiler error when trying to call these methods on non-nilable classes.
This looks way better than returning via a ref
argument to me.
I also tend to prefer returning tuples in Chapel over returning via arguments, at least if the value being returned is fairly intrinsic to the routine's nature. If we were to continue returning these cases by argument for any reason, I think using out
probably would communicate the intent better than ref
(?)
kth(idx: int): return k-th element in the sorted sequence, throw an error if out of range
lowerBound(x: eltType): (bool, eltType) same as std::lower_bound(), return false in the tuple if no occurence.
Have we considered making all of these throw if they are used in a way that cannot return an element?
kth
already does so; could lowerBound
et all also throw if they cannot return an element? That way, the return slot would still be the normal return slot.
Returning a (bool, eltType)
just seems like a poor version of an option
type which I think is something we hope to have at some point (and it might even be written just as eltType?
). It's probably not reasonable to try to do that as part of GSoC but it would be possible in the near term to make a record option
that contained your tuple, so at least the name and concept might match with something we want in the future.
So anyway, my preference is probably to have all of these return an option
type or to throw (and probably in that order but I think both are reasonable).
Regarding this part: init(type eltType, param parSafe = false, param implType: impl) enum setImpl { treap, skipList };
- I know there has been a bunch of discussion that I haven't been very active in - but I generally don't like the idea of parameterizing totally different data structures. (For closely related data structures I am more OK with it, but even there I lean more towards separate types). I think the APIs for treap and skipList might be identical but that we should handle that in the future by making interfaces/constrained generics rather than trying to put all of the data structures into one generic type. The main problem I see with the param
approach here is that to add another implementation, somebody has to modify orderedSet
. Whereas if orderedSet
is an interface/constraint, then anybody can create something that implements orderedSet
. As a result, I would prefer to see treap
and skipList
be separate record
types in separate modules. They can share a helper module if that is useful. The documentation could define an orderedSet
in terms of what functions are available (which would one day become the interface/constrained generic). treap
and skipList
documentation can mostly link to this orderedSet
document.
Regarding this part: init(type eltType, param parSafe = false, param implType: impl) enum setImpl { treap, skipList }; - I know there has been a bunch of discussion that I haven't been very active in - but I generally don't like the idea of parameterizing totally different data structures. (For closely related data structures I am more OK with it, but even there I lean more towards separate types). I think the APIs for treap and skipList might be identical but that we should handle that in the future by making interfaces/constrained generics rather than trying to put all of the data structures into one generic type. The main problem I see with the param approach here is that to add another implementation, somebody has to modify orderedSet. Whereas if orderedSet is an interface/constraint, then anybody can create something that implements orderedSet. As a result, I would prefer to see treap and skipList be separate record types in separate modules. They can share a helper module if that is useful. The documentation could define an orderedSet in terms of what functions are available (which would one day become the interface/constrained generic). treap and skipList documentation can mostly link to this orderedSet document.
The constrained generic approach you outline here would be nice to have in the future.
If we foresee such a future, do you think that we should drop implType
argument as it would be a breaking change when/if we switch to that approach (as far as I can see, that's a future-looking argument in any case and has no use from the first PR)? Do you think there are other issues with the current orderedSet
design in that regard?
If we foresee such a future, do you think that we should drop implType argument as it would be a breaking change when/if we switch to that approach
Yes. Regarding the current PR, if I understand correctly, it would rename the implementation to treap
. orderedSet
would become a documentation-only type (for now) and an interface (in the future).
I think having orderedSet
as an interface is great. But I don't like having a documentation-only type for some time and I have to use Treap
when I actually want orderedSet
. Also, I don't like having a module called Treap
. Though we have Heap
in standard, Treap
is less familiar to some users and even less descriptive than Heap
.
I wonder can we combine these two approaches together? We can introduce something like orderedSetInterface
in the future to enable users to add implementations. In the meantime, for orderedSet
, parameters can still be used to quickly switch between built-in implementations. OrderedSet
can also be a place to group Treap
, SkipList
together, and give them a more descriptive name. No breaking change is involved.
Have we considered making all of these throw if they are used in a way that cannot return an element? kth already does so; could lowerBound et all also throw if they cannot return an element? That way, the return slot would still be the normal return slot.
Oh I forgot to update that. kth
actually returns a tuple now.
in the near term to make a record option that contained your tuple
Can you describe more about the record option
or link to some discussion? I haven't heard of it.
Can you describe more about the record option or link to some discussion? I haven't heard of it.
From #12614
A type modifier is available to indicate a class type is possibly nil. This proposal follows the Swift syntax for these. (see https://developer.apple.com/documentation/swift/optional ). We expect that this syntax will eventually be generalized to create optional record types (or optional int, etc) however the focus of this proposal is on the nilability of class instances.
You could look at Swift's optional
type for some motivation. But a sketch of a prototype of it would be:
record optional {
type valueType;
var value: valueType;
var isNone: bool;
proc init(type valueType) {
this.valueType = valueType;
// default initializes this.value -- in the future we would avoid that
this.isNone = true
}
proc init(in value) {
this.valueType = value.type;
this.value = value;
this.isNone = false;
}
}
However a problem with this approach is that then we have to do API design on optional
if we want the orderedSet
interface to have any stability. Even so, I think I'd prefer a prototype optional
to a tuple.
But you didn't answer my first question yet:
Have we considered making all of these throw if they are used in a way that cannot return an element?
But you didn't answer my first question yet:
I needed further information about optional
to give feedback. : )
Have we considered making all of these throw if they are used in a way that cannot return an element?
I generally consider throw
statement as a way to handle errors. Procedures like lowerBound
are more like searching. Is it reasonable to throw an error when a search returns no occurrence?
So I lean to the optional
record. Do you mean this module needs to contain a prototype of the optional
record and return it instead?
OrderedSet was added with #16124, OrderedMap was added with https://github.com/chapel-lang/chapel/pull/16271.
Background
Previously I proposed Treap and SkipList https://github.com/chapel-lang/chapel/issues/15670 https://github.com/chapel-lang/chapel/issues/15658 . After discussing about the unified interface of Vector, I realized Treap and SkipList could also follow this pattern. They have pros&cons but the interfaces are completely identical and the use case is overlapped. These structures can be organized in a wiser way. Also, they should be named in a better way to indicate their usage in programming, rather than names of structures. So I propose
OrderedSet
andOrderedMap
here.Chapel already has
Set
andMap
based on Hash Table, which depends on hashing. What I'm proposing here is dependent on comparing.OrderedSet
OrderedSet
can be based on any fast structures that support querying and inserting in log(n) time, like Skip List and various Search Trees. Instead of a hash function, it requires comparable elements and supports more operations. The interface is identical toSet
, but with additional procedures.kth(idx: int)
: return k-th element in the sorted sequence, throw an error if out of rangelowerBound(x: eltType): (bool, eltType)
same as std::lower_bound(), return false in the tuple if no occurence.upperBound(x: eltType): (bool, eltType)
same as std::upper_bound(), return false in the tuple if no occurence.predecessor(x: eltType,): (bool, eltType)
return the predecessor of one element, return false in the tuple if it's the first one.successor(x: eltType): (bool, eltType)
return the successor of one element, return false in the tuple if it's the last one.init(type eltType, param parSafe = false, param implType: impl)
enum setImpl { treap, skipList };
OrderedMap
We can easily build a kind of map on
OrderedSet
by storing tuples like(key, value)
in the set. Interfaces are identical toMap
except the initialization.init(type eltType, param parSafe = false, param implType: impl)
enum mapImpl { treap, skipList };
I'm not sure about the name of
OrderedMap
. In Java, there is a similar strcuture namedTreeMap
but that's built on Red-Black Tree. We have SkipList option here so it's not appropriate.