Closed LouisJenkinsCS closed 7 years ago
Updated chpldocs with additional methods (`clear()`, `isEmpty()`, `isFull()`, `length`)
One thing worth thinking about would be allowing grow/shrink on bounded queues. Users may choose to do that in some cases, regardless of its cost. On a similar note, 0-sized bounded queues could also be allowed. This would enable patterns where users initialize a 0-sized queue and pass it to some initialization function.
Another thing (that we briefly discussed with Louis) is the need for a `split` method. This came up when Louis was using unbalanced tree search as a benchmark. See for example: https://github.com/chapel-lang/chapel/blob/master/test/studies/uts/uts_deq.chpl#L220. Here the `split` method is used to balance the load across tasks manually. Louis' current implementation balances the load in the sense that dequeues can be satisfied from remote locales if the local storage is depleted, so the need for manual balancing may not be that great, but a user-controllable way of doing it would still be nice. Also, `split` can be done by a bulk dequeue followed by a bulk enqueue to a new queue, so the implementation of such a method can be just a wrapper around those.
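For illustration, such a wrapper might look like this (a hypothetical sketch; `dequeue(n)` returning the removed elements and `+=` as bulk enqueue are assumed from the examples later in this thread, not a finalized API):

```chapel
// Hypothetical sketch: 'split' as a bulk dequeue from one queue
// followed by a bulk enqueue into another. Assumes 'dequeue(n)'
// returns up to n removed elements and '+=' performs a bulk enqueue.
proc split(src, dst, n : int) {
  var elems = src.dequeue(n); // bulk dequeue up to n elements
  dst += elems;               // bulk enqueue into the destination
}
```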
One thing worth thinking about would be allowing grow/shrink on bounded queues
Growing and shrinking of a distributed bounded queue is not easy without incurring severe performance penalties due to needed synchronization. A bounded queue is to be implemented on top of a distributed array, which does not support concurrent growing or shrinking. If heavily requested, a dynamically resizing distributed bounded queue could be created, separate from the normal distributed bounded queue, and implemented in a way that allows this.
I would like to note that this can be done without using the queue API by creating a new queue of the requested size and appending the old into the new...
var oldQueue = makeBoundedFIFO(100);
for i in 1 .. 100 do oldQueue += i;
var newQueue = makeBoundedFIFO(200);
newQueue += oldQueue;
for i in 101 .. 200 do newQueue += i;
users initialize a 0-sized queue and pass it to some initialization function
Would there be a reason not to set the bounds to 1 instead?
split can be done by a bulk dequeue followed by a bulk enqueue to a new queue.
I'm against adding `split` to the API because the user may want to split into another type of queue.
var coreWorkQueue = makeBalancedQueue(int);
forall i in 1 .. 100 do coreWorkQueue += i;
var splitWork = coreWorkQueue.dequeue(10);
var splitWorkQueue = makeBoundedFIFO(int);
splitWorkQueue += splitWork;
Here `coreWorkQueue` is 'split' into a FIFO queue.
Edit: Fixed the first example to use `makeBoundedFIFO` instead of `makeBoundedQueue`.
This brings up a separate question: Should we even have an interface?
If each queue were implemented independent of an interface, it would lift any and all restrictions. Maybe an unbounded queue should not have a `shrink`/`grow` method, but perhaps a certain type of bounded queue should. Perhaps the interface should be as simple as `enqueue` and `dequeue`, and we should leave it at that to allow breathing room for other data structures? Perhaps we should have multiple interfaces? If we wanted a Deque, just implementing a `Queue` interface is not sufficient; it would also need to implement a `Stack` interface, etc. A load-balanced queue that imposes no FIFO ordering could perhaps implement some `List` interface instead, with work stealing abstracted into another interface, `LoadBalanced`, etc.
There are many ways to proceed with this.
Edit: To expand on this more, perhaps we could do what Java does and have hierarchies for this: `Queue` is the base interface, `BoundedQueue` inherits from `Queue`, and `DistributedBoundedQueue` inherits from `BoundedQueue`, etc. Again, this lifts a lot of restrictions.
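A hypothetical sketch of what such a hierarchy might look like in Chapel (class names and members are illustrative only, not a finalized API):

```chapel
// Hypothetical sketch of the proposed hierarchy; names and members
// are illustrative only, not a finalized API.
class Queue {
  type eltType;
  proc enqueue(elt : eltType) : bool { halt("enqueue not implemented"); }
  proc dequeue() : (bool, eltType) { halt("dequeue not implemented"); }
}

class BoundedQueue : Queue {
  var capacity : int;
  proc isFull() : bool { halt("isFull not implemented"); }
}

class DistributedBoundedQueue : BoundedQueue {
  // distribution-specific state and overrides would live here
}
```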
Edit2: It would be a lot less complicated than this, but it encapsulates the idea.
Growing and shrinking of a distributed bounded queue is not easy without incurring severe performance penalties due to needed synchronization.
The question is not how it would perform but whether it is useful functionality. Although you can do that without API support, it might be nice to have it in the API.
Would there be a reason not to set the bounds to 1 instead?
Simply because it isn't supposed to be 1 :) It just feels hacky to me.
I'm in favor of utilizing an inheritance hierarchy for the queue set up.
For the enqueue method, would it return false if some elements have been added but not all? Maybe be a little more explicit on that.
For freeze/unfreeze - will we guarantee that all operations initiated prior to the freeze/unfreeze will complete before the state changes? Or is it possible to mess up an operation part way through (say, by freezing in the midst of a long enqueue, for instance)?
For the enqueue method, would it return false if some elements have been added but not all? Maybe be a little more explicit on that.
That's one of the things I've been debating. The 'transactional additions' idea referred to an `enqueue` as a transaction on the queue: either we commit by adding all of the elements, or roll back by adding none. You're right that I wasn't clear on what happens when there is not enough space available. I'll edit the description a bit later.
For freeze/unfreeze - will we guarantee that all operations initiated prior to the freeze/unfreeze will complete before the state changes? Or is it possible to mess up an operation part way through (say, by freezing in the midst of a long enqueue, for instance)?
That's another thing I've been debating (also under the Open Questions); however, I am in favor of making it implementation-specific. If we go ahead with the inheritance hierarchy, the specifics for each queue can be noted in the overridden method.
Soon (tomorrow), I will make a change to the API to add some inheritance and I'll announce when I finish (still here to answer questions).
I have revised the entire API to include the hierarchy now. Note that quite a few methods are empty (as it is more of an interface). As chpldocs only shows methods that are overridden, there's nothing else I can do about that yet (I can fill it in more if people like the way it is going right now).
Two ideas that I had while reading the updated documents:
An alternative to "bulk remove" in Collection: remove(10) resulting in a new collection may be somewhat heavyweight. How about bulk remove returning an array, and the much-discussed `split` returning a collection? This may be a bit confusing, but I don't know..
Do you think there should be Queue.enqueue/Queue.dequeue and similarly Stack.push/Stack.pop as simple wrappers for add/remove? So a class extending these would override add/remove, yet allow the use of enqueue/dequeue and push/pop?
An alternative to "bulk remove" in Collection: remove(10) resulting in creating a new collection may be somewhat heavyweight. How about bulk remove returning an array, and having a much-discussed split returning a collection?
I find this agreeable. An array is definitely the least expensive way to return bulk data, and `split` makes more sense splitting into a separate `Collection`.
Do you think there should be Queue.enqueue/Queue.dequeue and similarly Stack.push/Stack.pop as simple wrappers for add/remove? So a class extending these would override add/remove, yet allow the use of enqueue/dequeue and push/pop?
Sure, that's agreeable. Originally I removed them so that a `Queue` and `Stack` feel more like a `Collection`, but I neglected the fact that their iconic methods provide more clarity in determining what that collection is at a glance. For example, if you see `c.add(1)`, is it a stack, queue, or a list? You won't know unless you look up the type. Now if you see `c.enqueue(1)` or `c.push(1)`, it becomes obvious.
Overall, I agree to adding them back.
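Sketched in Chapel, the wrappers might look like this (a non-authoritative sketch, assuming the `Collection` base class with `add`/`remove` discussed in this thread):

```chapel
// Hypothetical sketch: iconic methods as thin wrappers over the
// generic 'add'/'remove', so subclasses only override add/remove
// while users keep the self-documenting names.
class Queue : Collection {
  inline proc enqueue(elt : eltType) : bool return add(elt);
  inline proc dequeue() : (bool, eltType) return remove();
}

class Stack : Collection {
  inline proc push(elt : eltType) : bool return add(elt);
  inline proc pop() : (bool, eltType) return remove();
}
```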
List... A data structure without any particular ordering.
I'm not sure that sentence makes sense. I mean, I would expect a List to have more "ordering" than a stack or a queue - in particular you can access the middle elements. Maybe you mean something different? Or maybe it shouldn't be called "List" ? (Also we already have something called a List in Chapel that is different from what you are building).
I'm generally happy with the outline of the API.
I would expect a List to have more "ordering" than a stack or a queue
In the API, I relaxed the ordering for `List` to allow it to be more of an 'umbrella' for any arbitrary data structure that isn't FIFO, LIFO, or even indexable. For example, a data structure might only allow inserting and removing arbitrary elements without supporting indexing, either as an optimization or because its design makes it inherently unable to (e.g., a load-balancing data structure where the index of an item is essentially useless, as items can move back and forth between nodes). I suppose a better term could be devised instead.
Looking at this again, in Java a List is defined as "an ordered collection (also known as a sequence)", so perhaps List should be indexable by definition, and I'll need another category for a load-balancing, non-indexable, non-ordered (but high-performance, e.g., over 250x the performance of a normal, albeit naive, synchronized list implementation at 64 nodes) data structure. I suppose technically I could have it be a `List` which ignores the index and returns an arbitrary element, and just document it as such.
Does related work have a name for a "list" that returns an arbitrary element? Would "bag" be a better term?
For example: https://en.wikipedia.org/wiki/List_of_data_structures shows that "List" is ordered, and that "bag" is a "multiset", which isn't far off but might convey the wrong idea.
Maybe we just need to invent a term for a totally unordered collection? Just brainstorming here:
Or, what about just calling it a "collection" and not trying to have a name for it? I mean, don't you have this terminology right now:
So couldn't it just be:
?
If it being unordered is important for its use, should we be looking for a two-word term, like "UnorderedList" or "UnorderedCollection" ?
I still probably like "bag" or "jumble" the best of all of these.
"hoard" and "mound" don't intuitively spark enough understanding of what that structure is to me (maybe because I am not a native speaker)
I actually like the sound of "Collection" the most for the purpose. But I am not sure if it has any implications on the software design aspect. Two-word terms also sound good to me as they are descriptive. I can live with "bag".
Interesting... Yeah, `Bag` sounds about right, considering it describes it rather well. `DistributedBag`, and `Bag` for the single-locale version, sound about right. However, I do have `List` data structures that are indexable, so I guess it'd be...
* Collection
  * Bag
  * List
    * BoundedList
    * DynamicBoundedList
  * Queue
    * BoundedQueue
    * DynamicBoundedQueue
  * Stack
    * BoundedStack
    * DynamicBoundedStack
In terms of the `List`, if a data structure supports indexing, then `pop_front`, `push_front`, `pop_back`, `push_back`, etc. are provided for free. As well, if a `Set` interface is potentially added later on, then `Bag` can be moved under it as a `Multiset` of sorts.
Also I won't hold any illusion that I can finish everything here by the end of the month, but my goal is to get at least 1 in each category.
One more thing as well: I need to establish what would be 'acceptable' for performance. In the distributed `List` implementation (not the `Bag`/`Multiset` one I discussed earlier), I can promise non-degrading performance, but I can't promise scalable (as in increase-per-node) performance for a list that requires full ordering across nodes, arbitrary/random access, etc. This is still far better than a normal synchronized list, whose performance degrades significantly as you add more nodes.
Its implementation would require employing the H-Synch algorithm: in summary, we maintain a global queue-lock (MCS) [requires GlobalAtomicObject] which a node needs to acquire before it may touch the data structure, and a CC-Synch lock (CCSynchLock) which a task needs to acquire. The performance would be near-equivalent to 1 node, which will hopefully be non-degrading. It does require more effort on the user's behalf, though. @mppf Since you have a much firmer grasp of what is and is not acceptable, would a list with non-degrading performance be desired? As well, if this is desired (and if it is achieved), it's possible to reuse the pattern/abstraction in other applications.
Due to time constraints, I'm trimming down the Collections API a lot more. There's not enough time to update the hosted chpldocs, so I'll show it below in text...
class Collection {
type eltType;
/*
Adds an element to this data structure.
*/
proc add(elt : eltType) : bool;
/*
Removes an arbitrary element from this data structure.
BUG: Compiler will segfault if the returned value is not captured at callsite.
'c.remove();' == segfault
BUG: Compiler with LICM will cause undefined behavior if the return value is captured
in a variable declared outside of a loop.
'var captured : (bool, eltType); while !captured[1] do captured = remove()' == undefined behavior
*/
proc remove() : (bool, eltType);
// Checks to see if the data structure contains an item. Note we do not support the
// 'removeItem' equivalent by default due to it causing a ton of headache in implementing
// my ordered data structure.
proc contains(elt : eltType) : bool;
// By default, repeatedly calls 'remove()' until empty if not overloaded. May be
// problematic with issue #7003
proc clear();
// By default checks if 'size() == 0', can be overloaded
inline proc isEmpty() : bool {
return size() == 0;
}
// Returns number of elements.
proc size() : int;
// By default, they must implement serial iteration; I would put parallel iteration here, but it is
// not possible as the leader iterator's yielded chunk can vary in type. Also it's not the easiest
// to debug either (e.g., #6998)
iter these() : eltType;
// Freezes the collection, rendering it immutable.
proc freeze() : bool;
// Unfreezes the collection, rendering it mutable.
proc unfreeze() : bool;
// If this collection can be frozen.
proc canFreeze() : bool;
// If this collection is currently frozen.
proc isFrozen() : bool;
}
Note that there isn't any 'Stack', 'List', or 'Queue' derivative. If they are needed, they can be added later. I plan to submit two scalable data structures (it used to be five, but I took the best parts of three and merged them into one, and discarded another entirely). They will now directly inherit from 'Collection'; this should avoid most of the other issues with dynamic dispatch as well and make my job a lot easier. I definitely promised more than I could originally deliver; apologies for this sudden change.
Now that the PR has been posted, I have another revision I want to make: potentially removing the 'freeze' logic from the API entirely. It turns out to be a lot more tedious and harder to implement than originally expected. Right now, all data structures allow iteration during concurrent access, so 'freeze' currently merely elides the need to acquire any locks. Worst of all, the `DistributedBag` sees a large 33% performance loss from the overhead of these checks, so perhaps I should remove it? Since I originally brought up the idea and no one voiced objections to it, I'd need to see whether people want it or not.
I don't have any objection to removing the "freeze" logic. Such logic complicates the API and is only really useful if it offers some performance advantage, in my opinion.
Cool! I'll assume most other people feel the same way. As well, since the resource leakage on breaking out of iterators early is a confirmed bug, I can probably add a warning that the user cannot break out of serial iterators early without consequence.
Data Structure API
Looking for feedback and review on the API for Data Structures, chpldocs hosted here. Below are questions that help determine the overall direction of further development of the queue.
Edit: The chpldocs link is now updated. Old docs are here.
Edit2: I have removed the QueueFactory entirely for now.
Edit3: Generalizing for any kind of data structure.

## The QueueFactory

Emulates Go's 'make'.

### Open Questions

* Should we maintain an interface at all?
* If not, then should we make each data structure a `record` instead?
* Resilience to Change vs Flexibility

## Proposed Additions
### 'Freezing' Data Structures
While parallel-safe data structures are desired, not all things can be done in a parallel-safe manner. For these situations, I believe the data structures should be armed with two states: mutable (unfrozen) and immutable (frozen). The benefit is two-fold: it enables optimizations, such as accessing data more optimistically without contending for locks (meaning more performance), and the user gets the benefit of enforcing immutability across nodes without the need for external synchronization. I believe that this is the only 'safe' way to allow things like reduction, zipper, and mapping iterations without modifying the queue, while providing a significant performance boost.
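For illustration, the intended usage might look something like this (a hypothetical sketch assuming `canFreeze`/`freeze`/`unfreeze` methods on a collection `c`; not a finalized API):

```chapel
// Hypothetical usage of the frozen/unfrozen states: freeze to get
// lock-free read-only access, run a reduction, then unfreeze.
if c.canFreeze() {
  c.freeze();              // now immutable; reads need no locks
  var total = + reduce c;  // e.g., a reduction over the collection
  c.unfreeze();            // mutable again
}
```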
#### Open Questions

### Iteration
Iteration allows the data structures to become more than just simple containers, making things such as reductions, zippered iteration, and mappings possible.
#### Open Questions

### Transactional Additions
Adding in bulk with an 'all-or-nothing' transactional approach.
#### Open Questions

### Operator Overloading
While some operators can be helpful, others can be rendered irrelevant. Syntactic-sugar operators such as `+=` can help produce more readable code and are easy to overload. Perhaps the `=` operator could be overloaded to perform a 'clear-and-add' kind of operation? What say you?

Edit: Removed the 'literal' part of the example.