chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

[Review] Data Structure API #6812

Closed LouisJenkinsCS closed 7 years ago

LouisJenkinsCS commented 7 years ago

Data Structure API

I'm looking for feedback and review on the API for data structures; the chpldocs are hosted here. Below are questions that will help determine the overall direction of further development of the queue.

Edit: The chpldocs link is now updated. Old docs are here.

Edit2: I have removed the QueueFactory entirely for now.

Edit3: Generalizing to any kind of data structure.

## The QueueFactory

Emulates Go's 'make'.

### Open Questions

* Should we maintain an interface at all?
* If not, should we make each data structure a record instead?
* Resilience to change vs. flexibility

## Proposed Additions

### 'Freezing' Data Structures

While parallel-safe data structures are desired, not everything can be done in a parallel-safe manner. For these situations, I believe the data structures should support two states: mutable (unfrozen) and immutable (frozen). The benefit is two-fold: it enables optimizations such as a more optimistic approach to accessing data that avoids contending for locks (meaning more performance), and it lets the user enforce immutability across nodes without needing external synchronization. I believe this is the only 'safe' way to allow things like reductions, zippered iteration, and mapping without the queue being modified mid-iteration, while also providing a significant performance boost.
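
As a rough usage sketch of the idea (hypothetical: it assumes the makeBoundedFIFO factory and the proposed freeze/unfreeze methods, and that iterating a frozen structure works as described):

```chapel
// Hypothetical sketch: freeze to iterate without locking, then unfreeze to mutate again.
var queue = makeBoundedFIFO(int, 100);
for i in 1 .. 100 do queue += i;

queue.freeze();                 // now immutable: iteration needs no lock acquisition
var total = 0;
for x in queue do total += x;   // e.g., a reduction performed safely while frozen
queue.unfreeze();               // mutable again
```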

#### Open Questions

### Iteration

Iteration allows the data structures to become more than just simple containers, making things such as reductions, zippered iteration, and mappings possible.

#### Open Questions

### Transactional Additions

Adding elements in bulk with an 'all-or-nothing' transactional approach: either every element is added, or none are.

#### Open Questions

### Operator Overloading

While some operators can be helpful, others are largely irrelevant here. Syntactic-sugar operators such as += can help produce more readable code and are easy to overload. Perhaps the = operator could be overloaded to perform a 'clear-and-add' kind of operation? What say you?

```chapel
var queue = makeBoundedFIFO(int, 100);
queue = (1,2,3,4,5);
for i in 6 .. 100 do queue += i;
```

Edit: Removed the 'literal' part of example.

LouisJenkinsCS commented 7 years ago

Updated chpldocs with additional methods (clear(), isEmpty(), isFull(), length)

e-kayrakli commented 7 years ago
LouisJenkinsCS commented 7 years ago

> One thing worth thinking about would be allowing grow/shrink on bounded queues.

Growing and shrinking a distributed bounded queue is not easy without incurring severe performance penalties due to the needed synchronization. A bounded queue is to be implemented on top of a distributed array, which does not support concurrent growing or shrinking. If heavily requested, a distributed dynamically resizing bounded queue could be created, separate from the normal distributed bounded queue, and implemented in a way that allows this.

I would like to note that this can be done without using the queue API by creating a new queue of the requested size and appending the old into the new...

```chapel
var oldQueue = makeBoundedFIFO(int, 100);
for i in 1 .. 100 do oldQueue += i;
var newQueue = makeBoundedFIFO(int, 200);
newQueue += oldQueue;
for i in 101 .. 200 do newQueue += i;
```

> users initialize a 0-sized queue and pass it to some initialization function

Would there be a reason not to set the bounds to 1 instead?

> split can be done by a bulk dequeue followed by a bulk enqueue to a new queue.

I'm against adding split to the API because the user may want to split into another type of queue.

```chapel
var coreWorkQueue = makeBalancedQueue(int);
forall i in 1 .. 100 do coreWorkQueue += i;
var splitWork = coreWorkQueue.dequeue(10);
var splitWorkQueue = makeBoundedFIFO(int);
splitWorkQueue += splitWork;
```

Here, coreWorkQueue is 'split' into a FIFO queue.

Edit: Fixed first example to makeBoundedFIFO instead of makeBoundedQueue

LouisJenkinsCS commented 7 years ago

This brings up a separate question: Should we even have an interface?

If each queue were implemented independent of an interface, it would lift any and all restrictions. Maybe an unbounded queue should not have a shrink/grow method, but perhaps a certain type of bounded queue should. Perhaps the interface should be as simple as enqueue and dequeue, and we should leave it at that to allow breathing room for other data structures? Perhaps we should have multiple interfaces? If we wanted a Deque, implementing just a Queue interface would not be sufficient; it would also need to implement a Stack interface, etc. A load-balanced queue that imposes no FIFO ordering could perhaps implement some List interface instead, with work stealing abstracted into another interface, LoadBalanced, etc.

There are many ways to proceed with this.

Edit:

To expand on this more, perhaps we could do what Java does, and have hierarchies for this...

Queue is the base interface, BoundedQueue inherits from Queue, DistributedBoundedQueue inherits from BoundedQueue, and so on. Again, this lifts a lot of restrictions.
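
A minimal Chapel sketch of that shape, purely illustrative (the names, fields, and placeholder bodies are assumptions, not the final API):

```chapel
// Base interface: the minimal queue operations. The bodies below are placeholders
// that concrete implementations would override.
class Queue {
  type eltType;
  proc enqueue(elt : eltType) : bool { return false; }
  proc dequeue() : (bool, eltType) { var x : eltType; return (false, x); }
}

// A bounded queue layers capacity-related behavior on top of the base.
class BoundedQueue : Queue {
  var capacity : int;
  proc isFull() : bool { return false; }
}

// A distributed bounded queue would override enqueue/dequeue with
// distribution-aware implementations (e.g., on top of a distributed array).
class DistributedBoundedQueue : BoundedQueue { }
```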

Edit2:

It would be a lot less complicated than this, but it encapsulates the idea.

[image: diagram of a collection interface hierarchy]

e-kayrakli commented 7 years ago

> Growing and shrinking of a distributed bounded queue is not easy without incurring severe performance penalties due to needed synchronization.

The question is not how it would perform but whether it is useful functionality. Although you can do that without API support, it might be nice to have it in the API.

> Would there be a reason not to set the bounds to 1 instead?

Simply because it isn't supposed to be 1 :) It just feels hacky to me.

lydia-duncan commented 7 years ago

I'm in favor of utilizing an inheritance hierarchy for the queue setup.

For the enqueue method, would it return false if some elements have been added but not all? It might be good to be a little more explicit about that.

For freeze/unfreeze - will we guarantee that all operations initiated prior to the freeze/unfreeze will complete before the state changes? Or is it possible to mess up an operation partway through (say, by freezing in the midst of a long enqueue)?

LouisJenkinsCS commented 7 years ago

> For the enqueue method, would it return false if some elements have been added but not all? It might be good to be a little more explicit about that.

That's one of the things I've been debating. The transactional-additions proposal was referring to enqueue as an all-or-nothing addition to the queue: either we commit by adding all of the elements, or we roll back by not adding any. You're right that I wasn't clear on what happens when there is not enough space available; I'll edit the description a bit later.
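
For instance, something along these lines (purely illustrative: the bulk-enqueue signature here, taking multiple elements at once, is an assumption and exactly the part still under discussion):

```chapel
// Hypothetical all-or-nothing semantics: with a bounded queue of capacity 5
// already holding 4 elements, a bulk enqueue of 3 more fails as a whole
// rather than adding only the one element that fits.
var q = makeBoundedFIFO(int, 5);
for i in 1 .. 4 do q += i;
var ok = q.enqueue(5, 6, 7);  // assumed bulk form; returns false, q unchanged
writeln(ok);                  // false
```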

> For freeze/unfreeze - will we guarantee that all operations initiated prior to the freeze/unfreeze will complete before the state changes? Or is it possible to mess up an operation partway through (say, by freezing in the midst of a long enqueue)?

That's another thing I've been debating (also under the Open Questions); however, I am in favor of making it implementation-specific. If we go ahead with the inheritance hierarchy, the specifics for each queue can be noted in the overridden method.

LouisJenkinsCS commented 7 years ago

Soon (tomorrow), I will make a change to the API to add some inheritance and I'll announce when I finish (still here to answer questions).

LouisJenkinsCS commented 7 years ago

I have revised the entire API to include a hierarchy now. Note that quite a few methods are empty (as it is more of an interface). As chpldocs only shows methods that are overridden, there's nothing else I can do about that yet (I can fill it in more if people like the way it is going right now).

e-kayrakli commented 7 years ago

Two ideas that I had while reading the updated documents:

LouisJenkinsCS commented 7 years ago

> An alternative to "bulk remove" in Collection: remove(10) resulting in creating a new collection may be somewhat heavyweight. How about bulk remove returning an array, and having a much-discussed split returning a collection?

I find this agreeable. An array is definitely the least expensive way to return bulk data, and it makes more sense for split to produce a separate Collection.

> Do you think there should be Queue.enqueue/Queue.dequeue and, similarly, Stack.push/Stack.pop as simple wrappers for add/remove? So a class extending these would override add/remove, yet allow the use of enqueue/dequeue and push/pop?

Sure, that's agreeable. Originally I removed them so that a Queue and a Stack would feel more like a Collection, but I neglected the fact that their iconic methods provide more clarity in determining, at a glance, what kind of collection it is. For example, if you see c.add(1), is it a stack, a queue, or a list? You won't know unless you look up the type. If you see c.enqueue(1) or c.push(1), it becomes obvious.

Overall, I agree to adding them back.
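
A rough sketch of how those wrappers might look, assuming a Collection base along the lines discussed here (the placeholder bodies and exact signatures are assumptions, not the final API):

```chapel
// Stand-in for the proposed Collection interface; the bodies here are placeholders
// that concrete implementations would override.
class Collection {
  type eltType;
  proc add(elt : eltType) : bool { return false; }
  proc remove() : (bool, eltType) { var x : eltType; return (false, x); }
}

// Queue: enqueue/dequeue simply forward to add/remove, but read more clearly at the callsite.
class Queue : Collection {
  inline proc enqueue(elt : eltType) : bool { return add(elt); }
  inline proc dequeue() : (bool, eltType) { return remove(); }
}

// Stack: push/pop likewise forward to add/remove.
class Stack : Collection {
  inline proc push(elt : eltType) : bool { return add(elt); }
  inline proc pop() : (bool, eltType) { return remove(); }
}
```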

mppf commented 7 years ago

> List... A data structure without any particular ordering.

I'm not sure that sentence makes sense. I mean, I would expect a List to have more "ordering" than a stack or a queue - in particular, you can access the middle elements. Maybe you mean something different? Or maybe it shouldn't be called "List"? (Also, we already have something called a List in Chapel that is different from what you are building.)

I'm generally happy with the outline of the API.

LouisJenkinsCS commented 7 years ago

> I would expect a List to have more "ordering" than a stack or a queue

In the API, I relaxed the ordering for List to allow it to be more of an 'umbrella' for any data structure that isn't FIFO, LIFO, or even indexable. For example, a data structure might only allow inserting and removing arbitrary elements, either without supporting indexing as an optimization or because its design makes indexing meaningless (e.g., a load-balancing data structure where an item's index is essentially useless, since items can move back and forth between nodes). I suppose a better term could be devised instead.

Looking at this again, in Java a List is defined as "an ordered collection (also known as a sequence)", so perhaps List should be indexable by definition, and I'll need another category for a load-balancing, non-indexable, non-ordered (but high-performance: e.g., over 250x the performance of a normal, albeit naive, synchronized list implementation at 64 nodes) data structure. I suppose I could technically make it a List that ignores the index and returns an arbitrary element, and just document it as such.

mppf commented 7 years ago

Does related work have a name for a "list" that returns an arbitrary element? Would "bag" be a better term?

For example: https://en.wikipedia.org/wiki/List_of_data_structures

shows "List" as ordered, and "bag" as a "multiset", which isn't far off but might convey the wrong idea.

Maybe we just need to invent a term for a totally unordered collection? Just brainstorming here:

Or, what about just calling it a "collection" and not trying to have a name for it? I mean, don't you have this terminology right now:

So couldn't it just be:

?

If it being unordered is important for its use, should we be looking for a two-word term, like "UnorderedList" or "UnorderedCollection"?

I still probably like "bag" or "jumble" the best of all of these.

e-kayrakli commented 7 years ago

"hoard" and "mound" don't intuitively spark enough understanding of what that structure is to me (maybe because I am not a native speaker)

I actually like the sound of "Collection" the most for this purpose, but I am not sure whether it has any implications for the software-design side of things. Two-word terms also sound good to me, as they are descriptive. I can live with "bag".

LouisJenkinsCS commented 7 years ago

Interesting... yeah, Bag sounds about right, since it describes the structure rather well: DistributedBag, and Bag for the single-locale version. However, I do have List data structures that are indexable, so I guess it'd be...

In terms of the List: if a data structure supports indexing, then pop_front, push_front, pop_back, push_back, etc. come for free (see the sketch below). As well, if a Set interface is added later on, Bag could potentially be moved under it as a Multiset of sorts.
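
A quick sketch of what 'for free' means here (the insert/removeAt names are placeholders I'm assuming for an indexable List, not the actual API):

```chapel
// If a List supports indexed insert/remove, the deque-style operations reduce to
// one-line wrappers over those indexed operations. The indexed methods below are
// placeholders that a real implementation would provide.
class List {
  type eltType;

  proc size() : int { return 0; }
  proc insert(idx : int, elt : eltType) { }
  proc removeAt(idx : int) : eltType { var x : eltType; return x; }

  inline proc push_front(elt : eltType) { insert(1, elt); }
  inline proc push_back(elt : eltType)  { insert(size() + 1, elt); }
  inline proc pop_front() : eltType     { return removeAt(1); }
  inline proc pop_back() : eltType      { return removeAt(size()); }
}
```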

Also, I'm under no illusion that I can finish everything here by the end of the month, but my goal is to get at least one data structure in each category.

LouisJenkinsCS commented 7 years ago

One more thing: I need to establish what would be 'acceptable' performance. In the distributed List implementation (not the Bag/Multiset one I discussed earlier), I can promise non-degrading performance, but I can't promise scalable (as in increasing-per-node) performance for a list that requires full ordering across nodes, arbitrary/random access, etc. This is still far better than a normal synchronized list, whose performance degrades significantly as you add more nodes.

Its implementation would require the H-Synch algorithm: in summary, we maintain a global queue-lock (MCS) [requires GlobalAtomicObject] which a node needs to acquire before it may touch the data structure, and a CC-Synch lock (CCSynchLock) which a task needs to acquire. The performance would be near-equivalent to a single node's, which will hopefully be non-degrading. It does require more effort on the user's behalf, though. @mppf Since you have a much firmer grasp on what is and is not acceptable, would a list with non-degrading performance be desired? As well, if this is desired (and achieved), it's possible to reuse the pattern/abstraction in other applications.

LouisJenkinsCS commented 7 years ago

Due to time constraints, I'm trimming down the Collections API a lot more. There's not enough time to update the hosted chpldocs, so I'll show it below in text...

```chapel
class Collection {
  type eltType;

  /*
    Adds an element to this data structure.
  */
  proc add(elt : eltType) : bool;

  /*
    Removes an arbitrary element from this data structure.

    BUG: The compiler will segfault if the returned value is not captured at the callsite:
    'c.remove();' == segfault
    BUG: The compiler with LICM will cause undefined behavior if the return value is captured
    in a variable declared outside of a loop:
    'var captured : (bool, eltType); while !captured[1] do captured = remove();' == undefined behavior
  */
  proc remove() : (bool, eltType);

  // Checks to see if the data structure contains an item. Note that we do not support the
  // 'removeItem' equivalent by default, as it caused a ton of headaches in implementing
  // my ordered data structure.
  proc contains(elt : eltType) : bool;

  // By default, repeatedly calls 'remove()' until empty if not overridden. May be
  // problematic with issue #7003.
  proc clear();

  // By default, checks whether 'size() == 0'; can be overridden.
  inline proc isEmpty() : bool {
    return size() == 0;
  }

  // Returns the number of elements.
  proc size() : int;

  // By default, implementations must provide serial iteration; I would put parallel
  // iteration here, but that is not possible, as the leader iterator's yielded chunk can
  // vary in type. It's also not the easiest thing to debug (see #6998).
  iter these() : eltType;

  // Freezes the collection, rendering it immutable.
  proc freeze() : bool;

  // Unfreezes the collection, rendering it mutable.
  proc unfreeze() : bool;

  // Whether this collection can be frozen.
  proc canFreeze() : bool;

  // Whether this collection is currently frozen.
  proc isFrozen() : bool;
}
```

Note that there isn't any 'Stack', 'List', or 'Queue' derivative. If they are needed, they can be added later. I plan to submit two scalable data structures (it used to be five, but I merged the best parts of three of them into one and discarded another entirely). They will now inherit directly from 'Collection', which should avoid most of the other issues with dynamic dispatch and make my job a lot easier. I definitely promised more than I could deliver originally; apologies for this sudden change.
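
For illustration only, here is a toy, local, non-parallel-safe type that fills in the core of this method set (backed by a plain resizable array; it is not one of the planned scalable structures, and the freeze-related methods are omitted):

```chapel
// Toy sketch: a single-locale, non-parallel-safe collection with the same core methods.
class ArrayCollection {
  type eltType;
  var dom = {1..0};
  var elems : [dom] eltType;

  proc add(elt : eltType) : bool {
    dom = {1..dom.size + 1};      // grow the backing array by one
    elems[dom.size] = elt;
    return true;
  }

  proc remove() : (bool, eltType) {
    if dom.size == 0 {
      var defaultVal : eltType;
      return (false, defaultVal); // nothing to remove
    }
    var elt = elems[dom.size];
    dom = {1..dom.size - 1};      // shrink the backing array by one
    return (true, elt);
  }

  proc contains(elt : eltType) : bool {
    for e in elems do
      if e == elt then return true;
    return false;
  }

  proc clear() { dom = {1..0}; }

  inline proc isEmpty() : bool { return size() == 0; }

  proc size() : int { return dom.size; }

  iter these() : eltType {
    for e in elems do yield e;
  }
}

var c = new ArrayCollection(int);
for i in 1..5 do c.add(i);
writeln(c.size());       // 5
writeln(c.contains(3));  // true
```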

LouisJenkinsCS commented 7 years ago

Now that the PR has been posted, there is another revision I'd like to make: potentially removing the 'freeze' logic from the API entirely. It turns out to be a lot more tedious and harder to implement than originally expected. Right now, all data structures allow iteration during concurrent access, so 'freeze' currently merely elides the need to acquire any locks. Worst of all, the DistributedBag sees a 33% performance loss from the overhead of these checks, so perhaps I should remove it? Since I originally brought up the idea and no one voiced objections to it, I'd like to see whether people still want it or not.

mppf commented 7 years ago

I don't have any objection to removing the "freeze" logic. Such logic complicates the API and is only really useful if it offers some performance advantage, in my opinion.

LouisJenkinsCS commented 7 years ago

Cool! I'll assume most other people feel the same way. As well, since the 'resource leakage on breaking out of iterators' issue is a confirmed bug, I can probably add a warning that users cannot break out of serial iterators early without consequence.