chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 421 forks source link

Build a library of iterator tools #12657

Open ben-albrecht opened 5 years ago

ben-albrecht commented 5 years ago

It would be useful to provide a toolkit of common serial and parallel iterators to users through a library. Python’s itertools library would make a good reference for target functionality as well as early performance comparisons. However, this task is not limited to iterators available in itertools.

Assuming we're porting python itertools, we would need to determine the following for each iterator:

There are a number of example recipes provided in the itertools documentation that demonstrate combining multiple iterators together to create powerful constructs. Being able to reproduce many (or all) of these constructs would be a good design goal.

ben-albrecht commented 5 years ago

This is up for grabs as a GSoC 2019 project. Some suggestions to GSoC applicants:

ben-albrecht commented 5 years ago

Though not explicitly listed, this may be considered part of https://github.com/chapel-lang/chapel/issues/6329

akshansh2000 commented 5 years ago

I want to take this up as a GSoC 2019 project. I was just wondering if beginning with implementing the repeat and product itertools of python would be a good way to start. As far as I know, these tools are not available in Chapel as of now. Please let me know if I missed something, and whether to go ahead with this.

For example, the basic implementation of product for integer ranges could be given as:

iter product(x: range, y: range) {
    for i in x do for j in y do yield (i, j);
}

writeln(product(1..5, 3..5));

// (1, 3) (1, 4) (1, 5) (2, 3) (2, 4) (2, 5) (3, 3) (3, 4) (3, 5) (4, 3) (4, 4) (4, 5) (5, 3) (5, 4) (5, 5)

So what I need to do is write the tests and documentation, work on determining whether it can be made parallelizable or not, and make this process fast by reducing communication overheads, right?

Please let me know if I'm missing something. Thank You!

ben-albrecht commented 5 years ago

Hi @akshansh2000, repeat and product seem like good starts. I expect repeat to be relatively simple, but product will be a little more challenging.

Regarding your product implementation, there are a few design goals to consider:

Beyond that, documentation, evaluating performance, and determining if the implementation should/can be parallelized would be your next steps (in any order you prefer).

akshansh2000 commented 5 years ago

Thank you, @ben-albrecht. I'm working on it. I'll keep you updated.

akshansh2000 commented 5 years ago

The basic implementation for the repeat tool is given:

iter repeat (arg, times = 0) {
    if times == 0 then
        for count in 1.. do yield arg;
    else
        for count in 1..#times do yield arg;
}

// For integer/real/complex etc..

writeln(repeat(10, 15));

// For arrays/strings

writeln(repeat([1, 4, 6, 7], 4));
writeln(repeat("Testing", 5));

// For dictionaries

var dictDomain: domain(string);
var dict: [dictDomain] int;

dict["one"] = 1;
dict["two"] = 2;
dict["three"] = 3;

writeln(repeat((dictDomain, dict), 3));

// For infinite repetition

write(repeat("infinite"));

Output:

10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
1 4 6 7 1 4 6 7 1 4 6 7 1 4 6 7
Testing Testing Testing Testing Testing
({two, three, one}, 2 3 1) ({two, three, one}, 2 3 1) ({two, three, one}, 2 3 1)
infinite infinite infinite infinite .....

Please let me know what all I need to change in this. This sure could be parallelized (only in the case of a bounded range), and I'll work on that next. The documentation and performace evaluation is further in the queue.

LouisJenkinsCS commented 5 years ago

@akshansh2000 You may want to make this into its own Pull Request. (also side note: If you add write the code block as ```chpl you get syntax highlighting)

iter repeat (arg, times = 0) {
    if times == 0 then
        for count in 1.. do yield arg;
    else
        for count in 1..#times do yield arg;
}

You also will want a parallel iterator. I'll do this one for you so you get the gist of how to do so in Chapel...

iter repeat (arg, times = 0, param tag : iterKind) where tag == iterKind.standalone {
    if times == 0 then
        forall count in 1.. do yield arg;
    else
        forall count in 1..#times do yield arg;
}

You'll also definitely want a leader-follower version so you can do something like this...

forall (ix, a) in zip(repeat(fixedIdx), arr) {
   a = ix;
}

In the above example, you zip over an infinite stream repeat(fixedIdx) and some finite container.

akshansh2000 commented 5 years ago

Thanks for the clarification on the parallel iterator, @LouisJenkinsCS! I'll work on the leader-follower part and share my progress.

(Thanks for the syntax highlighting part as well hahaha)

Also, as for the PR, I'm supposed to make a Mason package and make a PR to the mason-registry repository, right?

LouisJenkinsCS commented 5 years ago

Hm... I'm not sure, honestly. Previously students just submit a PR to the repository and it gets stuck somewhere (likely labeled as some kind of prototype/test), but I suppose it makes sense if @ben-albrecht would want you to implement it as a Mason package.

akshansh2000 commented 5 years ago

Alright, I'll wait for @ben-albrecht sir to reply, and work on my code until then.

Thank you!

ben-albrecht commented 5 years ago

Mason package vs package module is still an open question at this point. Though, I'm leaning towards a package module right now so that we can utilize the nightly performance testing. Once mason packages have this functionality, we could migrate this to a mason package, like we plan to do for other existing package modules (https://github.com/chapel-lang/chapel/issues/10713).

akshansh2000 commented 5 years ago

The repeat iterator is almost complete, I believe. Please take a look at my implementation and let me know if I need to change something.

Also, I'm a bit confused here. How do I implement infinite repetition for a parallel loop? In the case of a forall loop, it needs to know to the total number of tasks to divide the repeat task into different tasks, while in the case of a coforall loop, the program terminates at runtime due to excessive memory usage. Please help me out with this, I'm a bit lost.

Here's the code; I've commented the non working code below.

// Serial Iterator

iter repeat (arg, times = 0) {
    if times == 0 then
        for count in 1.. do yield arg;
    else
        for count in 1..#times do yield arg;
}

// Standalone Parallel Iterator

iter repeat (param tag : iterKind, arg, times = 0) where tag == iterKind.standalone {
    if times == 0 then
        coforall count in 1.. do yield arg; // parallel infinite loop creates problem
    else
        coforall count in 1..#times do yield arg;
}

// Procedure to compute chunks which are to be iterated through

proc computeChunk(r : range, myChunk, numChunks) where r.stridable == false {
    const numElems = r.length;
    const elemsPerChunk = numElems / numChunks;
    const mylow = r.low + (elemsPerChunk * myChunk);

    if (myChunk != numChunks - 1) then
        return mylow..#elemsPerChunk;
    else
        return mylow..r.high;
}

const numTasks = here.maxTaskPar; // max number of tasks supported by locale

// Leader

iter repeat (param tag : iterKind, arg, times = 0) where tag == iterKind.leader {
    if times == 0 then
        coforall task_id in 0.. do yield(0..1,); // parallel infinite loop creates problem
    else
        coforall task_id in 0..#numTasks {
            const working_iters = computeChunk(0..#times, task_id, numTasks);
            yield(working_iters,);
        }
}

// Follower

iter repeat (param tag : iterKind, arg, times = 0, followThis)
        where tag == iterKind.follower && followThis.size == 1 {
    const working_iters = followThis(1);

    for idx in working_iters do yield arg;
}

for element in repeat(1, 10) do write(element, ' '); // works - 1 1 1 1 1 1 1 1 1 1
writeln();

forall element in repeat('abc', 4) do write(element, ' '); // works - abc abc abc abc
writeln();

forall element in repeat(123) do write(element, ' '); // doesn't work, parallel infinite loop
writeln();

var arr: [1..5] int;

forall (idx, element) in zip(repeat(3, 5), arr) do element = idx; // works
writeln(arr); // - 3 3 3 3 3

forall (idx, element) in zip(repeat(2), arr) do element = idx; // doesn't work, parallel infinite loop
writeln(arr);

Thank you!

ben-albrecht commented 5 years ago

Also, I'm a bit confused here. How do I implement infinite repetition for a parallel loop?

This is not supported today in Chapel. As a result, we will not be able to pursue parallel implementations of any infinite iterators for this project.

akshansh2000 commented 5 years ago

Alright, so should I put the rest of the code into a module and create a PR?

LouisJenkinsCS commented 5 years ago

Oh yeah, I forgot that you can't break from a forall loop. I'm wondering if its wise to return a 'killswitch' which the leader and follower will check to see if it should continue.

LouisJenkinsCS commented 5 years ago

Note that if I make the array the 'leader', it works fine. It has to do with the leader not knowing when to stop, which is definitely a problem. TIO

akshansh2000 commented 5 years ago

Note that if I make the array the 'leader', it works fine. It has to do with the leader not knowing when to stop, which is definitely a problem. TIO

Noted. So for now should I exclude the infinite-parallel iteration from the code and make a module out of the rest of my code, and throw an error when trying to use infinite iteration with a parallel loop?

Also, should I open an issue regarding the above?

LouisJenkinsCS commented 5 years ago

You could halt when you have an infinite parallel loop for now, sure. The lack of a break statement in forall is a well-known issue, so no issue needed (unless Ben thinks differently)

ben-albrecht commented 5 years ago

You could halt when you have an infinite parallel loop for now, sure.

That sounds good to me, or just leave the parallel infinite iterator implementations out for now.

The lack of a break statement in forall is a well-known issue, so no issue needed (unless Ben thinks differently)

Despite being well-known, I think it's useful to have an issue to reference / search for. I've created a simple one here: https://github.com/chapel-lang/chapel/issues/12700

akshansh2000 commented 5 years ago

@ben-albrecht, @LouisJenkinsCS, I completed the serial version of the cycle itertool, which returns separate elements from an iterable, eg:

writeln(cycle('ABC'));
writeln(cycle(1..4, 2));
>>  A B C A B C A B C ...
>>  1 2 3 4 1 2 3 4

I was working on the parallel version, but I am not sure if it makes sense to parallelize. I don't think that a shuffling of order would be desirable.

Is there some case that I'm missing where it might be used?

ben-albrecht commented 5 years ago

I don't think a standalone parallel iterator makes sense for cycle, but a leader/follower parallel iterator might be useful in cases where zipped order matters, e.g.

forall (dayOfMonth, dayOfWeek) in zip(1..30, cycle('MWTRFSU')) {
  April[dayOfMonth] = dayOfWeek;
}
akshansh2000 commented 5 years ago

@ben-albrecht, @LouisJenkinsCS, I am having some problem implementing the parallel version of the cycle tool.

The serial version is working fine. However, the parallel version throws a zippered iterations have non-equal lengths error when the length of the iterable is more than 1. Here is the code for the leader-follower iterator:

use RangeChunk;

iter cycle(param tag: iterKind, arg, times = 0) throws
    where tag == iterKind.leader {

  var numTasks = if dataParTasksPerLocale > 0 then dataParTasksPerLocale else here.maxTaskPar;

  if numTasks > times then numTasks = times;

  if __primitive("method call resolves", arg, "these") {  // to check if `arg` is iterable
    coforall tid in 0..#numTasks {
      const working_iters = chunk(0..#times, numTasks, tid);
      yield(working_iters,);
    }
  } else
    throw new owned IllegalArgumentError(
        "non-iterable type argument passed");

}

iter cycle(param tag: iterKind, arg, times = 0, followThis)
    where tag == iterKind.follower && followThis.size == 1 {

  const working_iters = followThis(1);

  for working_iters do
    for element in arg do
      yield element;

}

On running the following code,

forall (el, id) in zip(cycle('ABCD', 2), 1..#8) do writeln(el, ' ', id);

the error for unequal lengths as described above is thrown. However, on running

forall (el, id) in zip(cycle(['ABCD'], 8), 1..#8) do writeln(el, ' ', id);

the program works fine.

What could I be missing? Thanks.

LouisJenkinsCS commented 5 years ago

@akshansh2000

['ABCD'] creates an array literal (I.E if you do writeln(arg.type : string) you'll see this fact), while 'ABCD' creates a string. When you do ['ABCD', 'ABCD', 'ABCD'] you get the same error; the issue has to do with the fact that you are yielding more than requested.

cycle('ABCD', 2) should print out 'AB', while ['ABCD'] should print out 'ABCD', 'ABCD'. To do this, you'll want to ensure that you short-circuit the computations. I.E insert a break inside of that follower sequential for loop when you hit the upper limit early.

akshansh2000 commented 5 years ago

@LouisJenkinsCS

Actually, cycle('ABCD', 2) should print A B C D A B C D, and that's why in the first example I set the times argument to 2, and let id iterate from 1..8, due to the string length being 4.

One cycle iteration means going through the whole iterable exactly once. Sorry for the missing explanation.

Similarly, in the second example, as the iterable length is 1, repeating it 8 times would require id to range from 1..8

LouisJenkinsCS commented 5 years ago

TIO

use RangeChunk;

iter cycle(arg, times = 0, param tag: iterKind) throws
    where tag == iterKind.leader {
  writeln(arg.type : string);
  var numTasks = if dataParTasksPerLocale > 0 then dataParTasksPerLocale else here.maxTaskPar;

  if numTasks > times then numTasks = times;

  if __primitive("method call resolves", arg, "these") {  // to check if `arg` is iterable
    coforall tid in 0..#numTasks {
      const working_iters = chunk(0..#times, numTasks, tid);
      yield(working_iters,);
    }
  } else
    throw new owned IllegalArgumentError(
        "non-iterable type argument passed");

}

iter cycle(arg, times = 0, followThis, param tag: iterKind)
    where tag == iterKind.follower && followThis.size == 1 {

  const working_iters = followThis(1);
  writeln(working_iters);
  var upperBound = working_iters.size;
  var _times = 0;
  label outerLoop
  while true do
    for element in arg {
      _times += 1;
      yield element;
      if _times == upperBound then break outerLoop;
    }

}

iter cycle(arg, times = 0) {}

forall (el, id) in zip(cycle('ABCD', 2), 1..#8) do writeln(el, ' ', id);

If you want it to go around in a cycle, it'd have to be a bit more complicated than what we have right now. If this is what you want you should obtain the size of the arg (assert that it is resolvable just like how you did for these), and then have some additional logic to skip the first N iterations over arg. RIght now it prints 'A' 'A' because both begin at the same letter. Then again, you could make this more efficient by only creating tasks if the number of times exceeds arg.size.

LouisJenkinsCS commented 5 years ago

I think I misunderstood the intention here. A string by itself is iterable and yields characters; if you want the string itself to be iterated over, the array literal is the best approach.

akshansh2000 commented 5 years ago

@LouisJenkinsCS, does that mean storing the string as an array of characters?

LouisJenkinsCS commented 5 years ago

TIO That should work

akshansh2000 commented 5 years ago

That is a great approach, @LouisJenkinsCS.

I'll try to optimize it and see if we can get it done using even fewer resources. I'll get back here if I have any doubts. Thank you!

ben-albrecht commented 5 years ago

TIO That should work

Nice example @LouisJenkinsCS.

I have a small suggestion - Reflection.canResolveMethod would be preferred over the __primitive call, which is not a public interface (and is therefore subject to change in the future without proper deprecation).

e.g.


use Reflection only;

...

if Reflection.canResolveMethod(arg, "these") {
  ...
} else {
   throw new owned IllegalArgumentError(...);
}
akshansh2000 commented 5 years ago

@LouisJenkinsCS @ben-albrecht

TIO That should work

This code doesn't work if the times argument is more than 2. TIO1, TIO2

I was trying to debug it. I took another example of a simple iterator that yields 1 eight times. The serial part of it (TIO3) is given as

iter abc() {
  for 1..8 do
    for j in 1..1 do
      yield j;
}

I then switched to a leader-follower one, but that didn't work quite well (TIO4). I set the default value of numTasks to 4 so that the range gets divided evenly.

This leader-follower one should yield 1 eight times, but it fails and gives the error as mentioned in TIO4. I can't understand why this happened. Also, to check if the range was getting divided evenly and the number of yielded 1s were eight, I wrote the following code (without an upper bound in the follower of the forall loop): TIO5.

And I guess the initial error which I got a few days back (zippered iterations have non-equal lengths), TIO6, was related to this as well?

Am I doing something wrong here?

bradcray commented 5 years ago

Most of our standard leader-follower iterators yield a tuple of indices using 0-based indexing. This is simply a convention so that if you're zippering something that's 0..7 with something that's 1..8 with something that's 1..16 by 2, they all have a common basis for what global iteration you're on. This means that an 8-element range's follower is expecting to receive indices in the 0..7 range. I expect that things are going wrong because your leader is yielding indices in the 1..8 range (but am surprised that the range iterator doesn't complain more about being given an iteration that's OOB... that seems like a bug). Specifically, if I make your leader return working_iters-1, it seems to work as expected: TIO

akshansh2000 commented 5 years ago

@bradcray, yes, I should've taken care of that. I understand now. Thank you for the explanation!

One more thing, though. Why does an unbounded range starting from 1 not cause an error? In this TIO, the yielded 1s are eight (which shouldn't be as per my understanding), and the indexing also starts from 2. Why does this happen?

bradcray commented 5 years ago

It would've been much easier to take care of if the range follower had complained at you that you were requesting unreasonable indices... It'd be good for one of us to look into why that was.

bradcray commented 5 years ago

It'd be good for one of us to look into why that was.

I just did this and only now realized that it was complaining at you, athough the error message could've been clearer:

error: halt reached - invoking orderToIndex on an integer 8 that is larger than the range's number of indices 8

I missed it the first time because I forgot that TIO separates stdout from stderr. Running it from my console made it much clearer...

akshansh2000 commented 5 years ago

@bradcray, yes, it is warning me in the case of a BoundedRange, as in TIO4, however, I still do not get any error in the case of an UnboundedRange, as in TIO5 (not even on my console).

bradcray commented 5 years ago

I think that's as expected. Unbounded ranges are treated a bit specially in that they are permitted to be zippered with anything without running into a length check validation failure. So since TIO5 wasn't 0-basing its indices, it yielded its second through #8'th element (so 2..9).

[edit: There's a long-term intention to give users the ability to write their own unbounded iterators that can similarly conform to the leader's size, but we haven't ever completed the design and implementation of that feature].

akshansh2000 commented 5 years ago

I'd love to help with it in any way possible!

bradcray commented 5 years ago

@akshansh2000: I wish I knew how to advise you to do so. It's a pretty big design challenge and likely to end up with a very different world than the one we have today. The general ideas we've had (that I'm aware of) are based on increasing the amount of interaction between a leader iterator and the follower iterators such that it's up to a follower to say "mismatched zipper iteration" or just let the follower consume as much of the leader as it wants. The thinking is that this interaction would also help fix a longstanding bug in which expressions like forall (i,j) in zip(1..3, 1..4) don't currently result in size mismatch errors. (I took a stab at this latter bug in https://github.com/chapel-lang/chapel/issues/11685 and there may be some other helpful notes there as well... or not). Again: big design challenge.

akshansh2000 commented 5 years ago

@bradcray, I guess I'd have to dig deep into the core concepts of Chapel for that, on it!

ben-albrecht commented 4 years ago

For anyone curious, the current progress of the itertools library can be found here: https://github.com/chapel-lang/chapel/tree/master/test/library/packages/Itertools

Much work remains to be done.