chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 420 forks source link

Parallel random shuffle #14388

Open ben-albrecht opened 4 years ago

ben-albrecht commented 4 years ago

Implement a parallel random shuffle for Random.shuffle().

MergeShuffle seems like a good algorithm to target.

A reference MPI+python implementation is available in this SO answer.

```python from __future__ import print_function import sys import random from mpi4py import MPI comm = MPI.COMM_WORLD def exchange(localdata, sendrank, recvrank): """ Perform a merge-exchange with a neighbour; sendrank sends local data to recvrank, which merge-sorts it, and then sends lower data back to the lower-ranked process and keeps upper data """ rank = comm.Get_rank() assert rank == sendrank or rank == recvrank assert sendrank < recvrank if rank == sendrank: comm.send(localdata, dest=recvrank) newdata = comm.recv(source=recvrank) else: bothdata = list(localdata) otherdata = comm.recv(source=sendrank) bothdata = bothdata + otherdata bothdata.sort() comm.send(bothdata[:len(otherdata)], dest=sendrank) newdata = bothdata[len(otherdata):] return newdata def print_by_rank(data, rank, nprocs): """ crudely attempt to print data coherently """ for proc in range(nprocs): if proc == rank: print(str(rank)+": "+str(data)) comm.barrier() return def odd_even_sort(data): rank = comm.Get_rank() nprocs = comm.Get_size() data.sort() for step in range(1, nprocs+1): if ((rank + step) % 2) == 0: if rank < nprocs - 1: data = exchange(data, rank, rank+1) elif rank > 0: data = exchange(data, rank-1, rank) return data def main(): # everyone get their data rank = comm.Get_rank() nprocs = comm.Get_size() n_per_proc = 5 data = list(range(n_per_proc*rank, n_per_proc*(rank+1))) if rank == 0: print("Original:") print_by_rank(data, rank, nprocs) # tag your data with random values data = [(random.random(), item) for item in data] # now sort it by these random tags data = odd_even_sort(data) if rank == 0: print("Shuffled:") print_by_rank([x for _, x in data], rank, nprocs) return 0 if __name__ == "__main__": sys.exit(main())
ben-albrecht commented 4 years ago

Originally requested by @ronawho.

Aniket21mathur commented 4 years ago

@ben-albrecht how we use Random.shuffle currently?

I tried something like

use Random;

  var A = ['c', 'h', 'a', 'p', 'e', 'l'];

  shuffle(A, seed=10);

  writeln(A);

but got this error

$CHPL_HOME/modules/packages/Sort.chpl:1415: warning: Module level symbol is hiding function argument 'A'
$CHPL_HOME/modules/packages/Sort.chpl:1415: warning: Module level symbol is hiding function argument 'A'
$CHPL_HOME/modules/packages/Sort.chpl:1415: warning: Module level symbol is hiding function argument 'A'
$CHPL_HOME/modules/packages/Sort.chpl:1415: warning: Module level symbol is hiding function argument 'A'
$CHPL_HOME/modules/packages/Sort.chpl:1415: warning: Module level symbol is hiding function argument 'A'
Random.chpl:5: error: unresolved call 'shuffle([domain(1,int(64),false)] string, seed=10)'
Random.chpl:5: note: because no functions named shuffle found in scope

Please help. Thanks!

ben-albrecht commented 4 years ago

@Aniket21mathur - what chapel version are you using? Your example compiles and runs for me on the master branch.

Aniket21mathur commented 4 years ago

@ben-albrecht I am also working with the master branch, I build the sources again from master, but still getting the same error.

ben-albrecht commented 4 years ago

Which commit on master?

I am using: chpl version 1.21.0 pre-release (6051bbd358)

Is the above snippet the entire program you are compiling? Have you made any modifications to the standard modules anywhere?

You could try doing a fresh checkout and retrying.

Aniket21mathur commented 4 years ago

@ben-albrecht

Which commit on master?

I am working @ https://github.com/chapel-lang/chapel/commit/2c9530a7a6bf0b67c28de2d5155749b497450374

Is the above snippet the entire program you are compiling?

Yes, this is the entire program.

Have you made any modifications to the standard modules anywhere? Does that matter if I do a fresh checkout and make again?

thanks!

Aniket21mathur commented 4 years ago

@ben-albrecht seems like I figured it out, I was working in a directory in which I put all my test examples. In that directory, I made a file named Random.chpl , which had some code. So on running on above code, it seems to import Random from the present directory which is the Random.chpl file I made, deleting that file fixed the problem. :smile:

Aniket21mathur commented 4 years ago

@ben-albrecht I would like to work on this. Thanks!

ben-albrecht commented 3 years ago

15140 has proved that this issue is not in fact easy and straightforward, so I'm removing the label.

Yudhishthira1406 commented 3 years ago

Hi @ben-albrecht I have understood the implementation of Merge Shuffle. I would like to contribute to this. Thanks!

ben-albrecht commented 3 years ago

@Yudhishthira1406 - OK, check out #15140 for prior work. The blocker in that PR was demonstrating a performance improvement (in a new performance test) with the random implementation for large arrays over the serial implementation.

mppf commented 2 years ago

PR #17066 is @Yudhishthira1406 effort and that implements a parallel merge shuffle. But there are a few minor issues preventing it from being merged & we have closed it for the moment. If @Yudhishthira1406 or anybody else wants to complete this work, starting from PR #17066 and addressing last comment there would be the next step.