SorterHunter

An evolutionary approach to find small and low latency sorting networks

About

SorterHunter is a C++ program used to find well performing sorting networks. For a number of input sizes, the method used by the program succeeded in reducing the upper bound of S(n) compared to previous work. (Definition: see "The Art of Computer Programming Vol. 3 - Sorting and Searching" - D.E. Knuth, 1998)

n	S(n) previous upper bound	S(n) new upper bound
18	78 (*)	77
19	86 (*)	85
20	92 (*)	91
21	102 (*)	99
22	108 (*)	106
23	118 (*)	114
24	123 (*)	120
25	133 (*)	130
26	140 (*)	138
27	150 (*)	147
28	156 (*)	155
29	165 ($)	164
33	201 ($)	199
34	211 ($)	209
35	223 ($)	220
36	231 ($)	227
37	244 ($)	240
38	253 ($)	250
39	263 (^)	259
40	269 (^)	265
41	284 (^)	282
42	293 (^)	291
43	305 (^)	303
44	313 (^)	309
45	326 (^)	324
46	334 (^)	332
47	343 (^)	340
48	349 (^)	346
54	422 (^)	421
55	433 (^)	432
56	439 (^)	438

(*) Ref: "Using Symmetry and Evolutionary Search to Minimize Sorting Networks", Valsalam&Miikkulainen, 2013.
($) Compared to smallest Batcher odd-even merge network
(^) Compared to Van Voorhis (4,4) merge strategy

The results above were obtained while keeping also the number of parallel operation steps - a.k.a. "depth" - low.

For 25 and 26 inputs, the program was able to reduce the optimal depth upper bound from 14 to 13 layers. A similar reduction to 15 layers was achieved for 34 inputs, and 17 layers for 42, 43 and 44 inputs.

I committed the program to the public domain as inspiration source for further improvements on the subject of sorting networks.

The following lists of networks have been compiled for your convenience:

the list of best performing sorting networks keeps track of the smallest and fastest networks with up to 32 inputs (as far as known by the author), together with the known bounds for depth and size.
the extended list of sorting networks is a similar, larger table for up to 64 inputs.
sorting networks of smallest known size optimized for average swap count that may give a performance boost on some target architectures.
sorting networks of smallest known size optimized for worst case swap count regardless of the input.
a list of median selection networks that were optimized using SorterHunter.

The program

The program is very straightforward to build (just "make") on a Linux machine. It expects one command line argument, which is the name of the configuration file to use. An example config file is bundled with the sources. Once initialised the program will enter an endless optimisation loop, printing out any improvements it found to previous results it reported. Current version is limited to 64 inputs.

Working principles

After the config file is read, the program works as follows:

Determining the prefix to use

During the entire run time of the (current) program, the same prefix is used. There are three options:

Empty prefix - all CEs (compare-exchange nodes) participate in the evolutionary optimisation
Fixed prefix - provided by the user in the config file
Greedy prefix - in a simple algorithm I call "greedy algorithm A", nodes are added one by one starting from an empty network and up to the specified prefix size is reached. For each step, a CE is randomly chosen among those that result in the minimum number of remaining (partially sorted) patterns after the prefix network (assuming application of the "zero-one principle"). I expect that still much can be improved on the prefix selection algorithm.
Preparing the test vectors

With an empty prefix, testing a network with N inputs by blind application of the "zero-one principle" would require 2^N input vectors of N elements to be sent through the candidate sorter. While CE nodes are added in the prefix, the set of possible output patterns of the prefix gradually decreases. To test the remainder of the network, only the possible output patterns of the prefix network need to be considered as input vectors. As the prefix is known before we will iteratively try to improve the network, the test vectors can be enumerated in a fixed list. Test vectors are placed in pseudorandom order. The reason for this is that in this way we increase the chances that an invalid candidate sorter (the vast majority!) will be rejected early in the test process, assuming not all test vectors are applied at once. Further, using 0's and 1's as input, the behaviour of a CE can simply be modelled as a combination of a single "and" and "or" gate. As the order of comparisons and exchanges for a sorting network is fixed, we use a bit-parallel approach to sort multiple test vectors in parallel (64 on a 64 bit machine), obtaining a considerable speed increase.

Determining an initial candidate sorter

Perhaps the weakest part of the algorithm today: the initial candidate is obtained by randomly adding CEs to the network until a valid sorter is obtained. Although there are many obvious ways to create a better initial network, it's hard to guarantee that a "good" initial network won't bias the solutions obtained through evolution. Work to do.

The never ending loop (that is: until Ctrl+C)

If you support the evolution theory, you also agree that it is a slow process. This program produces output in far less than a billion years, but it could easily run for that long. The main iteration loop will take a copy of the latest accepted network and apply one or more mutations to it. Six mutations are currently part of the program, and their relative probabilities can be tuned via the config file. The "regular" mutations are:
Removal of random CE from the variable (i.e. "postfix") list: "downhill" step
Swap two CEs at random positions in the list
Replace a CE at a random position with another random CE
Cross the I/O lines of two CEs at random positions in list
Swap neighbouring CEs that share a connection
Change 1 connection of a random CE

The mutations are accepted on condition that the mutant sorting network is valid. This is determined by applying the test vectors mentioned before. An infrequent modification is the insertion of a duplicate or random CE at a random position in the list. The "uphill" change allows the algorithm to escape from local minima.

Reporting results

Users of sorting networks are interested in low size, low depth but maybe also in trade-offs between them. To satisfy all, the program uses an orthogonal convex hull approach that keeps track of all (size,depth) pairs that are not outperformed by other networks.

bertdobbelaere / SorterHunter

readme

SorterHunter

About

The program

Working principles

Determining the prefix to use

Preparing the test vectors

Determining an initial candidate sorter

The never ending loop (that is: until Ctrl+C)

Reporting results