cjerdonek / quick-sampler

Browser implementation of Ronald L. Rivest's sampling algorithm using AngularJS
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Quick Sampler

Build Status

This repository contains the source code for an open-source web application called "Quick Sampler" for transparently and efficiently choosing items at random from a collection. Quick Sampler can be used for things like selecting precincts at random for a post-election manual audit.

The application does this by running the SHA-256 pseudo-random sampling algorithm described publicly by Ronald L. Rivest in 2011. You provide the number of items in the collection and a random seed, and the algorithm picks the items. The items picked depend only on this information.

The repository is hosted on GitHub here. To try the latest release, go here. For bug reports and feature requests, visit the issue tracker.

Application Features

Some application highlights are as follows:

Since the application uses only client-side Javascript (i.e. there is no server-side logic), the application can be deployed using only static HTML files, etc. This makes installing much easier if you would like to host the application on your own servers. Guidance for installing locally can be found in the Installing Locally section.

The tests cover both the algorithm itself and the user interface. The algorithm itself (i.e. the output of the algorithm independent of the user interface) is tested against the publicly available test cases in the rivest-sampler-tests repository. All tests are set up to run automatically using Travis CI. Different browser versions are tested automatically from Travis using Selenium on Sauce Labs.

About the Algorithm

The Rivest sampling algorithm provides an efficient way to choose items at random from a collection.

The user must supply a random seed. The random seed can be any string of characters (e.g. letters and numbers). For example, the random seed can be a string of digits generated by rolling dice several times.

Since the output of the algorithm depends only on the random seed, independent observers can check the result if they know the seed. Moreover, because the algorithm is described publicly and is relatively easy to implement, observers can check the result against other implementations of the algorithm or even their own.

The reference implementation of the algorithm (written in Python), along with a description by Rivest can be found here on Rivest's web site. Another browser implementation developed by Philip B. Stark can be found here.

Algorithm Description

This section describes the algorithm using an illustration.

Say you would like to choose from a collection of 1000 items given the random seed abc123. In actual practice, Rivest recommends using a random seed having at least 24 digits if the seed is a decimal number (i.e. a number using the digits 0 through 9).

To find the first number, join the seed with a comma and the number one to get the following string:

abc123,1

Encode this string into bytes using UTF-8 and then hash it with SHA-256 to get the following 64-digit hexadecimal integer:

e4d62e1e48f7a7aa0e59c528bddfd96b4ff06fbff7620d7ae77e0d7e044cbf2e

Compute this integer modulo 1000 to give you an integer between 0 and 999, which in this case is 454. Add one to get 455. This means that the 455th element in the collection of 1000 items is the first item.

To get the second item, apply the same process but with the number two in place of one:

abc123,2

In general, concatenate the random seed with the decimal representation of the sequence number of the item you would like to pick.

If you would like to pick items without repetition and you get a repeat of a previously picked item, apply the operation again until you get a new item (as many times as is necessary).

Installing Locally

If you would like to host Quick Sampler on your own servers, the project's Releases page contains pre-built releases. Each release is a self-contained gzip file containing the directory of files to host. The directory contains concatenated, minified Javascript files, all necessary third-party Javascript libraries, etc.

Known Issues

Currently, the application does not support random seed strings that contain non-BMP Unicode characters (i.e. characters whose Unicode code point is higher than 65,536). This is because of this issue in the jsSHA Javascript library that the application uses.

Development

See the Development page for information on developing locally, including how to build the project from source and for other information on the source code. This page also contains information for project maintainers.

License

This project is licensed under the BSD 3-clause license. See the LICENSE file for details.

Author

Chris Jerdonek (chris.jerdonek@gmail.com)

Acknowledgments

This project was inspired by the browser implementation here developed by Philip B. Stark.