fast-reflexes / BirthdayProblem-Python

Implementation of a solver of the generalized birthday problem in Python
MIT License
8 stars 0 forks source link
birthday-paradox birthday-problem python stirling-approximation taylor-approximation

Birthday problem solver in Python

This project calculates the generalized birthday problem in the Python language using the Decimal class. It can be used from the command line as a standalone application or as a dependency from another project.

Synopsis

Treats the generalized birthday problem for arbitrary values.

Calculates the generalized birthday problem, the probability P that, when sampling uniformly at random N times (with replacement) from a set of D unique items, there is a non-unique item among the N samples. In the original birthday problem formulation, N is 23 and D is 366 (or 365) for a risk of P ≈ 0.5 = 50% of at least two people having the same birthday.

In mathematical terms, this can be expressed as

Formula

since the probability of picking all of N unique is equal to the number of ways to pick N unique samples divided by number of ways to pick any N samples. This, of course, given the assumption that all D items are equally probable.

The project supports calculating both the probability P from N and D (using exact method, exact method with Stirling's approximation in the calculation of faculties or Taylor approximation) and N from D and P (using exact method or Taylor approximation). Both approximations get asymptotically close to the exact result as D grows towards infinity. The exact method should not be used for larger numbers. For extremely small probabilities P, the exact method with Stirling's approximation used for faculties may become unstable as it involves many more different operations than the Taylor approximation which, each, results in small round-offs. Another source of error in this case arises from the use of Stirling's formula for two calculations of faculties (D! and (D - N)!). Since one of these ((D - N)!) diverges slightly more from the exact result than the other (D!), the difference between these (used for calculations in log space) might introduce small errors when P is extremely small. A good check to see whether the approximation in question is suffering or not is to compare it to the Taylor approximation and see whether they match well.

Parameter legend

Name Type Effect CLI flag
D integer The size of the set to sample from -
N integer The number of samples sampled from D -n
P floating point number The probability of a non-unique sample in N -p
binary boolean Whether to interpret D and N as base-2 logarithms -b
combinations boolean Whether to interpret D as the size of a set from which we must yield the actual size, D!, of the set to sample from -c
taylor boolean Whether to calculate P or N with Taylor approximation -t
stirling boolean Whether to calculate P with exact method using Stirling's approximation in calculation of faculties -s
exact boolean Whether to calculate P or N with exact method -e
all boolean Whether to calculate P or N with all methods (implies -s -t -e for P and -t -e for N) -a
json boolean Whether to output answer as a Json object or as text -j
prec integer Decimals in the solution where applicable (in [0, 10] with default 10) --prec

Versions

This project uses Python 3.7 or higher.

Usage

Command-line

Command-line usage can be summarized with following input parameters:

D [-n SAMPLES] [-p PROBABILITY] [-b] [-c] [-t] [-s] [-e] [-a] [-j] [--prec PREC]

The meaning of the flags can be found in the table further up. An additional --help flag is also available to provide information.

Examples

Example usage of standalone application on command line:

Example 1

Calculate the probability P of at least one non-unique birthday among N= 23 persons with all available methods:

> python BirthdayProblem.py 366 -n 23 -a

Example 2:

Calculate, approximatively, the number of times N a deck of cards has to be shuffled to have a P = 50% probability of seeing a repeated shuffle:

> python BirthdayProblem.py 52 -p 0.5 -c -t

Example 3:

Calculate, with approximative methods, the probability P of a collision in a 128-bit crypto when encrypting N = 2^64 = 18 446 744 073 709 551 616 blocks with the same key and output answer as a Json object with at most 5 decimals:

> python BirthdayProblem.py 128 -n 64 -b -s -t -j --prec 5

Help:

Use the following command on the standalone application to get information about usage:

> python BirthdayProblem.py --help

As a dependency

The following shows example usage of this project in another application:

from BirthdayProblem import BirthdayProblem
from decimal import *

# Program.py

Solver = BirthdayProblem.Solver
CalcPrecision = Solver.CalcPrecision

[p, pMethod] = Solver.solveForP(Decimal('366'), Decimal('23'), False, False, CalcPrecision.EXACT)
[n, nMethod] = Solver.solveForN(Decimal('52'), Decimal('0.5'), False, True, CalcPrecision.TAYLOR)

The functions to call have signatures

def solveForP(dOrDLog, nOrNLog, isBinary, isCombinations, method)
def solveForN(dOrDLog, p, isBinary, isCombinations, method)

and may throw exceptions.

Testing

To run tests, simply execute

> python RunTests.py

Notes

Author

Elias Lousseief (2020)

Change log