This project calculates the generalized birthday problem in the Python language using the Decimal
class. It can be used from the command line as a standalone application or as a dependency from another project.
Treats the generalized birthday problem for arbitrary values.
Calculates the generalized birthday problem, the probability P
that, when sampling uniformly at random N
times (with replacement) from a set of D
unique items, there is a non-unique
item among the N
samples. In the original birthday problem formulation, N
is 23 and D
is 366 (or 365) for a risk of P
≈ 0.5 = 50% of at least two people having the same birthday.
In mathematical terms, this can be expressed as
since the probability of picking all of N
unique is equal to the number of ways to pick N
unique samples divided by number of ways to pick any N
samples. This, of course, given
the assumption that all D
items are equally probable.
The project supports calculating both the probability P
from N
and D
(using exact method, exact method with Stirling's approximation in the calculation of faculties or Taylor approximation) and
N
from D
and P
(using exact method or Taylor approximation). Both approximations get asymptotically close to the exact result as D
grows towards infinity. The exact method should not be used for larger
numbers. For extremely small probabilities P
, the exact method with Stirling's approximation used for faculties may become unstable as it involves many more different operations than
the Taylor approximation which, each, results in small round-offs. Another source of error in this case arises from the use of Stirling's formula for two calculations of faculties (D!
and (D - N)!
). Since one of these ((D - N)!
) diverges slightly more from the exact result than the other (D!
), the difference between these (used for calculations in log space) might
introduce small errors when P
is extremely small. A good check to see whether the approximation in question is suffering or not is to compare it to the Taylor approximation and see
whether they match well.
Name | Type | Effect | CLI flag |
---|---|---|---|
D |
integer | The size of the set to sample from | - |
N |
integer | The number of samples sampled from D |
-n |
P |
floating point number | The probability of a non-unique sample in N |
-p |
binary |
boolean | Whether to interpret D and N as base-2 logarithms |
-b |
combinations |
boolean | Whether to interpret D as the size of a set from which we must yield the actual size, D! , of the set to sample from |
-c |
taylor |
boolean | Whether to calculate P or N with Taylor approximation |
-t |
stirling |
boolean | Whether to calculate P with exact method using Stirling's approximation in calculation of faculties |
-s |
exact |
boolean | Whether to calculate P or N with exact method |
-e |
all |
boolean | Whether to calculate P or N with all methods (implies -s -t -e for P and -t -e for N ) |
-a |
json |
boolean | Whether to output answer as a Json object or as text | -j |
prec |
integer | Decimals in the solution where applicable (in [0, 10] with default 10) | --prec |
This project uses Python 3.7 or higher.
Command-line usage can be summarized with following input parameters:
D [-n SAMPLES] [-p PROBABILITY] [-b] [-c] [-t] [-s] [-e] [-a] [-j] [--prec PREC]
The meaning of the flags can be found in the table further up. An additional --help
flag is also available to provide information.
Example usage of standalone application on command line:
Calculate the probability P
of at least one non-unique birthday among N
= 23 persons with all available methods:
> python BirthdayProblem.py 366 -n 23 -a
Calculate, approximatively, the number of times N
a deck of cards has to be shuffled to have a P
= 50% probability of seeing a repeated shuffle:
> python BirthdayProblem.py 52 -p 0.5 -c -t
Calculate, with approximative methods, the probability P
of a collision in a 128-bit crypto when encrypting N
= 2^64 = 18 446 744 073 709 551 616 blocks with the same key and output answer as a Json object with at most 5 decimals:
> python BirthdayProblem.py 128 -n 64 -b -s -t -j --prec 5
Use the following command on the standalone application to get information about usage:
> python BirthdayProblem.py --help
The following shows example usage of this project in another application:
from BirthdayProblem import BirthdayProblem
from decimal import *
# Program.py
Solver = BirthdayProblem.Solver
CalcPrecision = Solver.CalcPrecision
[p, pMethod] = Solver.solveForP(Decimal('366'), Decimal('23'), False, False, CalcPrecision.EXACT)
[n, nMethod] = Solver.solveForN(Decimal('52'), Decimal('0.5'), False, True, CalcPrecision.TAYLOR)
The functions to call have signatures
def solveForP(dOrDLog, nOrNLog, isBinary, isCombinations, method)
def solveForN(dOrDLog, p, isBinary, isCombinations, method)
and may throw exceptions.
To run tests, simply execute
> python RunTests.py
Elias Lousseief (2020)
v. 1.0
v. 1.1
N
.N
.--prec
for command-line interface allowing the user to choose output precision, where applicable, in [0, 10] with default 10.v. 1.2
m
take n
.D
.v. 1.3
_BirthdayProblemNumberFormatter.toLog10ReprOrNone
._BirthdayProblemSolverChecked.birthdayProblemInv
._BirthdayProblemTextFormatter.methodToText
._BirthdayProblemTextFormatter.methodToText
and _BirthdayProblemTextFormatter.methodToDescription
.v. 1.4
N
given D
and P
using a numerical approach, this means that from now on
multiple solution strategies can be used for this calculation as well (earlier this calculation always used Taylor
approximation).facultyLog
for when input is 0. Since 0! = 1 and the return value is in log space, the
correct answer is 0 and not 1.N
using D
and P
when D
is 1 and P
is neither 0 nor 1 (in this case
the answer is always 2).v. 1.4.1
v. 1.4.2
N
with overflow in D
.D
parameter when -c
flag is used. Earlier this relied on the Sterling
approximation. It still relies on Sterling approximation when -b
and -c
are used in combination with an input
D
larger than 15.