Open delphidabbler opened 1 year ago
Median: middle value after sorting - if even number of elements then take mean of middle two => Median of integer array has to return float since mean of two integers could be fractional.
Mode: Result can be nothing if the is no value that has a greater number of instances then anything else; single value or multiple values if there is more than one value with same number of instances. Therefore Mode must return an array. Note that Mode can apply to any type, not just numbers.
Note that Delphi's Math
unit has a Mean
function, but only for Float arrays:
function Mean(const Data: array of Single): Single;
function Mean(const Data: array of Double): Double;
function Mean(const Data: array of Extended): Extended;
So we could define Mean for integers that returns a Double
. Two possible implementations:
function MeanInt(const A: array of Integer): Double;
begin
Result := 0.0;
for var I := Low(A) to High(A) do
Result := Result + A[I] / Length(A);
end;
function MeanInt2(const A: array of Integer): Double;
begin
var Sum: Int64 := 0;
for var Elem in A do
Inc(Sum, Elem);
Result := Sum / Length(A);
end;
Also, there is geometric mean:
The geometric mean, sometimes also called geometric average, is an average calculated by multiplying a set of positive values and taking the nth root, where n is the number of values.
The geometric mean is used to minimize the effects of extreme values; for instance, when averaging growth rates.
--- Source: Eurostat
Geometric mean only works if all numbers are > 0.
Two formula (source Wikipedia):
Note the case 1 will overflow easily, so case 2 seems best algorithm.
E.g.:
function GeoMean(const A: array of Double): Double;
begin
Assert(Length(A) > 0);
var SumOfLogs: Double := 0.0;
for var D in A do
begin
if Sign(D) <> PositiveValue then
raise EArgumentOutOfRangeException.Create('Non positive value passed to GeoMean');
SumOfLogs := SumOfLogs + Ln(A);
end;
Result := Exp(SumOfLogs / Length(A));
end;
And there's the logarithmic mean of two +ve numbers (not same as log-mean), which is defined as:
Where x = y:
Mlm(x, y) = x
Where x ≠ y:
Mlm(x, y) = (y - x) / (ln(y) - ln(x))
So:
function LogarithmicMean(const X, Y: Double): Double;
begin
Assert((X > 0) and (Y > 0));
if SameValue(X, Y) then
Result := X
else
Result := (Y - X) / (Ln(Y) - Ln(X));
end;
Generalised Mean / Power Mean:
A power mean is a mean of the form
where all x are >= 0.
Source: Statistics How To & Wolfram MathWorld
function PowerMean(const A: array of Extended; Lambda: Extended): Extended;
begin
Assert(not IsZero(Lambda));
Assert(Length(A) > 0);
var Sum: Extended := 0.0;
for var X in A do
begin
Assert(Sign(X) <> NegativeValue);
Sum := Sum + Power(X, Lambda);
end;
Result := Power(Sum / Length(A), 1 / Lambda);
end;
Weighted Generalised Mean / Power Mean
When all weights w sum to 1 then denominator is 1 and can be ignored.
Soure: Statistics How To and Wikipedia
Weighted geometric mean
Source: Wikipedia
Weighted arithmetic mean
Source: Wikipedia
Useful helper functions for checking arrays for zero & negative entries:
// Check if all elements of a non-empty array are zero
function ArrayIsZero(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if not IsZero(Elem) then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are <> 0
function ArrayIsNonZero(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if IsZero(Elem) then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are > 0
function ArrayIsPositive(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if Sign(Elem) <> PositiveValue then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are < 0
function ArrayIsNegative(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if Sign(Elem) <> PositiveValue then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are <= 0
function ArrayIsNonPositive(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if Sign(Elem) <> PositiveValue then
Exit;
Result := True;
end;
// Check if all elements of a non-empty array are >= 0
function ArrayIsNonNegative(const A: array of Extended): Boolean;
begin
Assert(Length(A) > 0);
Result := False;
for var Elem in A do
if Sign(Elem) = NegativeValue then
Exit;
Result := True;
end;
A related (helper) function is LogSumExp or LSE for a sequence x1,...,xn is is defined as the logarithm of the sum of the exponentials of the arguments:
LSE(x1,...,xn) = log(exp(x1) + ... + exp(xn))
Source: Wikipedia
function LSE(const A: array of Extended): Extended;
begin
var Sum: Extended := 0.0;
for var X in A do
Sum := Sum + Exp(X):
Result := Ln(Sum);
end;
Strictly convex LogSumExp
Source: Wikipedia
Since exp(0) = e^0 = 1, we have
LSE(0, x1,...,xn) = log(exp(0) + exp(x1) + ... + exp(xn))
= log(1 + exp(x1) + ... + exp(xn)),
So, in Pascal we can have:
function LSEPlus0(const A: array of Extended): Extended;
begin
var Sum: Extended := 1.0; // for additional Exp(0) term
for var X in A do
Sum := Sum + Exp(X):
Result := Ln(Sum);
end;
Quasi-arithmetic mean
Source: Wikipedia
Note that certain functions f may be restricted to certain intervals of the real numbers, so if would be wise to pass a closure to the Pascal function that ensures that the passed array has elements that fall within the interval, e.g.
function QAMean(
const A: array of Extended;
F, FInv: TFunc<Extended,Extended>;
Constraint: TPredicate<Extended>
): Extended;
We could let Constraint
be optional and default to nil
in which case no checking would be done (I.e. any real number would be valid).
Harmonic mean
Source: Wikipedia
Median
Two overloads suggested, one real & one integer:
function Median(const A: array of Extended): Extended; overload;
begin
Assert(Length(A) > 0);
// optimisations for array lengths 1 & 2 to avoid sorting
if Length(A) = 1 then
Exit(A[0]);
if Length(A) = 2 then
Exit((A[0] + A[1]) / 2.0);
Generics.Collections.TArray.Sort<Extended>(A); // using standard comparer
var MiddleLo := Length(A) div 2;
if Odd(Length(A)) then
Result := A[MiddleLo + 1]
else
Result := (A[MiddleLo] + A[MiddleLo + 1]) / 2.0;
end;
function Median(const A: array of Integer): Extended; overload;
begin
Assert(Length(A) > 0);
// optimisations for array lengths 1 & 2 to avoid sorting
if Length(A) = 1 then
Exit(A[0]);
if Length(A) = 2 then
Exit((A[0] + A[1]) / 2);
Generics.Collections.TArray.Sort<Integer>(A); // using standard comparer
var MiddleLo := Length(A) div 2;
if Odd(Length(A)) then
Result := A[MiddleLo + 1]
else
Result := (A[MiddleLo] + A[MiddleLo + 1]) / 2;
end;
Mode for countable sets: ModeN
Obeys rule that if all items in a sequence of values are unique then there is no mode. The following function returns an empty array in that case. If sequence in mono-modal an array of one element containing the mode is returned, while if sequence is multi-modal, an array containing all the modes is returned.
This function illustrates the case for a sequence of Integer
values. Overloads for other ordinal types are obvious.
function ModeN(const A: array of Integer): TArray<Integer>;
begin
Assert(Length(A) > 0);
var MaxCount: Cardinal := 0;
var Counts := TDictionary<Integer,Cardinal>.Create;
try
for var X in A do
begin
var C: Cardinal;
if not Counts.TryGetValue(X, C) then
C := 0;
Inc(C);
Counts.AddOrSetValue(X, C);
if MaxCount < C then
MaxCount := C;
end;
// equivalent test?: if MaxCount = 1 then
if Length(A) = Counts.Count then
begin
// All numbers are unique => no mode
SetLength(Result, 0);
Exit;
end;
var Results := TList<Integer>.Create;
try
for KV in Counts do
begin
if KV.Value = MaxCount then
Results.Add(KV.Key);
end;
Result := Results.ToArray;
finally
Results.Free;
end;
finally
Counts.Free;
end;
end;
Mode for real numbers. Numbers are considered equal if they are within Epsilon
of each other. If Epsilon
is zero then Delphi chooses a suitable default.
function ModeR(const A: array of Extended; const Epsilon: Extended = 0.0): TArray<Extended>;
begin
Assert(Length(A) > 0);
var MaxCount: Cardinal := 0;
var Counts := TDictionary<Extended,Cardinal>.Create(
TDelegatedEqualityComparer.Create(
function (const L, R: Extended): Boolean
begin
Result := Math.SameValue(L, R, Epsilon);
end,
function (const V: Extended): Integer
begin
Result := TDelegatedEqualityComparer.Default.GetHashCode(V);
end
);
);
try
for var X in A do
begin
var C: Cardinal;
if not Counts.TryGetValue(X, C) then
C := 0;
Inc(C);
Counts.AddOrSetValue(X, C);
if MaxCount < C then
MaxCount := C;
end;
// equivalent test?: if MaxCount = 1 then
if Length(A) = Counts.Count then
begin
// All numbers are unique => no mode
SetLength(Result, 0);
Exit;
end;
var Results := TList<Extended>.Create;
try
for KV in Counts do
begin
if KV.Value = MaxCount then
Results.Add(KV.Key);
end;
Result := Results.ToArray;
finally
Results.Free;
end;
finally
Counts.Free;
end;
end;
Mode with probability function p(x) that gives weighted values for, say, integer x.
We define probability function as a closure:
P: TFunc<Integer,Extended>;
It may be useful to define a Mode function that returns the index/indices of one of the array elements that contain(s) the mode(s).
In the following we will use an array of Int64 and return the indices in an array ov Integerm just to make it easy to distinguish between the array type and the index type.
* ⚠️ check function below TODO:** reverse order of purpose of key/value in TPair used to store counter & index below - seems more logical if index, which is invariant, is labelled as Key & counter, which changes, is Value.
function ModeNByIndex(const A: array of Int64): TArray<Integer>;
begin
Assert(Length(A) > 0);
var MaxCount: Cardinal := 0;
var Counts := TDictionary<Int64,TPair<{Counter}Cardinal,{Index}Integer>>.Create;
try
for var Idx: Integer := Low(A) to High(A) do
begin
var V: TPair<Cardinal,Integer>;
if not Counts.TryGetValue(A[Idx], V) then
V := TPair<Cardinal,Integer>.Create(0, Idx);
Inc(V.Key);
Counts.AddOrSetValue(A[Idx], V);
if MaxCount < V.Key then
MaxCount := V.Key;
end;
if Length(A) = Counts.Count then
begin
// All numbers are unique => no mode
SetLength(Result, 0);
Exit;
end;
var Results := TList<Integer>.Create;
try
for KV in Counts do
if KV.Value.Key = MaxCount then
Results.Add(KV.Value.Value);
Result := Results.ToArray;
finally
Results.Free;
end;
finally
Counts.Free;
end;
end;
Add Mean, Median and Mode averages in various overloads of Int and Float.
All should operate on dynamic arrays.