delphidabbler / code-snippets

Collections of reusable code snippets, mainly in Pascal.
MIT License
27 stars 2 forks source link

Add various averaging routines to csdb #16

Open delphidabbler opened 1 year ago

delphidabbler commented 1 year ago

Add Mean, Median and Mode averages in various overloads of Int and Float.

All should operate on dynamic arrays.

delphidabbler commented 1 year ago

Median: middle value after sorting - if even number of elements then take mean of middle two => Median of integer array has to return float since mean of two integers could be fractional.

Mode: Result can be nothing if the is no value that has a greater number of instances then anything else; single value or multiple values if there is more than one value with same number of instances. Therefore Mode must return an array. Note that Mode can apply to any type, not just numbers.

delphidabbler commented 1 year ago

Note that Delphi's Math unit has a Mean function, but only for Float arrays:

function Mean(const Data: array of Single): Single;
function Mean(const Data: array of Double): Double;
function Mean(const Data: array of Extended): Extended;

So we could define Mean for integers that returns a Double. Two possible implementations:

function MeanInt(const A: array of Integer): Double;
begin
  Result := 0.0;
  for var I := Low(A) to High(A) do
    Result := Result + A[I] / Length(A);
end;

function MeanInt2(const A: array of Integer): Double;
begin
  var Sum: Int64 := 0;
  for var Elem in A do
    Inc(Sum, Elem);
  Result := Sum / Length(A);
end;
delphidabbler commented 1 year ago

Also, there is geometric mean:

The geometric mean, sometimes also called geometric average, is an average calculated by multiplying a set of positive values and taking the nth root, where n is the number of values.

The geometric mean is used to minimize the effects of extreme values; for instance, when averaging growth rates.

--- Source: Eurostat

Geometric mean only works if all numbers are > 0.

Two formula (source Wikipedia):

  1. For N +ve numbers a1, a2, ... aN, GeoMean = Nth root of (a1 × a2 × ... aN).
  2. For N +ve numbers a1, a2, ... aN, GeoMean = exp(sum(ln(a1)), ln(a2), ... ĺn(aN)) / N).

Note the case 1 will overflow easily, so case 2 seems best algorithm.

E.g.:

function GeoMean(const A: array of Double): Double;
begin
  Assert(Length(A) > 0);
  var SumOfLogs: Double := 0.0;
  for var D in A do
  begin
    if Sign(D) <> PositiveValue then
      raise EArgumentOutOfRangeException.Create('Non positive value passed to GeoMean');
    SumOfLogs := SumOfLogs + Ln(A);
  end;
  Result := Exp(SumOfLogs / Length(A));
end;
delphidabbler commented 1 year ago

And there's the logarithmic mean of two +ve numbers (not same as log-mean), which is defined as:

Where x = y:

Mlm(x, y) = x

Where x ≠ y:

Mlm(x, y) = (y - x) / (ln(y) - ln(x))

So:

function LogarithmicMean(const X, Y: Double): Double;
begin
  Assert((X > 0) and (Y > 0));
  if SameValue(X, Y) then
    Result := X
  else
    Result := (Y - X) / (Ln(Y) - Ln(X));
end;
delphidabbler commented 1 year ago

Generalised Mean / Power Mean:

A power mean is a mean of the form

image

where all x are >= 0.

Source: Statistics How To & Wolfram MathWorld

function PowerMean(const A: array of Extended; Lambda: Extended): Extended;
begin
  Assert(not IsZero(Lambda));
  Assert(Length(A) > 0);
  var Sum: Extended := 0.0;
  for var X in A do
  begin
    Assert(Sign(X) <> NegativeValue);
    Sum := Sum + Power(X, Lambda);
  end;
  Result := Power(Sum / Length(A), 1 / Lambda);
end;
delphidabbler commented 1 year ago

Weighted Generalised Mean / Power Mean

Screenshot_20230719-204835_Chrome~2

When all weights w sum to 1 then denominator is 1 and can be ignored.

Soure: Statistics How To and Wikipedia

delphidabbler commented 1 year ago

Weighted geometric mean

Screenshot_20230719-205701_Chrome

Source: Wikipedia

delphidabbler commented 1 year ago

Weighted arithmetic mean

Screenshot_20230719-210238_Chrome

Source: Wikipedia

delphidabbler commented 1 year ago

Useful helper functions for checking arrays for zero & negative entries:


// Check if all elements of a non-empty array are zero 
function ArrayIsZero(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if not IsZero(Elem) then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are <> 0 
function ArrayIsNonZero(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if IsZero(Elem) then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are > 0 
function ArrayIsPositive(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if Sign(Elem) <> PositiveValue then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are < 0 
function ArrayIsNegative(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if Sign(Elem) <> PositiveValue then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are <= 0 
function ArrayIsNonPositive(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if Sign(Elem) <> PositiveValue then
      Exit;
  Result := True;
end;

// Check if all elements of a non-empty array are >= 0 
function ArrayIsNonNegative(const A: array of Extended): Boolean;
begin
  Assert(Length(A) > 0);
  Result := False;
  for var Elem in A do
    if Sign(Elem) = NegativeValue then
      Exit;
  Result := True;
end;
delphidabbler commented 1 year ago

A related (helper) function is LogSumExp or LSE for a sequence x1,...,xn is is defined as the logarithm of the sum of the exponentials of the arguments:

LSE(x1,...,xn) = log(exp(x1) + ... + exp(xn))

Source: Wikipedia

function LSE(const A: array of Extended): Extended;
begin
  var Sum: Extended := 0.0;
  for var X in A do
    Sum := Sum + Exp(X):
  Result := Ln(Sum);
end;
delphidabbler commented 1 year ago

Strictly convex LogSumExp

Screenshot_20230720-040354_Chrome

Source: Wikipedia

Since exp(0) = e^0 = 1, we have

LSE(0, x1,...,xn) = log(exp(0) + exp(x1) + ... + exp(xn))
                  = log(1 + exp(x1) + ... + exp(xn)),

So, in Pascal we can have:

function LSEPlus0(const A: array of Extended): Extended;
begin
  var Sum: Extended := 1.0; // for additional Exp(0) term
  for var X in A do
    Sum := Sum + Exp(X):
  Result := Ln(Sum);
end;
delphidabbler commented 1 year ago

Quasi-arithmetic mean

Screenshot_20230720-042100_Chrome

Source: Wikipedia

Note that certain functions f may be restricted to certain intervals of the real numbers, so if would be wise to pass a closure to the Pascal function that ensures that the passed array has elements that fall within the interval, e.g.

function QAMean(
  const A: array of Extended; 
  F, FInv: TFunc<Extended,Extended>; 
  Constraint: TPredicate<Extended>
): Extended;

We could let Constraint be optional and default to nil in which case no checking would be done (I.e. any real number would be valid).

delphidabbler commented 1 year ago

Harmonic mean

Screenshot_20230720-042913_Chrome

Source: Wikipedia

delphidabbler commented 1 year ago

Median

Two overloads suggested, one real & one integer:

function Median(const A: array of Extended): Extended; overload;
begin
  Assert(Length(A) > 0);
  // optimisations for array lengths 1 & 2 to avoid sorting
  if Length(A) = 1 then
    Exit(A[0]);
  if Length(A) = 2 then
    Exit((A[0] + A[1]) / 2.0);
  Generics.Collections.TArray.Sort<Extended>(A); // using standard comparer
  var MiddleLo := Length(A) div 2;
  if Odd(Length(A)) then
    Result := A[MiddleLo + 1]
  else
    Result := (A[MiddleLo] + A[MiddleLo + 1]) / 2.0;
end;

function Median(const A: array of Integer): Extended; overload;
begin
  Assert(Length(A) > 0);
  // optimisations for array lengths 1 & 2 to avoid sorting
  if Length(A) = 1 then
    Exit(A[0]);
  if Length(A) = 2 then
    Exit((A[0] + A[1]) / 2);
  Generics.Collections.TArray.Sort<Integer>(A); // using standard comparer
  var MiddleLo := Length(A) div 2;
  if Odd(Length(A)) then
    Result := A[MiddleLo + 1]
  else
    Result := (A[MiddleLo] + A[MiddleLo + 1]) / 2;
end;
delphidabbler commented 1 year ago

Mode for countable sets: ModeN

Obeys rule that if all items in a sequence of values are unique then there is no mode. The following function returns an empty array in that case. If sequence in mono-modal an array of one element containing the mode is returned, while if sequence is multi-modal, an array containing all the modes is returned.

This function illustrates the case for a sequence of Integer values. Overloads for other ordinal types are obvious.

function ModeN(const A: array of Integer): TArray<Integer>;
begin
  Assert(Length(A) > 0);
  var MaxCount: Cardinal := 0;
  var Counts := TDictionary<Integer,Cardinal>.Create;
  try
    for var X in A do
    begin
      var C: Cardinal;
      if not Counts.TryGetValue(X, C) then
        C := 0;
      Inc(C);
      Counts.AddOrSetValue(X, C);
      if MaxCount < C then
        MaxCount := C;
    end;
    // equivalent test?: if MaxCount = 1 then
    if Length(A) = Counts.Count then 
    begin
      // All numbers are unique => no mode
      SetLength(Result, 0);
      Exit;
    end;
    var Results := TList<Integer>.Create;
    try
      for KV in Counts do
      begin
        if KV.Value = MaxCount then
          Results.Add(KV.Key);
      end;
      Result := Results.ToArray;
    finally
      Results.Free;
    end;
  finally
    Counts.Free;
  end;
end;
delphidabbler commented 1 year ago

Mode for real numbers. Numbers are considered equal if they are within Epsilon of each other. If Epsilon is zero then Delphi chooses a suitable default.

function ModeR(const A: array of Extended; const Epsilon: Extended = 0.0): TArray<Extended>;
begin
  Assert(Length(A) > 0);
  var MaxCount: Cardinal := 0;
  var Counts := TDictionary<Extended,Cardinal>.Create(
    TDelegatedEqualityComparer.Create(
      function (const L, R: Extended): Boolean
      begin
        Result := Math.SameValue(L, R, Epsilon);
      end,
      function (const V: Extended): Integer
      begin
        Result := TDelegatedEqualityComparer.Default.GetHashCode(V);
      end
    );
  );
  try
    for var X in A do
    begin
      var C: Cardinal;
      if not Counts.TryGetValue(X, C) then
        C := 0;
      Inc(C);
      Counts.AddOrSetValue(X, C);
      if MaxCount < C then
        MaxCount := C;
    end;
    // equivalent test?: if MaxCount = 1 then
    if Length(A) = Counts.Count then 
    begin
      // All numbers are unique => no mode
      SetLength(Result, 0);
      Exit;
    end;
    var Results := TList<Extended>.Create;
    try
      for KV in Counts do
      begin
        if KV.Value = MaxCount then
          Results.Add(KV.Key);
      end;
      Result := Results.ToArray;
    finally
      Results.Free;
    end;
  finally
    Counts.Free;
  end;
end;
delphidabbler commented 1 year ago

Mode with probability function p(x) that gives weighted values for, say, integer x.

We define probability function as a closure:

P: TFunc<Integer,Extended>;
delphidabbler commented 1 year ago

It may be useful to define a Mode function that returns the index/indices of one of the array elements that contain(s) the mode(s).

In the following we will use an array of Int64 and return the indices in an array ov Integerm just to make it easy to distinguish between the array type and the index type.

* ⚠️ check function below TODO:** reverse order of purpose of key/value in TPair used to store counter & index below - seems more logical if index, which is invariant, is labelled as Key & counter, which changes, is Value.


function ModeNByIndex(const A: array of Int64): TArray<Integer>;
begin
  Assert(Length(A) > 0);
  var MaxCount: Cardinal := 0;
  var Counts := TDictionary<Int64,TPair<{Counter}Cardinal,{Index}Integer>>.Create;
  try
    for var Idx: Integer := Low(A) to High(A) do
    begin
      var V: TPair<Cardinal,Integer>;
      if not Counts.TryGetValue(A[Idx], V) then
        V := TPair<Cardinal,Integer>.Create(0, Idx);
      Inc(V.Key);
      Counts.AddOrSetValue(A[Idx], V);
      if MaxCount < V.Key then
        MaxCount := V.Key;
    end;
    if Length(A) = Counts.Count then 
    begin
      // All numbers are unique => no mode
      SetLength(Result, 0);
      Exit;
    end;
    var Results := TList<Integer>.Create;
    try
      for KV in Counts do
        if KV.Value.Key = MaxCount then
          Results.Add(KV.Value.Value);
      Result := Results.ToArray;
    finally
      Results.Free;
    end;
  finally
    Counts.Free;
  end;
end;