Implement new calculator util: calculate_mahalanobis()

Description:

Method Functionality Idea:

The calculate_mahalanobis function calculates the Mahalanobis distance of an observation from a distribution. The explore_num method uses it for outlier detection.

How it operates:

The Mahalanobis distance is a measure of the distance between a point and a distribution, considering the covariance among variables. This function computes the Mahalanobis distance of a single observation from the mean of a distribution, given the inverse of the covariance matrix of the distribution.
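A minimal sketch of the computation described above, assuming the mean and covariance are estimated from a small sample with NumPy. The sample data and the helper name `mahalanobis_quadratic_form` are illustrative and not part of datasafari.

```python
import numpy as np

# Hypothetical sample: 5 observations of 2 variables (rows are observations).
data = np.array([
    [2.0, 1.0],
    [3.0, 2.5],
    [4.0, 3.0],
    [5.0, 4.5],
    [6.0, 5.0],
])

mean = data.mean(axis=0)            # mean vector of the sample
cov = np.cov(data, rowvar=False)    # sample covariance matrix (2 x 2)
inv_cov = np.linalg.inv(cov)        # inverse covariance, computed once


def mahalanobis_quadratic_form(x, mean, inv_cov):
    """Illustrative helper: (x - mean)^T inv_cov (x - mean)."""
    diff = np.asarray(x, dtype=float) - mean
    return float(diff @ inv_cov @ diff)


d2 = mahalanobis_quadratic_form(data[0], mean, inv_cov)
print(d2, np.sqrt(d2))  # quadratic form and its square root
```

Inverting the covariance matrix once and reusing it for every observation is the reason the function takes `inv_cov_matrix` as an argument rather than the covariance matrix itself.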

Parameters:

x : numpy.ndarray or pandas.Series - A 1D array of the observation or a single row from a DataFrame.
mean : numpy.ndarray - The mean vector of the distribution from which distances are calculated. Must be 1D and of the same length as x.
inv_cov_matrix : numpy.ndarray - The inverse of the covariance matrix of the distribution. This matrix must be square and its size should match the number of elements in x.

Returns:

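float - The Mahalanobis distance of the observation x from the distribution defined by mean and inv_cov_matrix. As implemented in the code breakdown below, the returned value is the quadratic form (x - mean)^T inv_cov_matrix (x - mean), which is the squared distance.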

Examples:

mean_vector = np.array([0, 0])
observation = np.array([1, 1])
cov_matrix = np.array([[1, 0.5], [0.5, 1]])
inv_cov_matrix = np.linalg.inv(cov_matrix)
calculate_mahalanobis(observation, mean_vector, inv_cov_matrix)
# Output: approximately 1.3333 (the squared distance; its square root is approximately 1.1547)

Implementation Summary:

calculate_mahalanobis() takes an observation, a mean vector, and an inverse covariance matrix, verifies that the observation and the mean have the same length, and evaluates the quadratic form on the deviation of the observation from the mean.

Purpose:

The function's purpose is to compute the Mahalanobis distance, which measures how many standard deviations an observation is from the mean of a distribution, taking into account the correlations among variables.

Code Breakdown:

Purpose of the Function:
- Purpose: To calculate the Mahalanobis distance for an observation from a distribution.
```
from typing import Union

import numpy as np
import pandas as pd


def calculate_mahalanobis(
    x: Union[np.ndarray, pd.Series],
    mean: np.ndarray,
    inv_cov_matrix: np.ndarray
) -> float:
```
- The Mahalanobis distance is effective for determining how far an observation is from the mean of a distribution, considering the covariance among variables.

Parameter Definitions:

Purpose: To define the function's parameters.

Parameters
----------
x : numpy.ndarray or pandas.Series
   A 1D array of the observation or a single row from a DataFrame.
mean : numpy.ndarray
   The mean vector of the distribution from which distances are calculated.
   Must be 1D and of the same length as `x`.
inv_cov_matrix : numpy.ndarray
   The inverse of the covariance matrix of the distribution. This matrix
   must be square and its size should match the number of elements in `x`.

Return Definition:

Purpose: To define the function's return type.

Returns
-------
float
   The Mahalanobis distance of the observation `x` from the distribution
   defined by `mean` and `inv_cov_matrix`.

Raise Definitions:

Purpose: To define the exceptions the function can raise.

Raises
------
ValueError
   If `x` and `mean` do not have the same length.
LinAlgError
   If the inverse covariance matrix is singular and cannot be used for
   distance calculation.
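For context, a plain matrix product does not check for singularity, so in practice `LinAlgError` tends to surface earlier, when the caller inverts the covariance matrix. The sketch below illustrates this, assuming the caller uses `np.linalg.inv`. The pseudo-inverse fallback and the helper name `safe_inverse_covariance` are illustrative choices, not datasafari behavior.

```python
import numpy as np


def safe_inverse_covariance(cov):
    """Hypothetical helper: invert cov, falling back to the pseudo-inverse."""
    try:
        return np.linalg.inv(cov)
    except np.linalg.LinAlgError:
        # NumPy raises this for an exactly singular matrix.
        return np.linalg.pinv(cov)


# Two perfectly correlated variables give a singular covariance matrix.
singular_cov = np.array([[1.0, 1.0], [1.0, 1.0]])
inv_cov = safe_inverse_covariance(singular_cov)
print(inv_cov)
```

Whether a pseudo-inverse is an acceptable substitute depends on the analysis. Dropping one of the collinear variables before computing the covariance is another common option.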

Check Lengths of x and mean:

Purpose: To ensure x and mean have the same length.

if len(x) != len(mean):
   raise ValueError("The observation and mean must have the same length.")
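The parameter documentation also requires `inv_cov_matrix` to be square and to match the length of `x`, which the length check above does not cover. A possible extension is sketched here. The helper name `validate_mahalanobis_inputs` and its error messages are assumptions for illustration, not part of the datasafari implementation.

```python
import numpy as np


def validate_mahalanobis_inputs(x, mean, inv_cov_matrix):
    """Hypothetical pre-checks mirroring the documented parameter contract."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    inv_cov_matrix = np.asarray(inv_cov_matrix, dtype=float)

    if x.ndim != 1 or mean.ndim != 1:
        raise ValueError("x and mean must be 1D.")
    if len(x) != len(mean):
        raise ValueError("The observation and mean must have the same length.")
    if inv_cov_matrix.shape != (len(x), len(x)):
        raise ValueError(
            "inv_cov_matrix must be square with one row per element of x."
        )
    return x, mean, inv_cov_matrix


# Usage: a 2-element observation with a matching 2 x 2 matrix passes.
x, mean, inv_cov = validate_mahalanobis_inputs([1, 1], [0, 0], np.eye(2))
```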

Calculate Mahalanobis Distance:

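Purpose: To compute the quadratic form $$D^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$$ where $\mu$ is the mean vector and $\Sigma^{-1}$ is the inverse covariance matrix. This is the squared Mahalanobis distance; the distance in standard-deviation units is $D = \sqrt{D^2}$. The code in this step returns the quadratic form without taking the square root.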

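x_minus_mu = x - mean  # deviation of the observation from the mean
try:
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu); a 1D array needs no transpose.
    distance = np.dot(np.dot(x_minus_mu, inv_cov_matrix), x_minus_mu)
except np.linalg.LinAlgError as error:
    # Chain the original exception so the traceback keeps its cause.
    raise np.linalg.LinAlgError("Singular matrix provided as inverse covariance matrix.") from error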
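As a check on the step above, the quadratic form can be evaluated by hand for the example used in this description, with mean $(0, 0)$, observation $(1, 1)$, and covariance matrix $\begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$:

$$\Sigma^{-1} = \frac{1}{1 - 0.25} \begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix} = \begin{pmatrix} 4/3 & -2/3 \\ -2/3 & 4/3 \end{pmatrix}$$

$$(x - \mu)^T \Sigma^{-1} (x - \mu) = \frac{4}{3} - \frac{2}{3} - \frac{2}{3} + \frac{4}{3} = \frac{4}{3} \approx 1.3333$$

Taking the square root gives $\sqrt{4/3} \approx 1.1547$, which is the distance expressed in standard-deviation units.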

Return Result:
- Purpose: To return the computed distance.
```
return distance
```

Examples:

Purpose: To provide examples of how to use the function.

Examples
--------
>>> mean_vector = np.array([0, 0])
>>> observation = np.array([1, 1])
>>> cov_matrix = np.array([[1, 0.5], [0.5, 1]])
>>> inv_cov_matrix = np.linalg.inv(cov_matrix)
>>> round(float(calculate_mahalanobis(observation, mean_vector, inv_cov_matrix)), 4)
1.3333

Notes:

Purpose: To provide additional context and applications for the function.

Notes
-----
The Mahalanobis distance is widely used in outlier detection and cluster analysis.
It is scale-invariant and takes into account the correlations of the data set.
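The scale-invariance claim can be checked numerically. The sketch below uses synthetic data and a hypothetical helper, `squared_distances`, neither of which comes from datasafari: it rescales each column, as a change of units would, and compares the squared distances before and after.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))        # synthetic sample, illustrative only
scaled = data * np.array([100.0, 0.5])  # change the units of each column


def squared_distances(sample):
    """Squared Mahalanobis distance of every row from the sample mean."""
    diff = sample - sample.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(sample, rowvar=False))
    # Row-wise quadratic form: diff_i^T inv_cov diff_i for each row i.
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


d2_original = squared_distances(data)
d2_scaled = squared_distances(scaled)
print(np.allclose(d2_original, d2_scaled))
```

The two sets of distances should agree to floating-point tolerance, because rescaling a column rescales the covariance matrix in a way that cancels out in the quadratic form. Euclidean distances would not survive the same rescaling.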

See the Full Function:

The full implementation can be found in the datasafari repository.

ETA444 / datasafari