Closed ETA444 closed 7 months ago
Implementation Summary:
The calculate_mahalanobis() function calculates the Mahalanobis distance for an observation from a distribution, which is useful for identifying how far an observation is from the mean, considering covariance among variables.
Purpose:
The function's purpose is to compute the Mahalanobis distance, which measures how many standard deviations an observation is from the mean of a distribution, taking into account the correlations among variables.
Code Breakdown:
Purpose of the Function:
def calculate_mahalanobis(
    x: Union[np.ndarray, pd.Series],
    mean: np.ndarray,
    inv_cov_matrix: np.ndarray
) -> float:
Parameter Definitions:
Parameters
----------
x : numpy.ndarray or pandas.Series
    A 1D array of the observation or a single row from a DataFrame.
mean : numpy.ndarray
    The mean vector of the distribution from which distances are calculated.
    Must be 1D and of the same length as `x`.
inv_cov_matrix : numpy.ndarray
    The inverse of the covariance matrix of the distribution. This matrix
    must be square and its size should match the number of elements in `x`.
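As an illustration of how the three arguments might be prepared from a DataFrame, here is a minimal sketch. The column names and values are made up for this example, and converting with `to_numpy()` is a choice made here, not something the docstring requires.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three numeric columns, five observations.
df = pd.DataFrame({
    "height": [1.60, 1.72, 1.81, 1.68, 1.75],
    "weight": [55.0, 70.0, 82.0, 64.0, 80.0],
    "age": [23.0, 35.0, 41.0, 29.0, 52.0],
})

# 1D mean vector with one entry per column.
mean = df.mean().to_numpy()

# Inverse of the sample covariance matrix (square, one row/column per variable).
inv_cov_matrix = np.linalg.inv(df.cov().to_numpy())

# A single row of a DataFrame is a pandas.Series.
x = df.iloc[0]
```

With these inputs, `len(x) == len(mean)` and `inv_cov_matrix.shape == (len(x), len(x))`, which are the shape conditions the parameter descriptions ask for.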
Return Definition:
Returns
-------
float
    The Mahalanobis distance of the observation `x` from the distribution
    defined by `mean` and `inv_cov_matrix`.
Raise Definitions:
Raises
------
ValueError
    If `x` and `mean` do not have the same length.
LinAlgError
    If the inverse covariance matrix is singular and cannot be used for
    distance calculation.
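In practice, a `LinAlgError` for a singular matrix usually surfaces earlier, when the caller inverts the covariance matrix with `np.linalg.inv`, rather than inside a matrix product. The sketch below shows that case. The pseudo-inverse fallback is an assumption made for illustration only and is not confirmed as datasafari's behaviour.

```python
import numpy as np

# A singular covariance matrix: the second variable is an exact copy of the first.
singular_cov = np.array([[1.0, 1.0], [1.0, 1.0]])

try:
    inv_cov_matrix = np.linalg.inv(singular_cov)
    inversion_failed = False
except np.linalg.LinAlgError:
    # Fallback chosen for this sketch only: the Moore-Penrose pseudo-inverse.
    inv_cov_matrix = np.linalg.pinv(singular_cov)
    inversion_failed = True
```

Whether a pseudo-inverse is acceptable depends on the analysis. Dropping or combining perfectly collinear variables before computing the covariance is another option.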
Check Lengths of x and mean:
Purpose: To ensure x and mean have the same length.
if len(x) != len(mean):
    raise ValueError("The observation and mean must have the same length.")
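The docstring also requires `inv_cov_matrix` to be square and to match the length of `x`, but the check above only compares `x` and `mean`. A stricter validation could look like the sketch below. `check_inputs` is a hypothetical helper written for this example and is not part of datasafari.

```python
import numpy as np


def check_inputs(x, mean, inv_cov_matrix):
    """Hypothetical helper: validate shapes before computing the distance."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    inv_cov_matrix = np.asarray(inv_cov_matrix, dtype=float)

    if x.ndim != 1 or mean.ndim != 1:
        raise ValueError("The observation and mean must be 1D.")
    if len(x) != len(mean):
        raise ValueError("The observation and mean must have the same length.")
    if inv_cov_matrix.shape != (len(x), len(x)):
        raise ValueError(
            "The inverse covariance matrix must be square and match the length of x."
        )
    return x, mean, inv_cov_matrix
```

Without a shape check on the matrix, a mismatch would only show up as a less descriptive error from the matrix product itself.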
Calculate Mahalanobis Distance:
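Purpose: To compute the quadratic form $$(x - \mu)^T \Sigma^{-1} (x - \mu)$$. Strictly, this value is the squared Mahalanobis distance; the distance itself is its square root, which the code below does not take.
x_minus_mu = x - mean
try:
    # Quadratic form (x - mean)^T * inv_cov * (x - mean); for a 1D array, .T is a no-op.
    # Note: np.dot does not itself raise LinAlgError; that error normally comes from np.linalg.inv.
    distance = np.dot(np.dot(x_minus_mu, inv_cov_matrix), x_minus_mu.T)
except np.linalg.LinAlgError:
    raise np.linalg.LinAlgError("Singular matrix provided as inverse covariance matrix.")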
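One way to sanity-check the quadratic form in the snippet above is to compare it with a whitening-based calculation. If the covariance matrix factors as `cov = L @ L.T` (Cholesky), the Mahalanobis distance equals the Euclidean norm of `L^{-1} (x - mean)`. The sketch below uses illustrative inputs chosen for this example and assumes a positive-definite covariance matrix.

```python
import numpy as np

mean = np.array([0.0, 0.0])
x = np.array([1.0, 1.0])
cov = np.array([[1.0, 0.5], [0.5, 1.0]])

# Quadratic form, as in the snippet above (this is the squared distance).
diff = x - mean
squared = diff @ np.linalg.inv(cov) @ diff

# Cross-check: with cov = L @ L.T, the distance is the norm of L^{-1} (x - mean).
L = np.linalg.cholesky(cov)
whitened = np.linalg.solve(L, diff)
distance = np.linalg.norm(whitened)

# The squared norm of the whitened vector matches the quadratic form.
assert np.isclose(distance ** 2, squared)
```

The agreement shows that the quadratic form and the distance differ only by a square root.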
Return Result:
return distance
Examples:
Examples
--------
>>> mean_vector = np.array([0, 0])
>>> observation = np.array([1, 1])
>>> cov_matrix = np.array([[1, 0.5], [0.5, 1]])
>>> inv_cov_matrix = np.linalg.inv(cov_matrix)
>>> round(float(calculate_mahalanobis(observation, mean_vector, inv_cov_matrix)), 4)
1.3333
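Working the inputs of this example through the formula by hand gives a useful check on the expected value. With $\mu = (0, 0)$, $x = (1, 1)$ and the covariance matrix above:

$$\Sigma^{-1} = \frac{1}{1 - 0.5^2}\begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix} = \frac{4}{3}\begin{pmatrix} 1 & -0.5 \\ -0.5 & 1 \end{pmatrix}$$

$$(x - \mu)^T \Sigma^{-1} (x - \mu) = \frac{4}{3}\,(1 - 0.5 - 0.5 + 1) = \frac{4}{3} \approx 1.3333$$

The quadratic form therefore evaluates to $4/3$, and its square root, $2/\sqrt{3} \approx 1.1547$, is the Mahalanobis distance in the strict sense.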
Notes:
Notes
-----
The Mahalanobis distance is widely used in outlier detection and cluster analysis.
It is scale-invariant and takes into account the correlations of the data set.
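The scale-invariance claim can be checked numerically. In the sketch below, one variable is re-expressed in different units: the Mahalanobis quadratic form is unchanged, while the Euclidean distance is not. The data are synthetic and `squared_mahalanobis` is a hypothetical helper defined only for this example.

```python
import numpy as np


def squared_mahalanobis(x, mean, cov):
    """Hypothetical helper: quadratic form (x - mean)^T cov^{-1} (x - mean)."""
    diff = x - mean
    return float(diff @ np.linalg.inv(cov) @ diff)


rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2))  # synthetic sample with 2 variables
x = np.array([1.5, -0.5])

original = squared_mahalanobis(x, data.mean(axis=0), np.cov(data, rowvar=False))

# Express the second variable in different units (e.g. metres to millimetres).
scale = np.array([1.0, 1000.0])
scaled_data = data * scale
rescaled = squared_mahalanobis(
    x * scale, scaled_data.mean(axis=0), np.cov(scaled_data, rowvar=False)
)

# Euclidean distance to the mean, before and after rescaling, for comparison.
euclid_original = np.linalg.norm(x - data.mean(axis=0))
euclid_rescaled = np.linalg.norm(x * scale - scaled_data.mean(axis=0))
```

Here `original` and `rescaled` agree up to floating-point error, whereas `euclid_rescaled` is dominated by the rescaled variable.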
See the Full Function:
The full implementation can be found in the datasafari repository.
Description:
Method Functionality Idea:
The calculate_mahalanobis function calculates the Mahalanobis distance for an observation from a distribution. It is used in the explore_num method for outlier detection.
How it operates:
The Mahalanobis distance is a measure of the distance between a point and a distribution, considering the covariance among variables. This function computes the Mahalanobis distance of a single observation from the mean of a distribution, given the inverse of the covariance matrix of the distribution.
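To make the outlier-detection use case concrete, here is a minimal sketch that flags rows whose squared distance exceeds a chi-square cut-off. The synthetic data, the 97.5% level and the chi-square threshold are assumptions for illustration. They are a common convention under approximate multivariate normality, not a description of what explore_num actually does.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))       # synthetic sample with 2 variables
data = np.vstack([data, [8.0, -8.0]])  # one planted outlier (last row)

mean = data.mean(axis=0)
inv_cov_matrix = np.linalg.inv(np.cov(data, rowvar=False))

# Quadratic form (x - mean)^T inv_cov (x - mean) for every row at once.
diff = data - mean
squared_distances = np.einsum("ij,jk,ik->i", diff, inv_cov_matrix, diff)

# For 2 variables the chi-square CDF is 1 - exp(-t / 2),
# so the 97.5% quantile has a closed form.
threshold = -2.0 * math.log(1.0 - 0.975)

outlier_mask = squared_distances > threshold
```

The planted point in the last row is flagged. A few ordinary rows may be flagged as well, since roughly 2.5% of well-behaved points are expected to exceed this threshold.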
Parameters:
x: numpy.ndarray or pandas.Series - A 1D array of the observation or a single row from a DataFrame.
mean: numpy.ndarray - The mean vector of the distribution from which distances are calculated. Must be 1D and of the same length as x.
inv_cov_matrix: numpy.ndarray - The inverse of the covariance matrix of the distribution. This matrix must be square and its size should match the number of elements in x.
Returns:
float - The Mahalanobis distance of the observation x from the distribution defined by mean and inv_cov_matrix.
Examples: