CIRDLES / Tripoli

Tripoli imports raw mass spectrometer data files and supports interactive review and archiving of isotopic data. Tripoli facilitates visualization of temporal trends and scatter during measurement, statistically rigorous filtering of data, and calculation of statistical parameters.
http://cirdles.org/Tripoli/
Apache License 2.0
8 stars 12 forks source link

Calculate weighted mean across blocks #185

Closed noahmclean closed 8 months ago

noahmclean commented 8 months ago

In the analysis ratios tab of the Tripoli MCMC results (and likely several other places in Tripoli in the future), calculate the weighted mean of the block results. Here's a description of a weighted mean function for use in Tripoli:

First, the data -- values and uncertainties -- going into the weighted mean function should not be a ratio. Only take weighted means of log-ratios and only use the uncertainties calculated for those log-ratios for the weighting.

Inputs

There are two inputs to the weighted mean calculation.

Input one is a set of values arranged in a column vector. For instance, an analysis with ten blocks would contain ten values arranged in ten rows and one column. The first value would be the mean of the (e.g., 100,000) trials generated by MCMC for a particular log ratio in the first block. The second value would be the mean of the (e.g., 50,000) trials generated by MCMC for the same log ratio for the second block, and so on. Call this vector $\mathbf{x}$.

Input two could be provided to the weighted mean function as either a vector or a matrix. For instance, an analysis with ten blocks would have a vector with ten rows and one column or a matrix with ten rows and columns. The vector option would contain the one-sigma uncertainties in the log-ratios, squared. To calculate the first element of the vector, take the standard deviation of the (e.g., 100,000) saved MCMC trials for the log ratio in the first block, then square it. The second element of the vector would be the squared standard deviation of the MCMC trials for the log ratio from the second block, etc.

Input two could also be a covariance matrix. We won't use these (yet) for the weighted mean of log ratios for blocks, but we might use it elsewhere down the road.

Outputs

There are three outputs from the weighted mean calculation, all scalars. The first output is the weighted mean, the second is its one-sigma absolute uncertainty, and the third is a reduced chi-square statistic.

Calculation

  1. If the second input is a vector of squared standard deviations, then convert it into a covariance matrix. Arrange the $n$ elements of the squared standard deviation vector along the diagonal of an $n \times n$ matrix, with all off-diagonal elements zero. Call this matrix $\mathbf{\Sigma}$.

  2. Create a $n \times 1$ vector of ones, called $\mathbf{1}$.

  3. The formula for the weighted mean $\bar{x}$ is

$$ \bar{x} = \dfrac{\mathbf{1}^T \mathbf{\Sigma}^{-1} \mathbf{x}}{\mathbf{1}^T \mathbf{\Sigma}^{-1} \mathbf{1}} $$

  1. The formula for the weighted mean uncertainty (one sigma abs) is

$$ \sigma_{\bar{x}} = \sqrt{ \dfrac{1}{\mathbf{1}^T \mathbf{\Sigma}^{-1} \mathbf{1}} }$$

  1. Calculate the vector of residuals $\mathbf{r}$ by subtracting the weighted mean $\bar{x}$ from measured values $x$: $r = \mathbf{x} - \bar{x}$.

  2. The formula for the reduced chi-square statistic for the weighted mean is

$$ \chi^2_\nu = \dfrac{1}{n-1} \mathbf{r}^T \mathbf{\Sigma}^{-1} \mathbf{r} $$

Note

Don't evaluate the matrix inverse directly to calculate $\mathbf{\Sigma}^{-1}$. Instead, use a linear equation solver like one from LINPACK. We figured out a Java implementation for this in Redux, let me know if you need help finding the same linear algebra functions/library.

bowring commented 8 months ago

Implemented in v0.3.8 - see new class WeighteMeanOfLogRatio.