aaronpeikert commented 1 year ago

64

nickhaf commented 5 months ago

The Problem

The goal of Taxonomy is to obtain a sample of models that are actually used in practice, to build simulations on. In this context, it is important to know whether we are working with standardized or unstandardized parameters, because they provide different and mutually exclusive information:

Unstandardized coefficients carry information about the variances and means and are useful for comparing models that were fit to the same variables using different datasets.
Standardized loadings, on the other hand, inform about the relative influence of one variable on another. They are most commonly calculated by rescaling with the model-implied standard deviations of the variables involved, here shown for loadings $\lambda$:

$$
\hat{\lambda}^s_{ij} = \hat{\lambda}_{ij}\left(\frac{\hat{\sigma}^2_{jj}}{\hat{\sigma}^2_{ii}}\right)^{1/2}
$$

with:

superscript $s$ representing a standardized coefficient
$i$ as the influenced dependent variable
$j$ as the explanatory independent variable
$\hat{\sigma}^2_{ii}$ and $\hat{\sigma}^2_{jj}$ as the model-implied variances of the $i$th and $j$th variables
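A minimal Python sketch of the formula above, applied to a whole loading matrix at once. It assumes the model-implied variances of the indicators ($\hat{\sigma}^2_{ii}$) and of the factors ($\hat{\sigma}^2_{jj}$) are already available as vectors; the function name `standardize_loadings` and the numbers are hypothetical and not part of Taxonomy.jl.

```python
import numpy as np


def standardize_loadings(lam, var_obs, var_lat):
    """Hypothetical helper: completely standardize a loading matrix.

    lam     -- (n_obs, n_lat) unstandardized loadings; row i = indicator, column j = factor
    var_obs -- (n_obs,) model-implied variances sigma^2_ii of the indicators
    var_lat -- (n_lat,) model-implied variances sigma^2_jj of the factors
    """
    lam = np.asarray(lam, dtype=float)
    sd_obs = np.sqrt(np.asarray(var_obs, dtype=float))
    sd_lat = np.sqrt(np.asarray(var_lat, dtype=float))
    # lambda^s_ij = lambda_ij * sqrt(sigma^2_jj / sigma^2_ii), element-wise via broadcasting
    return lam * sd_lat[np.newaxis, :] / sd_obs[:, np.newaxis]


# Two indicators of one factor, made-up illustrative numbers
print(standardize_loadings([[0.8], [0.45]], var_obs=[4.0, 2.25], var_lat=[4.0]))
```

Each loading is rescaled by its own factor and indicator standard deviations, so cross-loadings are handled the same way as primary loadings.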

It is possible to standardize all parameters (more common) or only the latent variables (less common).
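For comparison, a sketch of the less common variant for a loading, in which only the latent variable $j$ is standardized (lavaan calls this `std.lv`, as opposed to `std.all`; the superscript $lv$ is just a label chosen here). The indicator's variance $\hat{\sigma}^2_{ii}$ drops out, so the loading stays in the raw units of the indicator:

$$
\hat{\lambda}^{lv}_{ij} = \hat{\lambda}_{ij}\left(\hat{\sigma}^2_{jj}\right)^{1/2}
$$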

Ideally, we would standardize parameters ourselves, but the model-implied variances are sometimes not reported. Also, it is not always clear whether the loadings have been standardized or not.
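When the indicator variances are missing but the latent variance and the residual variance are reported, the model-implied indicator variance can sometimes be reconstructed. A minimal sketch, assuming indicator $i$ loads on a single factor $j$ and its residual variance $\theta_{ii}$ is uncorrelated with everything else, so that $\hat{\sigma}^2_{ii} = \hat{\lambda}^2_{ij}\hat{\sigma}^2_{jj} + \hat{\theta}_{ii}$. The symbol $\theta_{ii}$, both function names, and the numbers are hypothetical labels introduced here:

```python
import math


def implied_indicator_variance(loading, latent_var, residual_var):
    # sigma^2_ii = lambda_ij^2 * sigma^2_jj + theta_ii
    # Assumes: single factor, no cross-loadings, uncorrelated residual.
    return loading ** 2 * latent_var + residual_var


def standardized_loading(loading, latent_var, residual_var):
    # Plug the reconstructed indicator variance into the standardization formula
    var_i = implied_indicator_variance(loading, latent_var, residual_var)
    return loading * math.sqrt(latent_var / var_i)


print(implied_indicator_variance(0.8, latent_var=4.0, residual_var=1.44))
print(standardized_loading(0.8, latent_var=4.0, residual_var=1.44))
```

With cross-loadings or correlated residuals, the extra covariance terms would have to be added, at which point computing the full model-implied covariance matrix is simpler.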

Open Questions

How to deal with cases where we are not sure whether some part of the model was standardized or not? I did a quick scan of some papers and did not encounter the problem, but it might still arise.
How to deal with our mixed sample of standardized and unstandardized parameters?

brandmaier commented 5 months ago

I recommend computing the model-implied covariance matrix from a given model. If this covariance matrix has a unit diagonal (up to some slack because of numerical imprecision), I guess we can assume that factor loadings, regressions, and covariances were standardized. Usually, the model-implied matrix is only computed for the observed variables, but for this test one should compute the covariance matrix of all latent and observed variables.
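A rough sketch of this check in Python, assuming the model is available in RAM notation: an asymmetric matrix `A` of directed paths and a symmetric matrix `S` of (residual) variances and covariances over all latent and observed variables. That representation, the tolerance, the example numbers, and the function names are assumptions for illustration, not how Taxonomy.jl stores models. The model-implied covariance of all variables is then $(I - A)^{-1} S (I - A)^{-\top}$:

```python
import numpy as np


def implied_cov_all(A, S):
    """Model-implied covariance matrix of all (latent and observed) variables.

    A -- directed paths, A[i, j] = effect of variable j on variable i
    S -- symmetric matrix of (residual) variances and covariances
    """
    B = np.linalg.inv(np.eye(A.shape[0]) - A)
    return B @ S @ B.T


def looks_standardized(A, S, tol=1e-3):
    """True if every variable has model-implied variance 1 (up to tol)."""
    sigma = implied_cov_all(np.asarray(A, dtype=float), np.asarray(S, dtype=float))
    return bool(np.allclose(np.diag(sigma), 1.0, atol=tol))


# Variable order: [eta, x1, x2]; one factor with two indicators (illustrative numbers)
A_std = np.zeros((3, 3))
A_std[1, 0], A_std[2, 0] = 0.8, 0.6
S_std = np.diag([1.0, 0.36, 0.64])  # residual variances = 1 - loading^2

A_raw = np.zeros((3, 3))
A_raw[1, 0], A_raw[2, 0] = 1.0, 0.5
S_raw = np.diag([2.0, 1.0, 1.0])

print(looks_standardized(A_std, S_std))  # True
print(looks_standardized(A_raw, S_raw))  # False
```

For the less common latent-only standardization, the same idea would restrict the diagonal check to the latent entries.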

aaronpeikert commented 4 months ago

We decided to assume that everything is standardized. This means we have to recheck all records that are currently marked as standardized, to see whether we coded the raw or the standardized estimates.

lkosanke commented 4 months ago

Todos:

Implement a new judgment Unstandardized(true), which is given if only unstandardized loadings are reported and this is clearly stated.
Go through all papers with Standardized(true) and check whether both unstandardized and standardized loadings have been reported. In these cases, we need to recode the records so that they contain the standardized loadings.
Go through all papers with Standardized(missing) (for Valentin) and Standardized(false) (for Leo) and check whether only explicitly unstandardized loadings are reported. If so, give Unstandardized(true).
Papers with Standardized(missing) can be ignored (for Leo), as we now assume that everything is reported as standardized.
Delete the judgment Standardized() and all its instances.

formal-methods-mpi / Taxonomy.jl

How to handle standardization #86
