0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Separate CCAs - Multiple Comparisons Corrections #152

odandridge closed this issue 3 years ago

odandridge commented 3 years ago

Hi Todd,

I hope you are well! I have a few questions regarding the use of CCA in relation to a cadaveric experiment looking at the effect of implant position on knee kinematics.

We hope to vary 5 implant positions (i.e. int/ext rotation), treating them as continuous variables, hence CCA not MANOVA (please correct me if wrong). Then we want to see the effect of varying these 5 IVs on both tibio- and patellofemoral joint kinematics in 6 DoFs, so 12 DVs.

I understand CCA in SPM1D currently can't accept multivariate regressors, so we would have to perform 5 separate CCAs.

My confusion is how and why to correct for multiple comparisons. For example, should I run the 5 CCAs, adjusting alpha for 5 comparisons, or is it more complicated than that due to the high number of DVs?

Carrying on, any information on how to correct the post-hoc tests for multiple comparisons would be useful too. If we find an effect at the multivariate stage, does alpha for the univariate post-hoc regressions need to be corrected for 12 DVs? It seems like the corrections might end up being too aggressive?

Thanks for your help, Oli

0todd0000 commented 3 years ago

Hi Oli,

> I understand CCA in SPM1D currently can't accept multivariate regressors

Yes, this is correct. CCA in spm1d currently does not support multivariate IVs.

> It seems like the corrections might end up being too aggressive?

Yes, corrections generally end up being overly aggressive if you use a Bonferroni or similar correction, which doesn't consider the covariance amongst the DV's components.
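To give a sense of scale, here is a minimal sketch (plain Python, not spm1d code) comparing Bonferroni and Šidák per-test thresholds for the counts mentioned in this thread: 5 separate CCAs and 12 DV components for post hoc tests. The function names are hypothetical, and the count of 60 assumes every IV-by-DV-component pair is tested, which is only one possible post hoc scheme. Neither correction uses the covariance among DV components, which is why both tend to be conservative for correlated data.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def sidak_alpha(alpha, n_tests):
    """Per-test threshold under the (assumed) independence of the n_tests tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


if __name__ == "__main__":
    family_alpha = 0.05
    # 5 CCAs, 12 DV components, and 5 x 12 = 60 IV-by-component pairs (assumption)
    for n_tests in (5, 12, 60):
        b = bonferroni_alpha(family_alpha, n_tests)
        s = sidak_alpha(family_alpha, n_tests)
        print(f"n_tests={n_tests:3d}  Bonferroni={b:.5f}  Sidak={s:.5f}")
```

The Šidák threshold is always slightly larger than the Bonferroni one, but both shrink quickly as the number of tests grows.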

Regarding the IVs and DVs:

Is this interpretation correct?

Are there multiple cadaveric specimens or just one?

If there are multiple specimens, are all 5 positions tested in all specimens?

Todd

odandridge commented 3 years ago

Sorry for not being clear. I reasoned that, since position is multivariate in general but multivariate IVs are not possible in SPM1D CCA, it should be separated into 5 univariate variables: for example, int/ext rotation, lat/med translation, varus, proximal, and anterior (or similar). I think these are ideally treated as continuous variables.

So, each CCA would aim to find the correlation between one of these 5 IVs and the 12 measured DVs.

There will be multiple specimens; the sample size will be around, if not exactly, 10.

All the best, Oli

0todd0000 commented 3 years ago

OK, the IV is clear now.

Does each specimen represent a single IV case? Or is each specimen tested over a set of pre-determined IV values?

odandridge commented 3 years ago

I am leaning towards the latter but am not totally sure to be honest!

We will look to vary each IV (position) over the same range in each knee, if that's what you mean? For example, int/ext rotation of the femoral component might be set at 5 values, something like -6, -3, 0, 3, 6 deg. And similarly for the remaining 4 IVs.

I envisaged results sounding something like "In all knees, component internal rotation was significantly correlated with joint kinematics, in windows X, Y, Z. Post-hoc analysis revealed this was primarily due to a change in..."

I hope this makes sense! Oli

0todd0000 commented 3 years ago

Understood. The design sounds very complicated, and I suspect that a very large sample size (much larger than 10) will be necessary for these analyses, even if the data are 0D (i.e., a 12-component DV that does not vary in time).

If there is just a single specimen, then a number of IV values (e.g., -6 through +6 deg) could be controlled and the 12-component DV measured. This is relatively straightforward, but even for this case a large sample size would be necessary. As the number of DV components increases, the sample size required to achieve a specific power also increases, and with a 12-component DV and a 5-component IV, I suspect that the sample size will be very large, possibly 100 or more.

If you add specimens, then a number of complicating factors are introduced, the most important of which is the repeated-measures (RM) nature of the design, where a single specimen is measured in a number of conditions. The problem here is that the design is RM, with both a multivariate IV and multiple multivariate DVs. Standard CCA cannot be used for this purpose; analysis would likely require a general linear multivariate approach, and I am unsure what software would support these analyses. Possibly R, possibly SAS, but I'm not sure.

Regardless, spm1d does not support designs that are this complex, so it is difficult to give advice on how to proceed with spm1d.

I would recommend first considering the simplest case: one multivariate IV, and one 0D multivariate DV, and finding software that can be used to conduct a power analysis for this type of data. I suspect the required sample size would be 50-100, or even larger, depending on the size of the effect you wish to detect. I also suspect that it would be difficult to find software that supports the broader RM design, even for a 0D multivariate DV.

I would suggest simplifying the design as much as possible. Make the IV univariate if possible, and also limit the experiment to a non-RM design if possible.

odandridge commented 3 years ago

Hi Todd,

Thanks for all this info. Just wondering, how could the sample size be over 100 if there were only one specimen?

I suppose I was trying to follow roughly the method in this paper https://doi.org/10.1016/j.gaitpost.2016.04.014 but running the CCA multiple times to account for each of our 5 IVs, instead of just their 1 IV (load magnitude).

Thanks, Oli

0todd0000 commented 3 years ago

The main problem is that large sample sizes are needed to accurately estimate covariance components.

Consider a single Gaussian random variable: we know that its variance is estimated relatively poorly when the sample size is small, and that this estimation improves up until around N=6 to N=10, then doesn't improve too much after that unless you have a moderately large sample size of N=100 or more. (Incidentally, this might be why a sample size of around N=10 is often used in the literature.) This is just one variable, and one variance component.
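As a rough illustration of this point, the sketch below uses the textbook result that, for Gaussian data, the unbiased sample variance has a relative standard error of sqrt(2 / (N - 1)). The function name is hypothetical and the formula assumes independent Gaussian observations; it is meant only to show the diminishing returns with N, not to reproduce any analysis from this thread.

```python
import math


def relative_se_of_variance(n):
    """Relative standard error of the unbiased sample variance.

    Assumes n independent Gaussian observations, for which
    Var(s^2) = 2 * sigma^4 / (n - 1), so SD(s^2) / sigma^2 = sqrt(2 / (n - 1)).
    """
    if n < 2:
        raise ValueError("at least two observations are needed to estimate a variance")
    return math.sqrt(2.0 / (n - 1))


if __name__ == "__main__":
    for n in (3, 6, 10, 30, 100, 1000):
        print(f"N={n:5d}  relative SE of variance estimate = {relative_se_of_variance(n):.3f}")
```

The error drops steeply over the first handful of observations and much more slowly afterwards, so large gains in precision require large increases in N.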

The N needed to achieve stable variance estimates increases as one adds variables. Not only are there multiple variances to estimate (one per variable), there are also multiple covariance components (one for each pair of variables).
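The counting argument above can be sketched as follows: a covariance matrix for p variables has p variances and p(p - 1)/2 distinct covariances. The function name is hypothetical, and the p = 17 case (12 DV components plus 5 IV components considered jointly) is an illustrative assumption rather than a design anyone in this thread has proposed.

```python
def covariance_parameter_count(n_variables):
    """Return (variances, covariances, total) for an n_variables covariance matrix."""
    n_variances = n_variables
    # one covariance for each unordered pair of distinct variables
    n_covariances = n_variables * (n_variables - 1) // 2
    return n_variances, n_covariances, n_variances + n_covariances


if __name__ == "__main__":
    # 12 = DV components in this thread; 17 = 12 DVs + 5 IVs (illustrative assumption);
    # 30 and 60 match the variable counts in the examples cited in this thread
    for p in (1, 12, 17, 30, 60):
        n_var, n_cov, total = covariance_parameter_count(p)
        print(f"p={p:2d}  variances={n_var:2d}  covariances={n_cov:4d}  total={total:4d}")
```

The total grows roughly with the square of the number of variables, which is one way to see why the required N climbs so quickly.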

For example, Gupta & Gupta (1987) show that thousands-to-tens-of-thousands of observations are required for the case of 60 variables. As another example, Baloğlu et al. (2018) used a sample size of N=505 for a 30-component DV.

If you search for papers that use CCA in the general literature, you'll find that many use N > 100, N > 1000 or even N > 10,000.



References

Baloğlu M, Kozan Hİ, Kesici Ş. Gender differences in and the relationships between social anxiety and problematic internet use: Canonical analysis. Journal of Medical Internet Research. 2018;20(1):e33.

Gupta PL, Gupta RD. Sample size determination in estimating a covariance matrix. Computational Statistics & Data Analysis. 1987;5(3):185-192.

odandridge commented 3 years ago

Okay I think I have a bit of a clearer picture, thanks very much for your help, as always!

Best wishes, Oli