ancova - Githubissues

0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python

GNU General Public License v3.0

61 stars 21 forks source link

ancova #264

Open ZofTG opened 1 year ago

ZofTG commented 1 year ago

Hi Todd,

I'm investigating the effect of two variables on a 1D outcome. The models is rather simple and can be generalized to:

Y = WITHIN_SUBJECTS_FACTOR + COVARIATE + ERROR

As far as I know, ANCOVA is not directly supported by the spm1d package, however I'm wondering if the following stepwise approach could be a reasonable workaround:

fit_covariate = spm1d.stats.regress(OUTCOMES, COVARIATE).inference(alpha=0.05, two_tailed=True)
RESIDUALS = fit_covariate.residuals
fit_factor = spm1d.stats.anova1rm(RESIDUALS, FACTOR, SUBJECTS).inference(alpha=0.05, two_tailed=True)

Thank you in advance for your support. In case you have any better option you would like to share, it would be more than welcome.

Best regards, Luca.

0todd0000 commented 1 year ago

This stepwise approach can indeed be used, but note that in this approach COVARIATE is between-subjects. If you are intending a within-subjects model then it may be more appropriate to apply the regress approach to each subject individually (because each subject's response to COVARIATE may generally be different), and then to pool the residuals for a second-level analysis.

As a minor point, inference is unneeded for the first-level analysis; to speed up the calculation a bit you can just do:

fit_covariate = spm1d.stats.regress(OUTCOMES, COVARIATE)
RESIDUALS = fit_covariate.residuals

ZofTG commented 1 year ago

Hi Todd,

thank you very much for your quick reply. Indeed my intent is to consider the covariate as a between-subjects variable, but it is indeed true that such approach reduces the predictive ability of the model (as you pointed out). To this extent, the approach you proposed (i.e. applying the regression in a within-subjects manner) sounds very similar to a linear mixed model with random slope and intercept for COVARIATE on SUBJECTS. Am I right or I do miss something?

In addition thank you also for the performance tip about the avoidance of the inference step. This is indeed useful to speed-up the whole computation.

Best regards, Luca.

0todd0000 commented 1 year ago

the approach you proposed (i.e. applying the regression in a within-subjects manner) sounds very similar to a linear mixed model with random slope and intercept for COVARIATE on SUBJECTS. Am I right or I do miss something?

Correct! You can also optionally apply a more flexible first-level model using spm1d.stats.glm whose use is demonstrated here. Using glm you can specify arbitrary regression models including for example: removing the intercept and using multiple covariates. You can also use glm for ANCOVA, but currently only for t contrasts (and not for F contrasts). The next major release of spm1d (v.0.5) will also support F-like contrasts for arbitrary ANCOVA.

ZofTG commented 1 year ago

Dear Todd,

Thank you very much for the suggestion. The glm approach won't be necessary this time, but it surely will in the future.

Just a note about 2 possible "bugs" that I encountered using the spm1d.stats.nonparam.regress function. The error comes out on the line 211 of the spm1d.stats.nonparam.permuters.py.

self.nPermTotal = int( factorial( self.J ) )

It is caused by the casting to int of the factorial function output. The error can be easily reproduced as in case inf is returned, then the cast to int will raise an error. Looking at the code, I would suggest to consider the following change:

self.nPermTotal    = factorial( self.J )
if self.nPermTotal == np.inf:    self.nPermTotal = -1

Secondly, using the same spm1d.stats.nonparam.regress function, the resulting object seems to not include neither the beta nor the residuals attributes. If this is a desired behaviour please ignore this comment, but I would expect that these two attributes should be available both for the parametric and the nonparametric versions of the regress function. Would it be possible?

Best regards, Luca.

0todd0000 commented 10 months ago

Thank you for reporting this bug, and sorry for the delay! I missed your post in my inbox, I'm very sorry about that!

I understand the bug but I don't think this solution will work:

self.nPermTotal = -1

because this will yield incorrect booleans like:

a = nPerm > nPermTotal

where nPerm is the user-specified number permutations. So I think the solution needs to be a bit more comprehensive with checks for downstream logical error possibilities.

I agree that the bug should be fixed. Do you have any other suggestions?

Secondly, using the same spm1d.stats.nonparam.regress function, the resulting object seems to not include neither the beta nor the residuals attributes. If this is a desired behaviour please ignore this comment, but I would expect that these two attributes should be available both for the parametric and the nonparametric versions of the regress function. Would it be possible?

Yes, this is possible, but please note:

These attributes will indeed be available in the next major spm1d release (version 0.5)
You can obtain these values from the parametric results (i.e., when using spm1d.stats.regress)

Is this satisfactory?

ZofTG commented 10 months ago

Dear Todd,

Thank you for your reply. Including the beta and the residuals attributes on the 0.5 version would be perfect. In the meantime I had indeed used the parametric version of the regress function to extract them.

About the nPermTotal issue, I tried the following code to see if a workaround was possible using the gamma function:

from scipy.special import factorial, gamma

valid = 170
print(f"{valid}! = {gamma(valid + 1)} (gamma) {factorial(valid)} (factorial)")

invalid = 171
print(f"{invalid}! = {gamma(invalid + 1)} (gamma) {factorial(invalid)} (factorial)")

unfortunately, I discovered that the factorial function internally already uses the gamma function, therefore I do not see a clean solution to the problem. Nevertheless, in order to avoid inf values, I would like to suggest the use of the maximum possible interger value in case J > 170 (which, according to the code above, is the highest integer not returning inf values). Thus the code would become:

import sys

if self.J > 170:
    self.nPermTotal    = sys.maxsize
else:
    self.nPermTotal    = factorial( self.J )

I don't know if this solution could be acceptable to you, but please let me know.

Best regards, Luca