andersonwinkler / PermCCA

Permutation inference for Canonical Correlation Analysis
GNU General Public License v3.0

Possible inconsistencies in permcca w.r.t. NeuroImage article #7

Closed by jrasero 7 months ago

jrasero commented 7 months ago

Hi Anderson,

Is this repository still supported? I may have found a few inconsistencies in the permcca function w.r.t. the algorithm description in https://doi.org/10.1016/j.neuroimage.2020.117065, but I don't want to post them here if I am going to get no response.

Thanks!

andersonwinkler commented 7 months ago

Hi Javier,

Yes, it's maintained!

All the best,

Anderson

jrasero commented 7 months ago

Excellent!!

So, according to line 19 in Algorithm 1 of your paper, the CCA should run on the residualised version of X and Y.

If I am not mistaken, in permcca this residualisation happens on line 99 (Y = Qz'*Y;) and line 116 (X = Qw'*X;).

But then, when running the initial CCA (line 121), Y and X are multiplied by Qz and Qw, respectively, maybe undoing the previous residualisations?

% Initial CCA
[A,B,r] = cca(Qz*Y,Qw*X,R,S);

Obviously, this does not affect the scenario with no nuisance variables, but I just wanted to check if this was OK or not. I am currently working on a Python translation of the permcca function.

By the way, congrats on the paper and this wonderful tool! I love them!

andersonwinkler commented 7 months ago

Hi Javier,

It is true that Q = pinv(Q'), though that doesn't add the nuisance back. The role of that pre-multiplication by Q is to bring the number of rows back to the original. The variables, after pre-multiplication by Q, are still residualized.

To see this another way, the code does Y = Qz'*Y, and then, as you noted, Y = Qz*Y, so what we really have is Y = Qz*Qz'*Y. But Qz*Qz' is the same as Rz, the residual-forming matrix due to Z, such that Y = Rz*Y across those two lines, i.e., we are residualizing Y. The same goes for X using Qw.

Now, which one is "the" residualized version? What's in line 99 or what's in line 121? Both are. The only difference is that the one in line 99 has fewer rows (depending on Z), whereas the one in line 121, which is fed into the cca function, has the same number of rows as the original.
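For anyone porting this to Python, the argument above can be checked numerically. The following is only a minimal sketch with made-up toy data: the helper names (matmul, transpose, complement_basis) are hypothetical, and the semi-orthogonal Qz is built here by plain Gram-Schmidt, which is an illustrative stand-in and not necessarily how permcca constructs it. It checks that Qz*(Qz'*Y) equals Rz*Y while Qz'*Y has fewer rows.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def complement_basis(Z, tol=1e-10):
    """Orthonormal basis (as columns) of the space orthogonal to the columns of Z.

    Built by Gram-Schmidt; an assumption for illustration only.
    """
    n, k = len(Z), len(Z[0])
    # Candidates: the columns of Z first, then the standard basis vectors.
    candidates = transpose(Z) + [[float(i == j) for i in range(n)] for j in range(n)]
    basis, q_cols = [], []
    for idx, v in enumerate(candidates):
        v = [float(x) for x in v]
        for _ in range(2):  # second pass improves numerical orthogonality
            for b in basis:
                dot = sum(x * y for x, y in zip(v, b))
                v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > tol:
            v = [x / norm for x in v]
            basis.append(v)
            if idx >= k:  # keep only vectors outside the span of Z
                q_cols.append(v)
    return transpose(q_cols)

# Hypothetical toy data: N = 5 observations, nuisance Z = intercept + one covariate.
Z = [[1.0, 0.5], [1.0, -1.2], [1.0, 0.3], [1.0, 2.0], [1.0, -0.7]]
Y = [[2.0, 1.0], [0.5, -0.3], [1.5, 0.8], [3.1, 2.2], [-0.4, 0.1]]
N = len(Z)

Qz = complement_basis(Z)            # N x (N - 2), with Qz'*Qz = I
Y_short = matmul(transpose(Qz), Y)  # analogue of Y = Qz'*Y: N - 2 rows
Y_full = matmul(Qz, Y_short)        # analogue of Qz*Y: back to N rows

# Explicit residual-forming matrix Rz = I - Z*inv(Z'*Z)*Z' (2x2 inverse in closed form).
(a, b), (c, d) = matmul(transpose(Z), Z)
det = a * d - b * c
ZtZ_inv = [[d / det, -b / det], [-c / det, a / det]]
Hz = matmul(matmul(Z, ZtZ_inv), transpose(Z))
Rz = [[float(i == j) - Hz[i][j] for j in range(N)] for i in range(N)]
RzY = matmul(Rz, Y)

max_diff = max(abs(Y_full[i][j] - RzY[i][j]) for i in range(N) for j in range(len(Y[0])))
print(len(Y_short), len(Y_full), max_diff < 1e-9)  # prints: 3 5 True
```

In this toy case the reduced data have 3 rows (5 observations minus 2 nuisance columns), and multiplying by Qz restores 5 rows without changing the fact that the nuisance has been removed.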

If we were to add back the nuisance (as we do in other cases, e.g., with Freedman-Lane or ter Braak permutation methods for the GLM), we'd multiply by H = I - R, where R is the residual-forming matrix and I is the identity.
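As a point of comparison, the Freedman-Lane scheme for the GLM is commonly written as below. This is a sketch in notation assumed here rather than taken from the thread: P is a permutation matrix, Z^+ is the pseudo-inverse of Z, and the subscript Z marks matrices built from the nuisance variables. The term H_Z Y is the nuisance part that gets added back after permuting the residuals.

```latex
\begin{aligned}
H_Z &= Z Z^{+} = I - R_Z, \\
Y^{*} &= \left( P R_Z + H_Z \right) Y .
\end{aligned}
```

By contrast, pre-multiplying by Qz involves no H_Z term, which is consistent with the point above that it only restores the number of rows.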

Hope this helps!

All the best,

Anderson

jrasero commented 7 months ago

Oh I see, now I understand. Actually, this seems to be well explained in the paper (section 2.6), but I didn't catch it. So apologies for opening the issue, and let me restate my congratulations on such nice work! I am closing this issue now.