Open yanbin-niu opened 3 years ago
@yanbin-niu this is well explained and a great question.
As far as I can tell, yes, get_sources will indeed give you the activation matrix
For the mixing matrix, is there not something called "mixing_matrix" that is returned when ica I run originally? That is, instead if needing to run "get_components" , is it possible that after first running ica a mixing_matrix is returned, and if so, what are the dimensions of this matrix?
(Closed by accident; in mobile...) Sorry!
@georgebuzzell we get those five matrices after running ica: pre_whitener_, pca_components_, pca_mean_, mixing_matrix_, unmixing_matrix_
. get_components
function returns "np.dot(pcacomponents, mixingmatrix)". So, get_components
function actually does nothing new. And if it helps, mixing_matrix_
is the inverse matrix of unmixing_matrix_
.
EEGLAB can optionally also perform PCA prior to running the ICA, though this option is off by default in EEGLAB, but on by default in MNE. I think some of the confusion when comparing EEGLAB and MNE is around how that PCA step is integrated.
I believe in EEGLAB, the icawinv mixing matrix is always the number of channels X number of ICA components, while in MNE, the ica.mixingmatrix is the number of PCA components X number of ICA components (or possibly the reverse order). So for example if you have 64 channels but use PCA to reduce to 32 PCA components, icawinv in EEGLAB would be 64x32, but ica.mixingmatrix in MNE would be 32x32. But, after running ica.get_components() in MNE, the PCA matrix is incorporated so the result is 64x32. So from this, I think the result of get_components() is the closest to the icawinv as @yanbin-niu mentioned.
For whether the PCA mean and pre_whitener need to be incorporated, I think we'd have to more probing into how both EEGLAB and MNE do the whole process, and what ADJUST is using the matrix for.
@yanbin-niu you mentioned that:
I multiply the matrix of ICA.get_components() by the matrix of ICA.get_sources(), I can not get the raw data
I'm not 100% sure, but I wonder if those steps might not incorporate the left-over PCA components that were not included in the ICA procedure, which then need to be added back to the data again? I think this is the portion of the code that is used when the data is reassembled after removing an independent component, but I would have to dig through it some more: https://github.com/mne-tools/mne-python/blob/51f19b5a16e900d0b1944af16255aaa677069e60/mne/preprocessing/ica.py#L1691
Hey @yanbin-niu , just continuing some of the previous discussion from https://github.com/NDCLab/pepper-pipeline/pull/280 here:
As you mentioned, the steps for ICA in MNE are:
I believe the PCA step is technically PCA whitening, since in addition to de-correlating the channels, the values are scaled: https://github.com/mne-tools/mne-python/blob/51f19b5a16e900d0b1944af16255aaa677069e60/mne/preprocessing/ica.py#L706
You mentioned that Coz the code you sent early on -- the _pick_sources method, also reverts the standardization at the end.
however I believe that is just because that is reverting back in to channel space, so the steps are reversed; ICA space -> PCA space -> standardized channel space -> channel space.
It was mentioned on the closed Pull request that Now, I am thinking, maybe the better way to get the activation matrix is to: revert the standardization of ica.get_sources(raw)?
However I think the standardization is just applied there to get into the correct 'space' for applying the PCA and ICA, ie the data has to be pre_whitened in the same way as when fitting the PCA before the unmixing matrix can be applied to get to the IC timeseries. I believe this is where the data is actually transformed from channel to IC space to get the activations: https://github.com/mne-tools/mne-python/blob/51f19b5a16e900d0b1944af16255aaa677069e60/mne/preprocessing/ica.py#L812
My current sense is that using ica.get_components() and ica.get_sources() should be okay without trying to integrate the pre_whitening, though we can probably dig into the EEGLAB code a bit more to understand what is happening there to be sure. I believe that EEGLAB also standardizes the channel values prior to ICA, but it may be incorporated into the sphering step.
Hey @DMRoberts , thanks for the comment. Now my sense is also that using ica.get_components() and ica.get_sources() should be good.
@georgebuzzell and I met yesterday and he pointed out some outstanding issues --
However, the outstanding issue is to confirm that the same/similar activation and transformation matrices can be generated in mne-python. Towards this end, @yanbin-niu will creat a new issue, then PR, to demonstrate that when the same ICA algorithem (infomax) is used in EEGLAB and MNE, and the random seed is set to be the same, then the same/similar activation and transformation matrices are generated in mne-python as eeglab. Note that identical activation and transformation matrices may not be generated due to rounding errors. However, the confirmation of "good enough" is that plots of each looks very similar. Additionally, running the mne-adjust code on the activation and transformation matrices generated by mne should yield identification of the same bad components as those identified by running ica and adjust in eeglab.
To solve this issue. I did some tests and also got some questions:
Therefore, there are some differences in two processes. For example, MNE does “z-standardized”. and EEGLAB just removes the mean. So, @DMRoberts do you think two activation matrices and mixing matrices are really comparable?
random_state
to 0; Also in EEGLab, runica
can set a random state to 0 as well. https://github.com/sccn/eeglab/blob/160d5957d79e12f788274e06f4a68fadada406b2/functions/sigprocfunc/runica.m#L849However, I do not think they are the same, given that they are different languages and may have different random generators. @DMRoberts what do you think about this?
My understanding is that MNE uses PCA as a way to do the whitening. @DMRoberts you also mentioned the PCA whitening. So, my this question is the PCA used in MNE is the same thing with the PCA (optional) used in the EEGLab? ---- I was thinking, if they are similar, I can try to make two processes similar. For example, I commented the part that divides the SD, and the last whitening step in MNE; I turned 'pca' on and 'sphere' off in EEGLab. And all sets random state to 0. Actually, I did test on this, and the result showed that only the first component looks similar.
The last update is two component plotting. One is from eeglab with the default setting (pop_runica(EEG, 'icatype', 'runica', 'extended', 1, 'rndreset','no')
):
Another is from MNE with some parameter settings what uses the same values from EEGLAB.
method = 'infomax'
max_iter = 'auto'
fit_params=dict(extended=True,block=64,anneal_deg=60.0,blowup=1000000000,blowup_fac=0.8,l_rate=1.5749e-04)
ica = mne.preprocessing.ICA(n_components=None,
method=method,
max_iter=512,
fit_params=fit_params,
random_state=0)
ica.fit(raw)
@georgebuzzell @DMRoberts Appreciate any comments!
Hey @yanbin-niu
I think you're right on the MNE steps, I'll have to look into the EEGLAB code again about the EEGLAB steps. I believe that whitening / sphering is a combination of de-correlation and normalization, so that step you identified after the MNE ICA with the comment 'whitening' is a normalization step. I believe whitening is also done at the PCA stage (the PCA function is called with an optional whitening parameter, which normalizes the magnitudes after the initial PCA).
I think getting the same sequence of random values across MATLAB and Python ICA implementations may be more complex than just setting the random seed on each to the same value. This StackOverflow post suggests that setting the twister algorithm in MATLAB can generate the same random sequence between MATLAB and Numpy: https://stackoverflow.com/questions/18486241/comparing-matlab-and-numpy-code-that-uses-random-number-generation
However I believe that even if the random seed is the same, the sequence of random values selected could diverge unless the sequence of ICA steps is fully identical between the two implementations.
I'm also curious how similar the IC topography looks between EEGLAB and MNE with just all the default settings / transformations, etc. set in each? Another option we might be able to do for testing is to generate some synthetic data with known IC topography and seeing how each function performs the decomposition.
Hey @DMRoberts , Thanks for the comments. The IC topography in eeglab with the default setting (pop_runica(EEG, 'icatype', 'runica', 'extended', 1, 'rndreset','no')
) was pasted in my previous comment.
I rerun the IC topography with the default setting
# ica
method = 'infomax'
max_iter = 'auto'
fit_params=dict(extended=True)
ica = mne.preprocessing.ICA(n_components=None,
method=method,
max_iter=max_iter,
fit_params=fit_params,
random_state=0)
ica.fit(raw)
I would say they look similar, especially the very first ones.
@DMRoberts Thanks for sending the link. I took a look at it and I tried. Still got different results, which is expected, since, as you said, unless they are using identical process. But I do not think the MNE and the EEGLAB are going through the same process and it's difficult to put them on the same page.
First, in order to implement the ADJUST, we need the activation matrix and mixing matrix.
Second, EEGLAB provides the activation matrix and mixing matrix directly:
EEG.icaact
andEEG.icawinv
respectively. One thing to note here: if we useEEG.icawinv
to multiplyEEG.icaact
(matrix multiplication), we get theEEG.data
. Similarly, If we useEEG.data
to multiply the unmixing matrix (EEG.icasphere*EEG.icaweights
), we get the activation matrix (EEG.icaact
).Third, we want to get the activation matrix and mixing matrix in MNE as well. It seems that MNE takes three steps while implementing the ICA : whitening -> pca -> ica. Therefore, we get a bunch of matrices after ica:
pre_whitener_, pca_components_, pca_mean_, mixing_matrix_, unmixing_matrix_
. For their descriptions, see this screenshot:Additionally, there are two functions:
ICA.get_components()
andICA.get_sources()
. For their descriptions, see this screenshot:Here, from my understanding,
ICA.get_sources()
function should give us the activation matrix. The problem is the mixing matrix: it seems thatICA.get_components()
should be the mixing matrix. However, not exactly, but fairly close. (since if I multiply the matrix of ICA.get_components() by the matrix of ICA.get_sources(), I can not get the raw data).After taking a deeper look at the source code, here is how MNE computes ICA.get_sources(): ((raw data / prewhitener) - pcamean) @ the inverse matrix of ICA.get_components(). Namely, multiplying the matrix of
ICA.get_components()
by the matrix ofICA.get_sources()
gives us the pre-whitened data. and we still need to +pca_mean_
, and then *pre_whitener_
, to get the raw data.Does all this make sense? Sorry that those concepts are new to me, and I probably did not make the things clear.
So, my question is:
ICA.get_sources()
function gives us the activation matrix, right?ICA.get_components()
function ? Or, should I do extra computation to get the mixing matrix (ICA.get_components()
+pca_mean_
, and then* pre_whitener_
)?Many thanks for comments! @DMRoberts @georgebuzzell