Jfortin1 / ComBatHarmonization

Harmonization of multi-site imaging data with ComBat
268 stars 107 forks source link

Applying ComBat on single feature #14

Closed MStarmans91 closed 4 years ago

MStarmans91 commented 4 years ago

I would like to use ComBat on a single feature. From the formulation and the fact that this is done in the cortical thickness paper referred to in the readme, I would think it should be possible. However, the parametric formulation gives an error, while the non-parametric version results in only NaNs:

data = randn(1,10);
batch = [1 1 1 1 1 2 2 2 2 2];
mod = [1 2 1 2 1 2 1 2 1 2]';
combat(data, batch, mod, 0)

Results in:

[combat] Found 2 batches [combat] Adjusting for 1 covariate(s) of covariate level(s) [combat] Standardizing Data across features [combat] Fitting L/S model and finding priors [combat] Finding non-parametric adjustments [combat] Adjusting the Data ans =

NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

combat(data, batch, mod, 1)

Results in:

[combat] Found 2 batches [combat] Adjusting for 1 covariate(s) of covariate level(s) [combat] Standardizing Data across features [combat] Fitting L/S model and finding priors [combat] Finding parametric adjustments Index exceeds matrix dimensions.

Error in combat (line 93) temp = itSol(s_data(:,indices),gamma_hat(i,:),delta_hat(i,:),gamma_bar(i),t2(i),a_prior(i),b_prior(i), 0.001);

The error seems to be in the second iteration of the for loop line 93 is in. The index exceeding is in gamma_bar and t2.

Is this a bug in the code, or a limitation of the method I'm unaware of? Thanks in advance.

FYI: I'm using the MATLAB version in MATLAB 2015b.

Jfortin1 commented 4 years ago

Hi @MStarmans91,

ComBat is designed to harmonize datasets with multiple features (it uses Empirical Bayes to get improved estimates of location and scale parameters for the scanner effects). This won't work on a single feature.

You can try to use ComBat in R, with eb=FALSE, which will center and scale an individual feature for each batch separately.

MStarmans91 commented 4 years ago

Hi @Jfortin1 ,

Thank you for the reply. I had overlooked that specific option in the R code, which indeed seems to work when only providing a single feature.

I'm not sure how to exactly interpret your answer however: so you can actually use ComBat for single features, as this option does seem provided in R when not using emperical Bayes? Or as this does not use empirical Bayes, you would not consider this the actual ComBat harmonization? I actually have multiple features, but since they do not have the same kind of variations accross centers, the harmonization on the entire set did not give valid results, which is why I thought applying ComBat per feature would be a good idea.

In general, it would be nice if all the implementations would have the same functionality, as this seems to be a feature of the R code only, but I'm sure you were already on that track.

Jfortin1 commented 4 years ago

Since it's not using Empirical Bayes (EB), it is not considered ComBat, but rather a location and scale (L/S) adjustments model. ComBat learns about the L/S adjustment estimates by pooling information cross features using EB, therefore if you feel that your features don't share the same kind of variations across centers, using eb=FALSE should be preferable.

Agree re. having same functionalities across platforms. I just started refactoring the Python code and my goal is to add both the eb and parametric arguments sometime soon.

Jfortin1 commented 4 years ago

FYI, @MStarmans91 eb=False and parametric=False are now both implemented in the latest commit https://github.com/Jfortin1/ComBatHarmonization/commit/5f9a29698c23711434c747708ead0fd942e9a294

I compared data normalized with both R and Python with the different combinations of arguments using an example dataset, and results are identical.

Melissa1909 commented 3 years ago

Hey @Jfortin1, I don't really get how to implement ComBat on several features in parallel. I work on Freesurfer 7.1.1. output (cortical thickness, local gyrification index, surface area and GM volume), but they (of course) differ a lot in scale. So would you put all those values together into one input data file to use ComBat on all features simultaneously?

I have tried the following in Python 3.7:

        data_combat = neuroCombat(dat=area_lh,
                                  covars=covars,
                                  batch_col=batch_col,
                                  categorical_cols=discrete_cols,
                                  continuous_cols=continuous_cols,
                                  eb=True,
                                  parametric=True,
                                  mean_only=False,
                                  ref_batch=False)

and get negative values or values deviating a lot from the original ones.