Fit ComBat on training set and apply fitted ComBat on test set

anlijuncn commented 4 years ago

Hi, I am wondering whether we could fit ComBat on a training set and then apply the fitted ComBat to a test set. If the code supports this, how can I save the fitted ComBat parameters?

Thanks for your help!

Jfortin1 commented 4 years ago

ComBat was not designed for this purpose, and there are a number of scenarios that could be problematic:

The testing dataset contains a scanner/batch that is not present in the training dataset.
If adjusting for biological covariates in the training dataset, there is a chance that the covariate levels (or range in case of continuous covariates) are not well represented in the test dataset.

Extending ComBat and making it robust to such cases would require substantial work. Currently, ComBat is limited to harmonizing observational studies, and does not attempt to predict scanner effects on unseen scans.
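The two scenarios listed above can be checked before attempting any train/test harmonization. The following is a minimal sketch, not part of ComBat or this repository: the function name `check_test_compatibility` and the toy scanner labels and ages are hypothetical, and comparing the test covariate against the training range is only a crude proxy for "well represented".

```python
def check_test_compatibility(train_batches, test_batches, train_cov, test_cov):
    """Flag the two problematic scenarios for a train/test split.

    Returns a tuple (unseen_batches, out_of_range_indices):
      - unseen_batches: batch labels present in the test set but not in training
      - out_of_range_indices: positions of test samples whose continuous
        covariate falls outside the range observed in training
    """
    unseen_batches = sorted(set(test_batches) - set(train_batches))
    lo, hi = min(train_cov), max(train_cov)
    out_of_range = [i for i, v in enumerate(test_cov) if v < lo or v > hi]
    return unseen_batches, out_of_range


# Toy data (made up for illustration only).
train_batches = ["scannerA", "scannerB", "scannerA"]
test_batches = ["scannerA", "scannerC"]
train_age = [22.0, 35.0, 41.0]
test_age = [30.0, 67.0]

unseen, out_of_range = check_test_compatibility(
    train_batches, test_batches, train_age, test_age
)
print("Scanners unseen in training:", unseen)
print("Test samples outside the training age range:", out_of_range)
```

If either list is non-empty, parameters estimated on the training set have nothing to say about those samples, which is the situation described above.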

Shotgunosine commented 4 years ago

I've tried this before and found that ComBat essentially overfits to the training set and, when applied to the test set, actually induces scanner differences that were not there. I trained a multinomial logistic regression to predict scanner in the training set, then ran ComBat on the training set, applied the weights to the test set, and evaluated the classifier's performance in identifying scanner after ComBat had theoretically removed the scanner-related variance. My classifier ended up performing significantly below chance. My interpretation was that ComBat was creating features in the test set that were anticorrelated with scanner in the training set, resulting in below-chance performance. @Jfortin1 this was a while ago, so my explanation is hand-wavy, but does that sound like something that could happen?
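For readers who want to run a similar sanity check, here is a rough sketch of the idea of scoring a scanner classifier on held-out data and comparing it with chance. It is not the code used above: a nearest-centroid classifier is swapped in for multinomial logistic regression to keep the example dependency-free, the function names are hypothetical, and the toy numbers (including the "over-corrected" test set) are invented purely to show what below-chance accuracy looks like.

```python
from collections import defaultdict


def fit_centroids(X, labels):
    """Compute the mean feature vector for each scanner label."""
    groups = defaultdict(list)
    for row, lab in zip(X, labels):
        groups[lab].append(row)
    return {
        lab: [sum(col) / len(rows) for col in zip(*rows)]
        for lab, rows in groups.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
    )


def scanner_accuracy(train_X, train_labels, test_X, test_labels):
    """Fit on the training set, then report accuracy on the test set."""
    centroids = fit_centroids(train_X, train_labels)
    hits = sum(predict(centroids, r) == lab for r, lab in zip(test_X, test_labels))
    return hits / len(test_labels)


# Toy data: feature 0 carries a scanner offset, feature 1 is noise.
train_X = [[1.0, 0.1], [1.2, -0.1], [-1.0, 0.0], [-1.1, 0.2]]
train_labels = ["A", "A", "B", "B"]
test_labels = ["A", "B"]
test_raw = [[0.9, 0.0], [-0.8, 0.1]]
# Hypothetical over-corrected test data: the scanner offset has flipped sign.
test_overcorrected = [[-0.9, 0.0], [0.8, 0.1]]

chance = 1 / len(set(train_labels))
acc_raw = scanner_accuracy(train_X, train_labels, test_raw, test_labels)
acc_over = scanner_accuracy(train_X, train_labels, test_overcorrected, test_labels)
print(f"chance={chance:.2f} raw={acc_raw:.2f} overcorrected={acc_over:.2f}")
```

Accuracy near chance would suggest the scanner signal is gone; accuracy well below chance, as in the invented over-corrected case, suggests the signal is still there with its sign reversed, which matches the interpretation given above.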

The solution I ended up with was to just apply ComBat separately to the training set and to the test set, which is what I did here: https://www.biorxiv.org/content/10.1101/309260v1
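The "harmonize each split separately" workflow can be sketched as follows. This is deliberately not ComBat: it is a simplified per-batch location/scale adjustment for a single feature, with no empirical Bayes shrinkage and no biological covariates, and the function name `harmonize` and the toy values are assumptions made for illustration. The point is only that each split estimates its own batch parameters, so nothing fitted on the training set is applied to the test set.

```python
from statistics import mean, pstdev


def harmonize(values, batches):
    """Align each batch to the pooled mean and standard deviation.

    Simplified stand-in for ComBat (no empirical Bayes, no covariates).
    All parameters are estimated from the data passed in.
    """
    grand_mean, grand_sd = mean(values), pstdev(values)
    stats = {}
    for b in set(batches):
        vals = [v for v, vb in zip(values, batches) if vb == b]
        # Fall back to 1.0 to avoid dividing by zero for constant batches.
        stats[b] = (mean(vals), pstdev(vals) or 1.0)
    return [
        grand_mean + grand_sd * (v - stats[b][0]) / stats[b][1]
        for v, b in zip(values, batches)
    ]


def batch_means(values, batches):
    """Mean of the feature within each batch, for checking the result."""
    return {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }


# Toy data: scanner B reads about 10 units higher than scanner A.
train_values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
train_batches = ["A", "A", "A", "B", "B", "B"]
test_values = [2.0, 4.0, 12.0, 16.0]
test_batches = ["A", "A", "B", "B"]

# Each split is harmonized on its own; no parameters cross the split.
train_h = harmonize(train_values, train_batches)
test_h = harmonize(test_values, test_batches)

print("train batch means:", batch_means(train_h, train_batches))
print("test batch means:", batch_means(test_h, test_batches))
```

One consequence worth keeping in mind is that the two splits are aligned to their own pooled statistics, so in this sketch the harmonized training and test sets need not share exactly the same reference scale.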

If you want to try it for yourself, here's the monkey-patched R code I used to output weights from a training set: https://github.com/nih-fmrif/nielson_abcd_2018/blob/9ad719bcdcacdd3b4580d6f1a12398138b6a3c0c/swarm_dir/run_abcd_perm_new_draws.py#L35-L305

It would be great to find out if someone else got this result as well.

anlijuncn commented 4 years ago

Thanks for your explanations, @Jfortin1 and @Shotgunosine. I am trying to develop a new harmonization model, and for comparison purposes I would like to use a train/test split. @Shotgunosine, your solution seems helpful!

raamana commented 4 years ago

Hi @anlijuncn, you may find my confounds library and this example useful in achieving your objective: https://raamana.github.io/confounds/usage.html

Happy to chat with you if you think that would help.

Jfortin1 / ComBatHarmonization

Fit ComBat on training set and apply fitted ComBat on test set #13