BorgwardtLab / reComBat

reComBat package to correct batch effects
BSD 3-Clause "New" or "Revised" License

reComBat

License: BSD

This is the reComBat implementation described in our recent paper. The paper generalizes the empirical Bayes batch correction method (ComBat) introduced in [1] and uses the two-design-matrix approach of Wachinger et al. [2].

Installation

reComBat is available on PyPI and can be installed via pip:

pip install reComBat

You can also clone the repository and install it locally via Poetry by executing

poetry install

in the repository directory.

Usage

The reComBat package is inspired by the code of [3] and likewise uses a scikit-learn-like API.

In a Python script, you can import and use it via

from reComBat import reComBat

combat = reComBat()
combat.fit(data, batches)
adjusted = combat.transform(data, batches)

or, in a single step,

adjusted = combat.fit_transform(data, batches)
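For a quick end-to-end trial, the calls above can be wrapped around synthetic inputs. The following is a minimal sketch, not part of the package: make_toy_data and correct are hypothetical helper names, the batch labels and feature names are made up, and the additive shift is just a crude stand-in for a batch effect. The reComBat import is deferred into correct so that the toy data can be built without the package installed.

```python
import numpy as np
import pandas as pd


def make_toy_data(n_per_batch=20, n_features=5, seed=0):
    """Build a toy (samples x features) DataFrame and a matching batch Series."""
    rng = np.random.default_rng(seed)
    n_samples = 2 * n_per_batch
    index = [f"sample_{i}" for i in range(n_samples)]
    columns = [f"feature_{j}" for j in range(n_features)]

    values = rng.normal(size=(n_samples, n_features))
    # Simulate a batch effect: shift every feature of the second batch.
    values[n_per_batch:] += 2.0

    data = pd.DataFrame(values, index=index, columns=columns)
    batches = pd.Series(
        np.repeat(["batch_a", "batch_b"], n_per_batch), index=index, name="batch"
    )
    return data, batches


def correct(data, batches):
    """Run reComBat with default settings (requires `pip install reComBat`)."""
    from reComBat import reComBat  # imported lazily on purpose

    combat = reComBat()
    return combat.fit_transform(data, batches)


data, batches = make_toy_data()
# adjusted = correct(data, batches)  # uncomment once reComBat is installed
```

With reComBat installed, correct(data, batches) should return the adjusted data as a dataframe with the same (samples x features) layout as the input.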

All inputs (data, batches, design matrices) are pandas objects. Data and design matrices are dataframes of the form (rows x columns) = (samples x features), indexed by an arbitrary sample index; the batches should be given as a pandas series. The design matrices X and C can contain two types of columns, categorical and numerical. All columns in X and C are assumed to be categorical by default. If a column contains a numerical covariate, its name should end with the suffix "_numerical".
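To make the column naming convention concrete, here is a small sketch of a design matrix with one categorical and one numerical covariate. The covariate names (tissue, age) and the helper split_design_columns are hypothetical and only illustrate the suffix rule; they are not part of the reComBat API.

```python
import pandas as pd

NUMERICAL_SUFFIX = "_numerical"


def split_design_columns(design):
    """Return (categorical, numerical) column names based on the suffix rule."""
    numerical = [c for c in design.columns if str(c).endswith(NUMERICAL_SUFFIX)]
    categorical = [c for c in design.columns if c not in numerical]
    return categorical, numerical


# One row per sample; the index carries the sample identifiers.
X = pd.DataFrame(
    {
        "tissue": ["liver", "liver", "lung", "lung"],  # categorical by default
        "age_numerical": [34.0, 51.0, 47.0, 29.0],     # suffix marks it numerical
    },
    index=["s1", "s2", "s3", "s4"],
)

categorical, numerical = split_design_columns(X)
print(categorical, numerical)
```

Renaming a covariate column before passing the design matrix in is therefore enough to switch it between categorical and numerical treatment.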

There is also a command-line interface which can be called from a bash shell.

reComBat data_file.csv batch_file.csv --<optional args>

Arguments

The reComBat class has many optional arguments (see below). The fit, transform and fit_transform methods all take data and batches as arguments, both in the form described above.
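Before calling fit, it can help to verify that the two inputs are consistent with each other. The check below is a hedged sketch: check_inputs is a hypothetical helper, not something the package provides, and it assumes that the sample index is unique and that data and batches share the same index in the same order, which the text above does not state explicitly.

```python
import pandas as pd


def check_inputs(data, batches):
    """Raise ValueError if data and batches are not in the expected layout."""
    if not isinstance(data, pd.DataFrame):
        raise ValueError("data must be a pandas DataFrame (samples x features)")
    if not isinstance(batches, pd.Series):
        raise ValueError("batches must be a pandas Series indexed by sample")
    # Assumption: each sample appears exactly once.
    if not data.index.is_unique:
        raise ValueError("the sample index must be unique")
    # Assumption: both inputs use the same sample index, in the same order.
    if not data.index.equals(batches.index):
        raise ValueError("data and batches must share the same sample index")
    if batches.isna().any():
        raise ValueError("every sample needs a batch label")
```

Calling check_inputs(data, batches) right before combat.fit(data, batches) surfaces misaligned inputs with a readable error message.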

Optional arguments

The reComBat class has the following optional arguments:

The command-line interface accepts any of these arguments (except for config) via --<argument>=ARG. Any scikit-learn keyword arguments should be given explicitly, e.g. --alpha=1e-10. The command-line interface has the following additional optional arguments:

Output

The transform method outputs a dataframe, and the command-line interface a CSV file, of the form (samples x features) containing the adjusted data.

Tutorial

We included a step-by-step tutorial in the tutorial folder of the GitHub repository. We also provide a PDF version which serves as a manual.

Contact

This code is developed and maintained by members of the Machine Learning and Computational Biology Lab of Prof. Dr. Karsten Borgwardt:

References:

[1] W. Evan Johnson, Cheng Li, Ariel Rabinovic. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, Volume 8, Issue 1, January 2007, Pages 118–127. https://doi.org/10.1093/biostatistics/kxj037

[2] Christian Wachinger, Anna Rieckmann, Sebastian Pölsterl. Detect and Correct Bias in Multi-Site Neuroimaging Datasets. arXiv:2002.05049

[3] CoAxLab. pycombat. https://github.com/CoAxLab/pycombat