The goal of scLinear is to predict antibody derived tags (ADT) data from gene expression data in scRNA-seq data. It includes all the necessary pre-processing steps, comes equiped with pre-trained models and also allows the training of new models.
- [Installation](#Installation)
- [Example](#Example)
- [Other Functions](#Other-functions)
- [Citation](#Citation)
- [Contact us](#Contact-us)
## Installation
You can install the development version of scLinear using devtools.
Because of multiple issues with python dependencies not correctly
installed during package installation, I removed the automated
installation, of the dependencies for now and added them to the
installation guide (I will try to solve the issue in the future)
``` r
# Set up a python environment. I will use the renv package to create a new local environment. In case you want to use your default environment and r library, the first two commands can be scipped.
renv::init("./")
renv::use_python()
install.packages("devtools")
install.packages("reticulate")
# Get sure that the correct python environment is used by reticulate
reticulate::py_config()
#> python: /path/to/renv/python/virtualenvs/renv-python-3.10/bin/python
#> libpython: /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so
#> pythonhome: /path/to/renv/python/virtualenvs/renv-python-3.10:/path/to/renv/python/virtualenvs/renv-python-3.10
#> version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
#> numpy: [NOT FOUND]
#>
#> NOTE: Python version was forced by RETICULATE_PYTHON
# Install all python dependencies (The version specified for scikit-learn & numpy should be yoused, if you want to use the pretrained models)
py_modules_list <- c("numpy<1.26.0","joblib","scikit-learn==1.2.0","anndata","warnings","scanpy")
for (m in py_modules_list){
if(!reticulate::py_module_available(m)){reticulate::py_install(m)}
}
if(!reticulate::py_module_available("pytorch_lightning")){reticulate::py_install("pytorch_lightning", pip = TRUE)} # install with pip
# Test if all python dependencies are available
py_modules_list_available <- c("numpy","joblib","sklearn","anndata","warnings","torch","scanpy","os","scipy","typing", "pytorch_lightning")
py_modules_list_not_available <- c()
for (m in py_modules_list_available){
if(!reticulate::py_module_available(m)){py_modules_list_not_available <- c(py_modules_list_not_available,m)}
}
if(is.null(py_modules_list_not_available)){ "All python modules are available"}else{
print(paste0("The following python modules are not available: ", paste0(py_modules_list_not_available, collapse = ", ")))
print("Try installing them with: reticulate::py_install(missing module name)")
}
#> [1] "All python modules are available"
# if the installation of scLinear fails, try restarting the R session, to reload the reticulate environment
# Install scLinear
if (!require("devtools", quietly = TRUE)){install.packages("devtools")}
devtools::install_github("DanHanh/scLinear")
```
### Get data
The PBMC data can be downloaded from the 10X Genomics website
(
#> [1] "Start low quality cell removal"
#> [1] "Start clustering data"
#> [1] "Number of used dimensions for clustering: 25"
#> [1] "Start cell type annotation"
#> Pre-defined cell type database panglaodb will be used.
#> Multi Resolution Annotation Started.
#> Level 1 annotation started.
#> Level 2 annotation started.
#> Level 3 annotation started.
#> Level 4 annotation started.
#> Uniform Resolution Annotation Started.
``` r
saveRDS(pbmc10k ,"./local/pbmc10k_prepared.rds")
```
### Use a pre-trained model
User may manually load pre-trained models (available models: all, bcell,
tcell, nkcell). If a pretrained model is used it is advided to use the
raw data slot from the RNA assay, and normalization = TRUE, to ensure
that the input data is normalized the same way as for the training data.
`adt_predict()` is then used to predict the ADT values with
parameters:
\* `pipe` Pretrained model. \* `gexp` Gene expression matrix. \* `slot`
Seurat slot to use. “counts” (default) \* `normalize` TRUE (default) or
FALSE. An example can be found below:
``` r
pbmc10k <- readRDS("./local/pbmc10k_prepared.rds")
pipe <- create_adt_predictor()
pipe <- load_pretrained_model(pipe, model = "all")
pbmc10k@assays["predicted_ADT"] <- adt_predict(pipe = pipe,
gexp = pbmc10k@assays[["RNA"]],
normalize = TRUE)
saveRDS(pbmc10k ,"./local/pbmc10k_predicted.rds")
```
## Train a new model
To train a new model the following commands need to be used.
`create_adt_predictor()` to initialize predictor.
`fit_predictor()` with parameters: \* `pipe` predictor initialized
above.
\* `gexp_train` gene expression matrix of training set (i.e. RNA assay
from Seurat object).
\* `adt_train` ADT matrix of training set.
\* `normalize_gex` Gene expression normalization. TRUE (default) or
FALSE. \* `normalize_adt` ADT normalization. TRUE (default) or FALSE.
Subsequently, the `evaluate_predictor` command can be used (same
parameters with `fit_predictor`) to return the RMSE, Pearson and
Spearman of the training process. An example of this process can be
found below.
``` r
pbmc10k <- readRDS("./local/pbmc10k_prepared.rds")
## Create a training and a test set
set.seed(42)
indx <- sample(1:length(colnames(pbmc10k)), size = length(colnames(pbmc10k)), replace = FALSE)
pbmc10k_train <- pbmc10k[,indx[1:5000]]
pbmc10k_test <- pbmc10k[,indx[5001:length(colnames(pbmc10k))]]
## create predictor
pipe <- create_adt_predictor()
## train predictor
pipe <- fit_predictor(pipe = pipe,
gexp_train = pbmc10k_train@assays[["RNA"]],
adt_train = pbmc10k_train@assays[["ADT"]],
normalize_gex = TRUE,
normalize_adt = TRUE)
## save the trained model
save_trained_model(pipe = pipe, file = "./local/trained_model.joblib")
#> NULL
# load the trained model
pipe <- create_adt_predictor()
pipe <- load_pretrained_model(pipe, file = "./local/trained_model.joblib")
## evaluate predictor
eval_res <- evaluate_predictor(pipe = pipe,
gexp_test = pbmc10k_test@assays[["RNA"]],
adt_test = pbmc10k_test@assays[["ADT"]],
normalize_gex = TRUE,
normalize_adt = TRUE)
#> RMSE: 0.35243118253148364
#> Pearson correlation: 0.9411908520410573
#> Spearman correlation: 0.8725401159661871
print(eval_res)
#> RMSE Pearson Spearman
#> 1 0.3524312 0.9411909 0.8725401
## add the predicted adt assay
pbmc10k_test[["predicted_ADT"]] <- adt_predict(pipe = pipe,
gexp = pbmc10k_test@assays[["RNA"]],
normalize = TRUE)
```
## Citation
Daniel Hanhart et al., “ScLinear predicts protein abundance at
single-cell resolution”
DOI: