corceslab / CHOIR

CHOIR : Clustering Hierarchy Optimization by Iterative Random forests (www.CHOIRclustering.com)
MIT License

SCTransform normalization error #11

Closed Carolina-Toste closed 6 months ago

Carolina-Toste commented 6 months ago

Hi there,

I have a question about using SCTransform with CHOIR. If I pass the SCT assay to use_assay, I get a warning:

For best performance of CHOIR with SCTransform normalization, please provide the unnormalized count matrix and set the 'normalization_method' parameter to 'SCTransform'.

I can see that when I set the normalization method to SCTransform, vst.flavor = "v2" is used by default. However, I can't find a way to regress out unwanted variation (mito %, cell cycle, etc.). What should I do if I want to regress out these variables and still use CHOIR?

Also, I tried running it as is and got the following error:

choir_seurat <- CHOIR(seurat, use_assay = "RNA", n_cores = 1, normalization_method = "SCTransform")
----------------------------------------
- CHOIR - Part 1: Build clustering tree
----------------------------------------
Warning: Key 'CHOIR' already exists in provided object. Existing data may be overwritten.
Input data:
 - Object type: Seurat (v4)
 - # of cells: 848
 - # of modalities: 1

Proceeding with the following parameters:
 - Intermediate data stored under key: CHOIR
 - Normalization method: SCTransform
 - Subtree dimensionality reductions: TRUE
 - Dimensionality reduction method: Default
 - # of variable features: Default
 - Batch correction method: none
 - Maximum clusters: auto
 - Minimum cluster depth: 2000
 - Distance approximation: TRUE
 - Alpha: 0.05
 - Multiple comparison adjustment: bonferroni
 - Features to train RF: var
 - # of excluded features: 0
 - # of permutations: 100
 - # of RF trees: 50
 - Use variance: TRUE
 - Minimum accuracy: 0.5
 - Minimum connections: 1
 - Maximum repeated errors: 20
 - Maximum cells sampled: Inf
 - Downsampling rate: 1
 - # of cores: 1
 - Random seed: 1

2024-02-14 21:18:10 : (Step 1/6) Running initial dimensionality reduction..
                      Preparing matrix using 'RNA' assay and 'data' slot..
                      Running PCA with 2000 variable features..
Warning: Key 'PC_' taken, using 'choirp0reduction_' instead
2024-02-14 21:20:26 : (Step 2/6) Generating initial nearest neighbors graph..
2024-02-14 21:20:27 : (Step 3/6) Identify starting clustering resolution..
                      [[ Current tree: 4 iterations in  1s ]] 
                      Starting resolution: 0.1
2024-02-14 21:20:28 : (Step 4/6) Building parent clustering tree..
                      [[ Current tree: 13 iterations in  2s ]] 

                      Identified 3 clusters in parent tree.
2024-02-14 21:20:30 : (Step 5/6) Subclustering parent tree..
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'as.matrix': subscript out of bounds

Here is the Traceback

catpetersen commented 6 months ago

Hi! All of the benchmarking we've done for CHOIR for RNA data has been using log normalization, rather than SCTransform, so I can't vouch for how the performance may change using SCTransform. We don't currently have plans to add customizable parameters for SCTransform normalization, until we've verified how the normalization method impacts performance.

That said, if you're worried about mitochondrial read percentage and cell cycle impacting your clusters, I'd recommend excluding those features using CHOIR parameter exclude_features. Doing so will prevent those features from being used as input to the random forest classifiers, so they won't impact the pruning of the clustering tree and the final clusters.
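As a rough illustration of the exclude_features suggestion above, here is a minimal base-R sketch for assembling the list of features to exclude. The helper `build_exclude_list` is hypothetical (it is not part of CHOIR), and the `"^MT-"` prefix for mitochondrial genes is an assumption that depends on your gene naming scheme. Only the `exclude_features` parameter itself comes from CHOIR.

```r
# Hypothetical helper (not part of CHOIR): collect gene names that could be
# passed to CHOIR's exclude_features parameter.
build_exclude_list <- function(all_genes,
                               cell_cycle_genes = character(0),
                               mito_pattern = "^MT-") {
  # Assumption: mitochondrial genes carry an "MT-" prefix (human symbols);
  # ignore.case = TRUE also catches the mouse-style "mt-" prefix.
  mito_genes <- grep(mito_pattern, all_genes, value = TRUE, ignore.case = TRUE)
  # Keep only the cell cycle genes that are actually present in the dataset.
  cc_present <- intersect(cell_cycle_genes, all_genes)
  union(mito_genes, cc_present)
}

# Toy example with made-up inputs:
genes <- c("MT-CO1", "mt-Nd1", "TOP2A", "GAPDH", "MCM5")
exclude <- build_exclude_list(genes, cell_cycle_genes = c("TOP2A", "MCM5", "PCNA"))

# With a real Seurat object, something along these lines (not run here):
# cc <- unlist(Seurat::cc.genes.updated.2019, use.names = FALSE)
# exclude <- build_exclude_list(rownames(seurat), cell_cycle_genes = cc)
# choir_seurat <- CHOIR(seurat, exclude_features = exclude)
```

Note that the cell cycle list shipped with Seurat uses human gene symbols, so for other species the symbols would need to be converted before matching.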

For the error you encountered—first, can you try setting use_slot to "counts"? When using SCTransform normalization, we want to use the "counts" slot/layer as input, rather than the (default) "data" slot/layer.

And try setting the Seurat object default assay to "RNA" before running CHOIR, if you haven't already. I fixed a bug related to this recently in the 'dev' branch, and it may be something similar that you've run into.

DefaultAssay(seurat) <- "RNA"

If neither of those things help, could you try running this quick reproducible example? This runs without errors for me.

library(Seurat)
library(CHOIR)
library(scRNAseq)

data_matrix <- LaMannoBrainData('mouse-adult')
colnames(data_matrix@assays@data$counts) <- colnames(data_matrix)
seurat_object <- CreateSeuratObject(data_matrix@assays@data$counts)
seurat_object <- CHOIR(seurat_object, use_slot = "counts", normalization_method = "SCTransform")

catpetersen commented 6 months ago

Hi! I'm closing this issue for now, but please feel free to reply here if you have any additional issues and I'll reopen it.