carmonalab / UCell

Gene set scoring for single-cell data
GNU General Public License v3.0

Warning: Error in GetAssayData: GetAssayData doesn't work for multiple layers in v5 assay. #31

Open hongjianjin opened 12 months ago

hongjianjin commented 12 months ago

Do you plan to make the AddModuleScore_UCell function compatible with Seurat v5 in the near future? Thanks!

mass-a commented 11 months ago

Yes, we definitely plan to support Seurat v5 objects. We hope to get to this soon and will post here when it's available. -m

mass-a commented 10 months ago

Hello, the latest version of UCell (v2.6) available with Bioconductor 3.18 should be fully compatible with Seurat 5. Let us know if you still experience compatibility issues. Best -m

bepoli commented 9 months ago

Hi @mass-a, I still get an error with UCell 2.6.2 and Seurat 5.0.1:

obj <- AddModuleScore_UCell(obj, features = list('module_name' = my_genes))
Error in `GetAssayData()` at SeuratObject/R/seurat.R:1901:3:
! GetAssayData doesn't work for multiple layers in v5 assay.

It works only after I join the layers with obj <- JoinLayers(obj), so as a workaround I'm joining and re-splitting the object as needed.
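The join-then-re-split workaround can be sketched as follows. This is a minimal illustration rather than an official recipe: it assumes the `sample.matrix` dataset shipped with UCell, and the metadata column `Tag` and the `Tcell` gene set are arbitrary stand-ins for a real sample column and signature.

```r
library(Seurat)
library(UCell)

# Toy object split into two layers (stand-in for a per-sample split)
data(sample.matrix)
obj <- CreateSeuratObject(sample.matrix)
obj$Tag <- "tag1"
obj$Tag[1:300] <- "tag2"
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Tag)

# Workaround: join the layers, score, then restore the split layout
obj <- JoinLayers(obj)
obj <- AddModuleScore_UCell(obj, features = list(Tcell = c("CD2", "CD3E", "CD3D")))
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Tag)
```

The scores are stored in the cell metadata, so they should survive the re-split. Note that joining requires the merged matrix to fit in memory.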

mass-a commented 9 months ago

Thanks @bepoli! I think running UCell on joined layers (or before you split them out) is the best approach for now; we'll work on a solution for objects split on multiple layers. Cheers

frac2738 commented 6 months ago

> Thanks @bepoli! I think running UCell on joined layers (or before you split them out) is the best approach for now; we'll work on a solution for objects split on multiple layers. Cheers

Do you have a timeline for this feature to be implemented? I am working on a 1.2 million cell dataset and I cannot JoinLayers because of the usual memory limitations. Being able to run UCell on the split layers would be really useful.

Thanks for the amazing job you are doing.

mass-a commented 6 months ago

I *think* the latest version on GitHub (2.7.3) should take care of Seurat objects with multiple layers. Can you try to install from GitHub (remotes::install_github("carmonalab/UCell")) and see whether it works for you?

frac2738 commented 6 months ago

v2.7.3 doesn't work either. I also tried with the scale.data slot (which is not layered), but I get the usual matrix size error.

 # data slot (multiple layers)
 > exp_Integrated <- AddModuleScore_UCell(exp_Integrated,signatures_list,ncores = 2,name = "", slot = "data")
 Error in `GetAssayData()`:
 ! GetAssayData doesn't work for multiple layers in v5 assay.
 Run `rlang::last_trace()` to see where the error occurred.

 # counts slot (multiple layers)
 > exp_Integrated <- AddModuleScore_UCell(exp_Integrated,signatures_list,ncores = 2,name = "", slot = "counts")
 Error in `GetAssayData()`:
 ! GetAssayData doesn't work for multiple layers in v5 assay.
 Run `rlang::last_trace()` to see where the error occurred.

 # scale.data slot (single layer)
 > exp_Integrated <- AddModuleScore_UCell(exp_Integrated,signatures_list,ncores = 2,name = "", slot = "scale.data")
 Error in .m2sparse(from, paste0(kind, "g", repr), NULL, NULL) : 
   attempt to construct sparseMatrix with more than 2^31-1 nonzero entries

The object has 160 samples for a total of 1188484 cells:

An object of class Seurat 
33359 features across 1188484 samples within 1 assay 
Active assay: RNA (33359 features, 3000 variable features)
 321 layers present: counts.C-AP, counts.C-IJ, counts.C-JC, counts.C-RE, counts.C1_Cousin_P1, counts.C11_bis, counts.C13, counts.C2_Brother_P4, counts.C20-59-01-C001, counts.C20-59-01-C002, counts.C20-59-01-C003, counts.C20-59-01-P005, counts.C20-59-01-P010, counts.C20-59-01-P016, counts.C20-59-01-P017, counts.C20-59-01-P018,     counts.C20-59-01-P020, counts.C20-59-01-P022, counts.C20-59-01-P023, counts.C20-59-01-P024, counts.C20-59-01-P025, counts.C20-59-01-P026, counts.C20-59-01-P027, counts.C20-59-01-P028, counts.C20-59-01-P029, counts.C20-59-01-P032, counts.C20-59-01-P033, counts.C20-59-01-P035, counts.C20-59-01-P036, counts.C20-59-01-P039, counts.C20-59-01-P042, counts.C20-59-01-P043, counts.C20-59-01-P044, counts.C20-59-01-P045, counts.C20-59-01-P046, counts.C20-59-01-P047, counts.C20-59-01-P048, counts.C20-59-01-P049, counts.C20-59-01-P050, counts.C20-59-01-P052, counts.C20-59-01-P053, counts.C20-59-01-P056, counts.C20-59-01-P058, counts.C20-59-01-P059, counts.C20-59-01-P062, counts.C20-59-01-P063, counts.C20-59-01-P064, counts.C20-59-01-P065, counts.C20-59-01-P066, counts.C20-59-01-P067, counts.C20-59-01-P068, counts.C20-59-01-P069, counts.C20-59-01-P071, counts.C20-59-01-P072, counts.C20-59-01-P073, counts.C20-59-01-P074, counts.C20-59-01-P075, counts.C20-59-01-P076, counts.C20-59-01-P077, counts.C20-59-01-P079, counts.C20-59-01-P080, counts.C20-59-01-P081, counts.C20-59-01-P082, counts.C20-59-01-P084, counts.C20-59-01-P085, counts.C20-59-01-P086, counts.C20-59-01-P087, counts.C20-59-01-P088, counts.C20-59-01-P090, counts.C20-59-01-P093, counts.C20-59-01-P094, counts.C20-59-01-P095, counts.C20-59-01-P096, counts.C20-59-01-P097, counts.C20-59-01-P101, counts.C20-59-01-P102, counts.C20-59-01-P103, counts.C20-59-01-P107, counts.C20-59-01-P109, counts.C20-59-01-P111, counts.C20-59-02-C001, counts.C20-59-02-P001, counts.C20-59-02-P002, counts.C20-59-04-C007, counts.C20-59-04-C010, counts.C20-59-04-C013, counts.C20-59-04-C014, counts.C20-59-04-C015, 
counts.C20-59-04-C016, counts.C20-59-04-C017, counts.C20-59-04-C018, counts.C20-59-04-C019, counts.C20-59-04-C020, counts.C20-59-04-C021, counts.C20-59-04-C022, counts.C20-59-04-C023, counts.C20-59-04-C024, counts.C20-59-04-C026, counts.C20-59-05-C001, counts.C20-59-05-C002, counts.C20-59-05-C004, counts.C20-59-05-C009, counts.C20-59-05-C010, counts.C20-59-05-C016, counts.C20-59-05-C020, counts.C20-59-07-C001, counts.C20-59-07-C002, counts.C20-59-07-C003, counts.C20-59-07-C005, counts.C20-59-07-C006, counts.C20-59-07-C010, counts.C20-59-07-C013, counts.C20-59-07-C018, counts.C20-59-07-C019, counts.C20-59-07-C020, counts.C20-59-07-C022, counts.C20-59-07-C025, counts.C20-59-07-C028, counts.C20-59-07-C029, counts.C20-59-07-C030, counts.C22, counts.C26, counts.C27, counts.C3_Mother_P6_CTLA4, counts.C4_bis, counts.C5, counts.C6, counts.C7, counts.Ced_MOU, counts.Ena_ERR, counts.Mae_JAW, counts.P1_CTLA4_ht_T, counts.P1_LRBA_hmz, counts.P1_NBEAL2_hmz, counts.P1_NRAS_ht_LS_JMML, counts.P2_CTLA4_ht_hc_Mother_P1, counts.P2_LRBA_hmz, counts.P2_NBEAL2_ht_hc_Mother_P1, counts.P3_CTLA4_ht_hc_Father_P6, counts.P3_KRAS_ht_leukemia_SJMML_S3, counts.P3_KRAS_ht_RALD_leukemia_S2, counts.P3_KRAS_ht_RALD_S1, counts.P3_NBEAL2_hmz, counts.P4_CTLA4_ht_T, counts.P4_KRAS_ht_SJMML, counts.P5_CTLA4_ht, counts.P5_CTLA4_ht_T, counts.P5_LRBA_compound, counts.P5_NBEAL2_compound, counts.P6_CTLA4_ht, counts.P6_CTLA4_ht_T, counts.P6_LRBA_hmz, counts.P6_NBEAL2_compound, counts.P7_CTLA4_hmz, counts.P7_LRBA_hmz, counts.P7_NBEAL2_hmz, counts.P8_CTLA4_ht_hc_mother_P7, counts.P9_CTLA4_ht, counts.P9_CTLA4_ht_T, data.C-AP, data.C-IJ, data.C-JC, data.C-RE, data.C1_Cousin_P1, data.C11_bis, data.C13, data.C2_Brother_P4, data.C20-59-01-C001, data.C20-59-01-C002, data.C20-59-01-C003, data.C20-59-01-P005, data.C20-59-01-P010, data.C20-59-01-P016, data.C20-59-01-P017, data.C20-59-01-P018, data.C20-59-01-P020, data.C20-59-01-P022, data.C20-59-01-P023, data.C20-59-01-P024, data.C20-59-01-P025, data.C20-59-01-P026, 
data.C20-59-01-P027, data.C20-59-01-P028, data.C20-59-01-P029, data.C20-59-01-P032, data.C20-59-01-P033, data.C20-59-01-P035, data.C20-59-01-P036, data.C20-59-01-P039, data.C20-59-01-P042, data.C20-59-01-P043, data.C20-59-01-P044, data.C20-59-01-P045, data.C20-59-01-P046, data.C20-59-01-P047, data.C20-59-01-P048, data.C20-59-01-P049, data.C20-59-01-P050, data.C20-59-01-P052, data.C20-59-01-P053, data.C20-59-01-P056, data.C20-59-01-P058, data.C20-59-01-P059, data.C20-59-01-P062, data.C20-59-01-P063, data.C20-59-01-P064, data.C20-59-01-P065, data.C20-59-01-P066, data.C20-59-01-P067, data.C20-59-01-P068, data.C20-59-01-P069, data.C20-59-01-P071, data.C20-59-01-P072, data.C20-59-01-P073, data.C20-59-01-P074, data.C20-59-01-P075, data.C20-59-01-P076, data.C20-59-01-P077, data.C20-59-01-P079, data.C20-59-01-P080, data.C20-59-01-P081, data.C20-59-01-P082, data.C20-59-01-P084, data.C20-59-01-P085, data.C20-59-01-P086, data.C20-59-01-P087, data.C20-59-01-P088, data.C20-59-01-P090, data.C20-59-01-P093, data.C20-59-01-P094, data.C20-59-01-P095, data.C20-59-01-P096, data.C20-59-01-P097, data.C20-59-01-P101, data.C20-59-01-P102, data.C20-59-01-P103, data.C20-59-01-P107, data.C20-59-01-P109, data.C20-59-01-P111, data.C20-59-02-C001, data.C20-59-02-P001, data.C20-59-02-P002, data.C20-59-04-C007, data.C20-59-04-C010, data.C20-59-04-C013, data.C20-59-04-C014, data.C20-59-04-C015, data.C20-59-04-C016, data.C20-59-04-C017, data.C20-59-04-C018, data.C20-59-04-C019, data.C20-59-04-C020, data.C20-59-04-C021, data.C20-59-04-C022, data.C20-59-04-C023, data.C20-59-04-C024, data.C20-59-04-C026, data.C20-59-05-C001, data.C20-59-05-C002, data.C20-59-05-C004, data.C20-59-05-C009, data.C20-59-05-C010, data.C20-59-05-C0, data.C20-59-05-C020, data.C20-59-07-C001, data.C20-59-07-C002, data.C20-59-07-C003, data.C20-59-07-C005, data.C20-59-07-C006, data.C20-59-07-C010, data.C20-59-07-C013, data.C20-59-07-C018, data.C20-59-07-C019, data.C20-59-07-C020, data.C20-59-07-C022, data.C20-59-07-C025, 
data.C20-59-07-C028, data.C20-59-07-C029, data.C20-59-07-C030, data.C22, data.C26, data.C27, data.C3_Mother_P6_CTLA4, data.C4_bis, data.C5, data.C6, data.C7, data.Ced_MOU, data.Ena_ERR, data.Mae_JAW, data.P1_CTLA4_ht_T, data.P1_LRBA_hmz, data.P1_NBEAL2_hmz, data.P1_NRAS_ht_LS_JMML, data.P2_CTLA4_ht_hc_Mother_P1, data.P2_LRBA_hmz, data.P2_NBEAL2_ht_hc_Mother_P1, data.P3_CTLA4_ht_hc_Father_P6, data.P3_KRAS_ht_leukemia_SJMML_S3, data.P3_KRAS_ht_RALD_leukemia_S2, data.P3_KRAS_ht_RALD_S1, data.P3_NBEAL2_hmz, data.P4_CTLA4_ht_T, data.P4_KRAS_ht_SJMML, data.P5_CTLA4_ht, data.P5_CTLA4_ht_T, data.P5_LRBA_compound, data.P5_NBEAL2_compound, data.P6_CTLA4_ht, data.P6_CTLA4_ht_T, data.P6_LRBA_hmz, data.P6_NBEAL2_compound, data.P7_CTLA4_hmz, data.P7_LRBA_hmz, data.P7_NBEAL2_hmz,     data.P8_CTLA4_ht_hc_mother_P7, data.P9_CTLA4_ht, data.P9_CTLA4_ht_T, scale.data
 3 dimensional reductions calculated: pca, red_rpca, umap_rpca
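
The sparse-matrix error for scale.data is consistent with simple arithmetic. The sketch below rests on assumptions not stated in the thread: that scale.data holds only the 3000 variable features, covers all 1188484 cells, and is dense, so that nearly every entry counts as nonzero when it is converted to a sparse matrix.

```r
n_cells        <- 1188484      # cells reported for the object
n_var_features <- 3000         # variable features (assumed rows of scale.data)
limit          <- 2^31 - 1     # maximum nonzero entries in the error message

# Entries in a dense scale.data matrix under the assumptions above
n_entries <- n_var_features * n_cells

n_entries            # 3565452000
n_entries > limit    # TRUE: the conversion cannot succeed
```

Under these assumptions the limit is exceeded before any gene set is scored, which would explain why the single-layer slot fails with a different error than the split layers.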

mass-a commented 6 months ago

This works for me with UCell 2.7.3:

library(UCell)
data(sample.matrix)
obj <- Seurat::CreateSeuratObject(sample.matrix)

obj$Tag <- "tag1"
obj$Tag[1:300] <- "tag2"

obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Tag)
# obj now has two layers
obj
# An object of class Seurat 
# 20729 features across 600 samples within 1 assay 
# Active assay: RNA (20729 features, 0 variable features)
#  2 layers present: counts.tag2, counts.tag1

gene.sets <- list(Tcell = c("CD2","CD3E","CD3D"),
                  Myeloid = c("SPI1","FCER1G","CSF1R"))
obj <- AddModuleScore_UCell(obj,features = gene.sets)

Can you confirm?

frac2738 commented 6 months ago

Your code did not work at first, but a full restart of the R session did the trick. It now works, and it also seems to work on my dataset. Thanks!

mpizzagalli777 commented 4 months ago

Hi, thanks for this wonderful package. I was wondering whether it is possible to run this analysis on integrated objects. I have been analyzing a Seurat object that has been integrated, but when I run AddModuleScore_UCell

AddModuleScore_UCell(object, assay = "integrated", features = markers)

I get a warning stating

Warning: Over half of genes (100%) in specified signatures are missing from data. ...

However, when I define the assay as SCT, the function works. Is this an issue with the naming convention used after integration?

AddModuleScore_UCell(object, assay = "SCT", features = markers)

Thanks so much for the help.

mass-a commented 4 months ago

Hello, if you integrated using Seurat, by default the "integrated" assay will only contain the variable genes. You can verify this with, e.g., dim(obj@assays$integrated@data). That is why UCell complains about missing genes: they are not present in the assay. You should be able to tell the Seurat integration functions to generate corrected values for all genes. However, I would recommend calculating signature scores on the uncorrected assays (RNA or SCT). While batch effects can be large on the global transcriptome, you can expect them to have a small impact on the reduced gene sets used for signature scoring.
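
One way to see this before scoring is to compare each signature against the gene names of the assay. The sketch below is illustrative only: `missing_fraction` is a hypothetical helper, not part of UCell, and the gene vectors are toy stand-ins for `rownames(obj[["integrated"]])` and `rownames(obj[["SCT"]])`.

```r
# Hypothetical helper: fraction of each signature's genes that are
# absent from a vector of assay gene names.
missing_fraction <- function(signatures, assay_genes) {
  vapply(signatures,
         function(genes) mean(!genes %in% assay_genes),
         numeric(1))
}

# Toy stand-ins for the gene names of two assays
integrated_genes <- c("GAPDH", "ACTB", "MKI67")
sct_genes <- c(integrated_genes, "CD2", "CD3E", "CD3D", "SPI1", "FCER1G")

markers <- list(Tcell   = c("CD2", "CD3E", "CD3D"),
                Myeloid = c("SPI1", "FCER1G", "CSF1R"))

missing_fraction(markers, integrated_genes)  # every signature gene is missing
missing_fraction(markers, sct_genes)         # only CSF1R is missing
```

If corrected values for all genes are really needed, Seurat's IntegrateData() has a features.to.integrate argument that can be set to the full gene list, at a noticeable memory cost for large datasets.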

mpizzagalli777 commented 4 months ago

Thanks so much for the quick reply! That makes sense. I appreciate it.