duct317 / scISR

scISR transforms the raw count but does not perform any imputation on zeros #3

Open Rohit-Satyam opened 12 months ago

Rohit-Satyam commented 12 months ago

@duct317 I tried running scISR on my dataset as shown below. Surprisingly, the number of genes expressed does not go up; it stays the same. I used the following code:

library(scISR)

## Setting RNA as default assay
org<-lapply(sample.list,FUN = function(x){DefaultAssay(x)<-"RNA"; return(x)})

## Performing Imputation
org.scisr<-lapply(org,function(x){
  scISR(as.matrix(GetAssayData(x, assay = "RNA", slot = "counts"), rownames = TRUE), ncores = 5, preprocessing = FALSE, seed = 12345)
})

## Checking if all samples were imputed or some were left as is

lapply(1:length(org.scisr), function(x){
  table(colSums(org.scisr[[x]])==colSums(org[[x]]@assays$RNA@counts))
})
[[1]]

 TRUE 
10651 

[[2]]

FALSE  TRUE 
15349  1186 

[[3]]

FALSE  TRUE 
 9743   865 

[[4]]

FALSE  TRUE 
 9893  1086 

[[5]]

FALSE 
 6737 
scisr<-lapply(1:length(org.scisr),function(x){
  temp <- CreateAssayObject(count = as.sparse(org.scisr[[x]]))
  org[[x]][["scisr"]] <- temp
  DefaultAssay(org[[x]])<-"scisr"
  return(org[[x]])
})

names(scisr)<- names(org)
## Replacing original mca because this is reference atlas
l<- lapply(scisr, function(x){
 x<- NormalizeData(x, assay="scisr") 
 x<- FindVariableFeatures(x,selection.method="vst",assay="scisr")
})
l$mca<- org$mca

saveRDS(l,"scisr.rds")

As you can see

duct317 commented 12 months ago

scISR first performs a statistical test to determine whether the data need to be imputed. It looks like scISR determined that the first dataset does not need imputation. There are changes in the other datasets.

Rohit-Satyam commented 12 months ago

@duct317. I understand that. There are changes in the other samples, but the number of genes expressed per cell stays the same. So I think the imputation didn't work, because had it worked we would have observed an increase in the number of genes expressed in at least some cells. I say this because when I create violin plots per sample to look at the median number of genes expressed per cell, both the distribution and the median stay the same. Of note, my count matrices are 99% zeros, and maybe because of that your method simply transforms the non-zero values and leaves the zero values untouched.
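The column-sum comparison used earlier in this thread cannot separate "non-zero values were rescaled" from "zeros were filled in", since both change the column sums. A more direct check is to count the non-zero entries per cell before and after. The snippet below is only a minimal sketch on toy matrices: `raw`, `transformed_only`, `zeros_filled`, and the helper `genes_per_cell` are made-up names and values for illustration, not scISR output or part of its API.

```r
# Toy stand-ins: rows are genes, columns are cells.
# Hypothetical values for illustration only, not scISR output.
raw <- matrix(c(0, 3, 0, 1,
                2, 0, 0, 0,
                0, 0, 5, 0),
              nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Case A: non-zero values are rescaled, zeros are left untouched.
transformed_only <- raw * 1.5

# Case B: one zero entry is replaced by a non-zero value.
zeros_filled <- raw
zeros_filled["gene1", "cell1"] <- 0.7

# Number of genes detected (non-zero) in each cell.
genes_per_cell <- function(m) colSums(m != 0)

# Column sums change in both cases, so they cannot tell the cases apart.
sums_changed_a <- any(colSums(transformed_only) != colSums(raw))
sums_changed_b <- any(colSums(zeros_filled) != colSums(raw))

# Genes per cell changes only when zeros are actually replaced.
gained_a <- genes_per_cell(transformed_only) - genes_per_cell(raw)
gained_b <- genes_per_cell(zeros_filled) - genes_per_cell(raw)

print(gained_a)
print(gained_b)
```

On real data, the same comparison would presumably be made between `org.scisr[[x]]` and the original count matrix for each sample; if the per-cell gain is zero everywhere while the column sums differ, that would support the reading that only non-zero values were changed.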