WT215 / bayNorm

Normalization for single cell RNA-seq data

Workaround for Seurat object #1

Closed kvshams closed 4 years ago

kvshams commented 4 years ago

Thanks for the cool tool!

Is there any workaround for using it with a Seurat object? Thanks, Shams

WT215 commented 4 years ago

This has been mentioned here: https://github.com/satijalab/seurat/issues/1029#issuecomment-534856129, will look into it.

How about

qq <- unname(at_GG$Bay_out)
rm(at_GG)
x.seurat <- CreateSeuratObject(counts = qq, assay = 'bayNorm')

Or save at_GG$Bay_out as an HDF5Matrix object, then read it with Read10X_h5 (still not sure about this).
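
A rough sketch of the HDF5 route, assuming the normalized matrix is in at_GG$Bay_out (file and dataset names are just examples; HDF5Array is used here rather than Read10X_h5, which expects 10x-formatted files):

# Sketch: write the dense bayNorm output to disk as an HDF5-backed matrix so
# it does not have to stay in memory alongside the raw counts.
library(HDF5Array)

hdf5_mat <- writeHDF5Array(at_GG$Bay_out,
                           filepath = "bayNorm_out.h5",  # hypothetical file name
                           name     = "bay_out")

# Later (e.g. in a fresh R session), point to the on-disk data again without
# loading it fully into memory:
hdf5_mat <- HDF5Array("bayNorm_out.h5", name = "bay_out")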

Another relevant discussion: https://github.com/cole-trapnell-lab/monocle-release/issues/138#issuecomment-452572783.

A final resort could be using scanpy in Python, which is similar to Seurat in R.

WT215 commented 4 years ago

Basically, as far as I can tell, there are the following options:

  1. Filter out more cells/genes.
  2. Save the normalized data, then restart R and load that data into Seurat. By doing so, we make sure that only one big matrix (excluding the raw data) is loaded in the environment (see the sketch after this list).
  3. Use scanpy in Python.
  4. The spam package in R... still looking into it: https://stackoverflow.com/questions/24236426/how-to-get-a-big-sparse-matrix-in-r-231-1
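
A minimal sketch of option 2, assuming the normalized matrix lives in at_GG$Bay_out (the file name is illustrative only):

# Step 1: save only the normalized matrix, then restart R so the raw counts
# and the bayNorm intermediate objects are no longer in the environment.
saveRDS(at_GG$Bay_out, file = "bayNorm_normalized.rds")  # hypothetical file name

# Step 2 (fresh R session): load the matrix and hand it to Seurat.
library(Seurat)
bay_mat  <- readRDS("bayNorm_normalized.rds")
x.seurat <- CreateSeuratObject(counts = bay_mat, assay = "bayNorm")
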
kvshams commented 4 years ago

I agree that it would work in other packages such as scanpy. It even works very well with bigmemory in R (verified), but unfortunately Seurat is still based on the dgCMatrix data structure. Thanks a lot for working on it. As we have now reached a dead end, I am closing this issue.
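
For reference, a minimal sketch of the bigmemory approach mentioned above (names are illustrative; Seurat itself cannot consume a big.matrix directly):

# Store the dense normalized matrix as a file-backed big.matrix so it does
# not occupy R's in-memory heap.
library(bigmemory)
bm <- as.big.matrix(at_GG$Bay_out,
                    backingfile    = "bayNorm.bin",   # hypothetical file names
                    descriptorfile = "bayNorm.desc")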

kvshams commented 4 years ago

@WT215 sorry to bug you again. How easy would it be to return an arcsine-normalized dgCMatrix as an output? That would solve the memory issue, since it keeps zero values as zeros.

WT215 commented 4 years ago

> @WT215 sorry to bug you again. How easy would it be to return an arcsine-normalized dgCMatrix as an output? That would solve the memory issue, since it keeps zero values as zeros.

Thanks for the suggestion! This idea sounds interesting. However, the normalized data is no longer sparse, and R itself cannot easily handle such a big dense matrix. So I am not very sure whether that idea works, but I will have a look.
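
For illustration only, a zero-preserving transform is straightforward when the data really are stored as a dgCMatrix, because only the nonzero slot needs to be touched (reading "arcsine" loosely as the arcsinh transform here; asinh(0) == 0, so zeros stay exactly zero). The catch, as noted above, is that bayNorm's normalized output is dense, so this is just a sketch of the idea on a toy sparse matrix:

# Sketch: apply an arcsinh transform to a sparse matrix without densifying it.
library(Matrix)

m <- rsparsematrix(1000, 500, density = 0.05)  # toy dgCMatrix stand-in
m@x <- asinh(m@x)                              # transform only the nonzero entries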