Danko-Lab / TED

a fully Bayesian approach to deconvolve tumor microenvironment

scRNA reference matrix issues #26

Closed SBaek613 closed 2 years ago

SBaek613 commented 2 years ago

Hi, thanks for the great tool.

I have been trying this tool out for deconvolution of a TCGA bulk RNA-seq dataset, using an scRNA-seq reference matrix of my own.

I mainly use Seurat for scRNA-seq data and, as always, the main problem is that it's impossible to extract a dense (non-sparse) matrix from Seurat because of a limitation of the matrix format (the maximum number of elements is ~2^31).

Originally, I had about 180,000 cells with 40,000 features, and I was able to filter the features down to ~20,000 genes. I would like to keep as many cells as possible, but keeping all of them is probably not feasible. As far as I know, this tool does not accept a sparse matrix in dgCMatrix format, nor 'txt' files that are stored on disk without being loaded into memory.

So I came up with a couple of workarounds for this situation, but I am not sure which one would fit this tool best.

  1. Randomly subset the data. Bringing it down to about 1/3 of the original cell count would give me a non-sparse matrix without errors.
  2. Make a collapsed count matrix for each major cell type. I read in the tutorial that I can use a collapsed count matrix, but I am a bit worried that doing so might affect the results. And if I were to use the collapsed matrix, would I just sum the counts of each gene within each cell type?

I would appreciate any advice!
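The size problem described above can be checked with quick arithmetic. The sketch below is only an illustration in Python: it assumes the limit is exactly 2^31 - 1 elements (the post says "~2^31") and takes "about 1/3" of the cells to mean 60,000. The helper name `n_elements` is made up for this example.

```python
# Assumed limit: 2^31 - 1 elements (the post only says "~2^31").
INT32_MAX = 2**31 - 1

def n_elements(n_cells, n_genes):
    """Number of entries in a dense cells-by-genes matrix."""
    return n_cells * n_genes

N_GENES = 20_000                       # after feature filtering, per the post

full = n_elements(180_000, N_GENES)    # all cells
third = n_elements(60_000, N_GENES)    # assumed "about 1/3" of the cells

full_fits = full <= INT32_MAX          # does the full dense matrix fit?
third_fits = third <= INT32_MAX        # does the 1/3 subset fit?

# Largest cell count whose dense matrix stays under the assumed limit.
max_cells = INT32_MAX // N_GENES

print(full, full_fits)
print(third, third_fits)
print(max_cells)
```

Under these assumptions the full matrix has 3.6 billion entries and exceeds the limit, while the 1/3 subset (1.2 billion entries) fits, which is consistent with the behaviour reported above.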

tinyi commented 2 years ago

Thank you for your interest in our work.

Yes. The second solution is the recommended one. To collapse, simply add up the reads within each cell state (subtype) to make a gene expression profile (use the argument input.type="GEP"), and label each row of the GEP with the corresponding cell state or cell type. The result will then be the same as when using the raw counts.
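The collapsing step described above can be sketched as follows. This is a minimal illustration in Python, not the tool's own code: it assumes the counts are available as sparse (cell, gene, count) triplets and that every cell has a cell-state label. The function name `collapse_counts`, the gene names, and the toy data are all hypothetical. The output has one row per cell state and one column per gene, matching the row labelling described in the reply.

```python
from collections import defaultdict

def collapse_counts(triplets, cell_labels, genes):
    """Sum raw counts per gene within each cell state.

    triplets    : iterable of (cell_id, gene, count), non-zero entries only
    cell_labels : dict mapping cell_id -> cell state label
    genes       : ordered list of gene names (columns of the result)

    Returns (row_names, matrix), where matrix[i][j] is the summed count of
    genes[j] over all cells labelled row_names[i].
    """
    sums = defaultdict(lambda: defaultdict(int))
    for cell_id, gene, count in triplets:
        sums[cell_labels[cell_id]][gene] += count

    row_names = sorted(set(cell_labels.values()))
    matrix = [[sums[state][g] for g in genes] for state in row_names]
    return row_names, matrix

# Toy example (hypothetical data): four cells, three genes.
genes = ["CD3E", "MS4A1", "EPCAM"]
cell_labels = {"c1": "T cell", "c2": "T cell", "c3": "B cell", "c4": "tumor"}
triplets = [
    ("c1", "CD3E", 5),
    ("c2", "CD3E", 3),
    ("c2", "MS4A1", 1),
    ("c3", "MS4A1", 7),
    ("c4", "EPCAM", 9),
    ("c4", "CD3E", 1),
]

row_names, gep = collapse_counts(triplets, cell_labels, genes)
for name, row in zip(row_names, gep):
    print(name, row)
```

Because only non-zero entries are visited, the full dense cells-by-genes matrix is never built; the result has just one row per cell state, so its size no longer depends on the number of cells.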

Best,

Tinyi
