sc.dat is single cell or bulk seq matrix?

trusha0911 commented 8 months ago

Hi!

Just wanted to quickly confirm: is sc.dat "The cell-by-gene raw count matrix of bulk RNA-seq expression. rownames are bulk cell IDs, while colnames are gene names/IDs." as mentioned in the tutorial, or is it the single-cell raw count matrix? And if it is the single-cell matrix, is it all right to use merged data (from several patients) that has undergone QC, or should it be completely unfiltered?

Many thanks!

tinyi commented 8 months ago

Hi

The file sc.dat represents the scRNA-seq count matrix. Please note that the term "bulk" was a typographical error. I've corrected this in the updated vignette. Thank you for bringing it to our attention.

For optimal results, it's essential to filter and perform quality control (QC) on the input count matrix, in line with standard procedures for processing scRNA-seq data. Depending on the cell type's heterogeneity across patients, you have two options:

1. If the cell type has low heterogeneity, you can label each cell by its cell type and omit the patient ID, as done for endothelial cells, pericytes, and oligodendrocytes in the tutorial.
2. Alternatively, if the cell type is highly heterogeneous, you can treat the cells from each patient or subcluster as a separate cell state, as done for malignant cells and myeloid cells in the tutorial.
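As a rough illustration of the two labeling options, here is a minimal Python sketch (the package itself is used from R, so this is not its API). The metadata tuples, the `HIGH_HETEROGENEITY` set, and the `make_labels` helper are all hypothetical names made up for this example. The patient ID is used as the state suffix, but a subcluster ID could be used in the same way.

```python
# Hypothetical per-cell metadata: (cell_id, cell_type, patient_id).
cells = [
    ("c1", "endothelial", "P1"),
    ("c2", "endothelial", "P2"),
    ("c3", "malignant", "P1"),
    ("c4", "malignant", "P2"),
    ("c5", "myeloid", "P1"),
]

# Assumption: the analyst decides which cell types are highly heterogeneous.
HIGH_HETEROGENEITY = {"malignant", "myeloid"}


def make_labels(cells, heterogeneous_types):
    """Return (cell_type_labels, cell_state_labels), one entry per cell."""
    type_labels = []
    state_labels = []
    for _cell_id, cell_type, patient_id in cells:
        type_labels.append(cell_type)
        if cell_type in heterogeneous_types:
            # High heterogeneity: each patient becomes its own cell state.
            state_labels.append(f"{cell_type}_{patient_id}")
        else:
            # Low heterogeneity: the state is the cell type, patient ID omitted.
            state_labels.append(cell_type)
    return type_labels, state_labels


cell_type_labels, cell_state_labels = make_labels(cells, HIGH_HETEROGENEITY)
print(cell_state_labels)
```

With this toy input, both endothelial cells share one state, while the malignant cells from P1 and P2 end up in separate states.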

Best,

Tinyi

youcef-benmohammed commented 4 months ago

Hi

I have a question, as I'm a little confused. In the tutorial, sc.dat has dimensions 23793 x 60294 and sc.bk has dimensions 169 x 60483.

Does this mean there are around 60K genes, and are they unique?

Thank you, Youcef.

tinyi commented 4 months ago

Only genes shared between the two matrices will be used for deconvolution.
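Since the two matrices have different numbers of columns (60294 vs 60483), their gene sets cannot be identical. The sketch below shows, in plain Python and on toy gene IDs, one way to check for duplicated column names and to find the genes common to both. It only illustrates the idea and is not the package's own code; `sc_genes`, `bk_genes`, `duplicated`, and `shared_in_order` are hypothetical names.

```python
from collections import Counter

# Toy gene IDs standing in for the column names of the two matrices.
sc_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
bk_genes = ["GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"]


def duplicated(names):
    """Return the sorted names that occur more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


def shared_in_order(reference, other):
    """Return the names present in both lists, in the order of `reference`."""
    other_set = set(other)
    return [name for name in reference if name in other_set]


shared = shared_in_order(sc_genes, bk_genes)
print(len(sc_genes), len(bk_genes), len(shared))
print(duplicated(sc_genes), duplicated(bk_genes))
```

Running the same kind of check on the real column names would show how many genes are shared and whether any gene ID is repeated within either matrix.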

Danko-Lab / BayesPrism

sc.dat is single cell or bulk seq matrix? #62