Danko-Lab / BayesPrism

A Fully Bayesian Inference of Tumor Microenvironment composition and gene expression

Principle of separation of types? #65

Open Lao-Tz opened 8 months ago

Lao-Tz commented 8 months ago

Hello, I'm currently using BayesPrism for deconvolution and I have a question.

I'm working with single-cell sequencing data that includes equal numbers of tumor cells and normal (non-tumor) cells. The bulk data also contains both tumor and normal cells. Suppose I've annotated 30 state subgroups, including CD8+, Plasma cells, etc., and then merged them into 8 type subgroups according to cell type, such as Lymphocytes, Stromal cells, etc. However, I found that 10 of the state subgroups are found only in Tumor, and 5 are found only in Normal. Viewed at the type level, some of these 10 and 5 subgroups belong to the same type, such as Lymphocytes, while others do not.

I performed deconvolution in two ways: 1. Merge type subgroups accurately according to state. 2. Mark the type of state subgroups that are only expressed in tumor or normal as Tumor or Normal.
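To make the two labeling schemes concrete, here is a minimal sketch in base R. The state names, their cell types, and the sample groups they occur in are all made up for illustration; they are not the actual annotation from this dataset.

```r
# Toy annotation: each state has a biological cell type and the sample
# groups in which it was observed. All values are illustrative assumptions.
states <- data.frame(
  state     = c("CD8_T", "Plasma", "Fibro_A", "Fibro_B", "Treg"),
  cell_type = c("Lymphocytes", "Lymphocytes", "Stromal cells",
                "Stromal cells", "Lymphocytes"),
  in_tumor  = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  in_normal = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Scheme 1: the type label is always the biological cell type.
type_scheme1 <- states$cell_type

# Scheme 2: states seen in only one sample group are relabeled by that group.
type_scheme2 <- states$cell_type
type_scheme2[states$in_tumor & !states$in_normal] <- "Tumor"
type_scheme2[!states$in_tumor & states$in_normal] <- "Normal"

print(data.frame(state = states$state, type_scheme1, type_scheme2))
```

In this toy case, scheme 2 puts a plasma-cell state and a fibroblast state under the same "Tumor" type, so a single type label ends up covering different biological cell types.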

The single-cell data used in the BayesPrism paper did not include normal cells. After reading the BayesPrism paper, I started to dislike the method of CIBERSORT. However, my knowledge is limited and I currently do not have the ability to understand the underlying logic of BayesPrism. I'm not sure whether my analysis design is feasible, so I would like to ask for your opinion.

Both methods of analysis contain some collinearity (probably because there is redundancy in my cell subgroup division). I'm inclined to make the second method interpretable so that I can have a broader subsequent analysis.

By the way, the result of the first method is similar to CIBERSORT's, but the result of the second method is quite different.

tinyi commented 7 months ago

Hi. Sorry for the late reply. I am not sure I am quite following. Could you elaborate a bit on the relationship between cell states and tumor/normal state? For example, how can one cell state be found in both tumor and normal samples? It is also unclear to me how you were trying to construct the reference. Were you trying to construct the reference using scRNA datasets from both normal and tumor samples?

Lao-Tz commented 7 months ago

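Thanks for your reply! My input data consists of:

  • Single-cell RNA sequencing data: 40 samples, including 30,000 normal cells and 100,000 cancer cells.
  • Bulk RNA sequencing data: Obtained from TCGA, including 350+ cancer samples and 40+ normal samples.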

I utilized the LIGER package for semi-supervised data dimensionality reduction and the Seurat package's FindClusters function for clustering. This resulted in the identification of over 30 subclusters. Upon examining the composition of these subclusters in terms of Tumor and Normal, I discovered that more than half of the subclusters were exclusively present in either Tumor or Normal. Consequently, I merged the subclusters exclusive to Tumor or Normal into two types, despite the possibility of dissimilar expression profiles between the subclusters distributed in Normal or Tumor. I set the key as 'Tumor'.

My current approach involves two rounds of BayesPrism analysis. In the first round, I include both Tumor and Normal in the type definition. After deconvolution, I test whether the theta values of each type differ significantly between cancer and adjacent tissue in the bulk data. If they do, I proceed with the second round of deconvolution, using only the subclusters exclusive to Tumor or Normal, but with their types set according to the original cell types. I then analyze the theta values of the type results and perform univariate Cox survival analysis to select the major subclusters associated with survival for further analysis.
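The first-round comparison could be sketched as follows. This assumes a sample-by-type matrix of fractions named `theta` and a per-sample `group` factor; both are simulated here with random numbers, so the p-values mean nothing, and the choice of a Wilcoxon rank-sum test with Benjamini-Hochberg adjustment is an assumption rather than what was actually run.

```r
set.seed(1)

# Simulated stand-in for a sample-by-type theta matrix. With BayesPrism this
# would come from the deconvolution result instead; the numbers are random.
n_tumor  <- 30
n_normal <- 10
types <- c("Lymphocytes", "Stromal Cells", "Tumor Cells")
raw <- matrix(runif((n_tumor + n_normal) * length(types)),
              ncol = length(types), dimnames = list(NULL, types))
theta <- raw / rowSums(raw)          # each sample's fractions sum to 1
group <- factor(rep(c("Tumor", "Normal"), c(n_tumor, n_normal)))

# Wilcoxon rank-sum test per cell type, tumor vs. adjacent normal samples.
p_values <- apply(theta, 2, function(frac) {
  wilcox.test(frac[group == "Tumor"], frac[group == "Normal"])$p.value
})

# Adjust for testing several cell types at once (Benjamini-Hochberg).
p_adjusted <- p.adjust(p_values, method = "BH")
print(round(p_adjusted, 3))
```

Because the fractions in each sample sum to 1, a shift in one type forces shifts in the others, so the per-type tests are not independent of each other.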

tinyi commented 7 months ago

Do you mind sending me a table of cell.type.labels and cell.state.labels (if cell.state.labels differ from cell.type.labels), using something like table(data.frame(cell.type.labels, cell.state.labels)), for both the first and second rounds of deconvolution? Thanks.
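For reference, a small sketch of what that cross-tabulation looks like. The label vectors below are hypothetical and only seven cells long; in a real run they have one entry per cell in the single-cell reference.

```r
# Hypothetical per-cell labels; in practice these are vectors with one entry
# per cell in the single-cell reference.
cell.state.labels <- c("LT1", "LT1", "LT3", "LB2", "EG1", "EG1", "MM2")
cell.type.labels  <- c("Lymphocytes", "Lymphocytes", "Tumor Cells",
                       "Tumor Cells", "Epithelial Cells", "Epithelial Cells",
                       "Tumor Cells")

# Cross-tabulate type against state: rows are types, columns are states.
label_table <- table(data.frame(cell.type.labels, cell.state.labels))
print(label_table)

# Count how many types each state is assigned to (non-zero entries per column).
types_per_state <- colSums(label_table > 0)
print(types_per_state)
```

Since states are meant to be subdivisions of types, a state that shows up under more than one type (a value above 1 in `types_per_state`) is usually a sign that the labels were merged incorrectly.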

Lao-Tz commented 7 months ago

# Extract the 'minor_cluster' and 'group' columns
minor_cluster <- sce@meta.data$minor_cluster
group <- sce@meta.data$group

# Count the cells of each 'minor_cluster' in each group
cluster_table <- table(minor_cluster, group)

# Clusters with zero cells in 'Normal' are tumor-only; clusters with zero cells in 'Tumor' are normal-only
tumor <- row.names(cluster_table)[cluster_table[, "Normal"] == 0]
normal <- row.names(cluster_table)[cluster_table[, "Tumor"] == 0]

# Relabel the 'major_cluster' of these clusters as "Tumor Cells" / "Normal Cells"
sce@meta.data$major_cluster[sce@meta.data$minor_cluster %in% tumor] <- "Tumor Cells"
#sce@meta.data$major_cluster[sce@meta.data$minor_cluster %in% normal] <- "Normal Cells"

The "Normal Cells" line is commented out because this code was copied from my current working environment; that line is only run when BayesPrism is run. As a result, the major_cluster in the following data does not contain Normal Cells.

# first round
> table(sce$minor_cluster,sce$group)

      Normal Tumor
  EE1    983     0
  EG1   1705  2248
  EG2    601  1524
  EG3    683    13
  EV1   2413     0
  EV2      0  1183
  EV3      0    31
  GC1   1381  2216
  LB1   3689    97
  LB2      0  2901
  LB3   1330     0
  LB4      0  1184
  LB5      0  2788
  LB6    243     0
  LB7      0   135
  LT1   4011  1501
  LT2   2118  2585
  LT3      0  3067
  LT4      0  1230
  LT5    139   603
  LT6    242     0
  LT7    150     0
  LT8      0   118
  MM1   1471   358
  MM2      0   973
  MM3      0   544
  MN1   1050   473
  MY1    558   443
  NN1   1346     0
  SC1    807     0
  SC2      0   307
  SF1   3234     0
  SM1      0  1110
  TT1      0   535

> table(sce$major_cluster,sce$group)

                    Normal Tumor
  Endocrine Cells     1346     0
  Endothelial Cells   2413     0
  Epithelial Cells    5353  6001
  Lymphocytes        11922  4786
  Myeloid Cells       2521   831
  Stromal Cells       4599   443
  Tumor Cells            0 16106

> table(sce$major_cluster,sce$minor_cluster)

                     EE1  EG1  EG2  EG3  EV1  EV2  EV3  GC1  LB1  LB2  LB3  LB4  LB5  LB6  LB7  LT1  LT2  LT3  LT4  LT5  LT6  LT7  LT8  MM1  MM2  MM3  MN1  MY1  NN1  SC1  SC2  SF1  SM1  TT1
  Endocrine Cells      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 1346    0    0    0    0    0
  Endothelial Cells    0    0    0    0 2413    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
  Epithelial Cells   983 3953 2125  696    0    0    0 3597    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
  Lymphocytes          0    0    0    0    0    0    0    0 3786    0 1330    0    0  243    0 5512 4703    0    0  742  242  150    0    0    0    0    0    0    0    0    0    0    0    0
  Myeloid Cells        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 1829    0    0 1523    0    0    0    0    0    0    0
  Stromal Cells        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 1001    0  807    0 3234    0    0
  Tumor Cells          0    0    0    0    0 1183   31    0    0 2901    0 1184 2788    0  135    0    0 3067 1230    0    0    0  118    0  973  544    0    0    0    0  307    0 1110  535
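
# second round (another R script)
library(Seurat)
library(dplyr)

# Keep only the minor clusters that have zero cells in one of the two groups
Idents(sce) <- "minor_cluster"
NT_keep <- table(sce$minor_cluster, sce$group) %>% as.data.frame() %>% filter(Freq == 0) %>% select(Var1)
sce <- subset(sce, idents = as.character(NT_keep$Var1))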
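The zero-count filter can be reproduced in base R on a few rows copied from the first-round minor_cluster table. This is only a sketch; `exclusive` and `tumor_fraction` are names introduced here for illustration.

```r
# Counts copied from a few rows of the minor_cluster-by-group table above.
cluster_table <- matrix(
  c( 983,    0,   # EE1
    1705, 2248,   # EG1
     683,   13,   # EG3
       0, 1183,   # EV2
    3689,   97),  # LB1
  ncol = 2, byrow = TRUE,
  dimnames = list(c("EE1", "EG1", "EG3", "EV2", "LB1"), c("Normal", "Tumor"))
)

# A cluster is "exclusive" when it has zero cells in one of the two groups.
tumor_only  <- rownames(cluster_table)[cluster_table[, "Normal"] == 0]
normal_only <- rownames(cluster_table)[cluster_table[, "Tumor"] == 0]
exclusive   <- c(tumor_only, normal_only)
print(exclusive)

# Fraction of each cluster's cells that come from tumor samples.
tumor_fraction <- cluster_table[, "Tumor"] / rowSums(cluster_table)
print(round(tumor_fraction, 3))
```

On these rows the filter keeps EV2 and EE1. Clusters such as EG3 and LB1 are almost entirely from normal samples but are not kept, because the rule requires an exact zero.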

My Tumor subgroup was obtained by stratified sampling and then merged manually according to the number of cells. My subgroup annotation is based on the top 50 genes from the FindAllMarkers function in the Seurat package; for some subgroups, the cell type is apparent from just the top 10 or even the top 5 genes. I am a novice in single-cell sequencing analysis, and I have always wondered how everyone manages to annotate tumor cells when they all look like expression states of tumor microenvironment cells. Thanks!

major_cluster | minor_cluster1 | minor_cluster2
-- | -- | --
Lymphocytes | T cells | LT1
Lymphocytes | T cells | LT2
Lymphocytes | B cells | LB1
Epithelial Cells | Gastric Endocrine Cells | EG1
Lymphocytes | T cells | LT3
Epithelial Cells | Gastric Endocrine Cells | EG2
Stromal Cells | Fibroblasts | SF1
Endothelial Cells | Vascular Endothelial Cells | EV1
Myeloid Cells | Macrophages | MM1
Lymphocytes | B cells | LB2
Myeloid Cells | Neutrophils | MN1
Lymphocytes | B cells | LB3
Epithelial Cells | Gastric Chief Cells | GC1
Lymphocytes | B cells | LB4
Lymphocytes | B cells | LB5
Stromal Cells | Mast Cells | SM1
Lymphocytes | T cells | LT4
Myeloid Cells | Macrophages | MM2
Stromal Cells | Myofibroblasts | MY1
Stromal Cells | Cancer-associated fibroblasts (CAFs) | SC1
Lymphocytes | T cells | LT5
Epithelial Cells | Epithelial Cells | EE1
Endocrine Cells | Neuroendocrine Cells | NN1
Lymphocytes | T cells | LT6
Lymphocytes | B cells | LB6
Lymphocytes | T cells | LT7
Epithelial Cells | Gastric Endocrine Cells | EG3
Endothelial Cells | Vascular Endothelial Cells | EV2
Tumor Cells | Tumor Cells | TT1
Myeloid Cells | Monocytes | MM3
Stromal Cells | Cancer-associated fibroblasts (CAFs) | SC2
Lymphocytes | B cells | LB7
Lymphocytes | T cells | LT8
Endothelial Cells | Vascular Endothelial Cells | EV3

tinyi commented 7 months ago

When you say "Tumor" and "Normal", do you mean tumor samples and normal samples, rather than malignant and non-malignant cells? I am asking as I saw even lymphocytes show up in both groups.

Lao-Tz commented 7 months ago

My "Tumor" and "Normal" here are the "Cancer" and "Adjacent tissue" labels in the original data. I don't have enough experience to distinguish malignant from non-malignant cells, and I don't know how everyone else does it, because I am the only one in our laboratory working on single-cell sequencing analysis.

Using pie charts, I examined how the subgroups were distributed after LIGER dimensionality reduction and clustering with the FindClusters function, and I tried to choose the parameters that gave the greatest difference between cancer and adjacent tissue. This resulted in lymphocytes and other cell types appearing in both "Tumor" and "Normal".

Therefore, since the state subgroups found only in "Tumor" or only in "Normal" account for almost half of all subgroups, I am considering extracting just those state subgroups, merging them into type subgroups, and running BayesPrism without setting the key. I think this is still convincing.

Subsequently, I intend to use the type subgroups selected here for Monocle and iTALK analysis, run WGCNA on both the state results and the CIBERSORT results, and keep whichever performs better. I will then intersect the outputs of these steps to find the key prognostic genes and build a gene model, which completes my exploration of the single-cell data at this stage.

Can you give me some advice for a beginner? Thank you for your reply!

Lao-Tz commented 7 months ago

I found my problem. There are so many zero values because my merge function doesn't match. It's over. I have to do it again.

Lao-Tz commented 7 months ago

I tried copyKAT, but the results were not very good, so I ran inferCNV with endothelial cells specified through annotations_file. About half of the epithelial cell subsets were clearly malignant, but this is far from the number of malignant cells in the BayesPrism paper. I also found that my scRNA data contains a lot of lymphocytes after dimensionality reduction and clustering, and that the lymphocytes show TCR or BCR copy number variation, although the lymphocytes in the cancer I study do not seem to be malignant. So I still have doubts about how the type labels should be constructed.