Closed Feng-Zhang closed 1 year ago
C-SIDE has a feature to run DE across multiple samples, the function CSIDE.population.inference
. It will estimate a population mean and standard error.
We don't have a particular implementation for the case vs. control setting, but I recommend running on cases and controls separately, and then doing a Z-test on the means afterwards (which requires the means and standard errors).
@dmcable Thank you for your reply. How can I run CSIDE for each slices without explanatory.variable? Can we define the explanatory.variable as 0 for all spots in control slices and 1 for all spots in case slices?
To run C-SIDE with just an intercept term, see https://github.com/dmcable/spacexr/issues/87#issuecomment-1192718351
Dylan
@dmcable Hi Dylan, There are two questions still confusing me :
barcodes <- intersect(names(myRCTD@spatialRNA@nUMI), colnames(myRCTD@spatialRNA@counts))
X <- as.matrix(rep(1, length(barcodes)))
rownames(X) <- barcodes
myRCTD <- run.CSIDE(myRCTD, X, barcodes)
The above example you give used X as a design matrix, however, the elements in exvar_list are integer
objects as running run.CSIDE.replicates function blow.
myRCTD.reps <- run.CSIDE.replicates(myRCTD.reps, exvar_list, cell_types, gene_threshold = .01,
cell_type_threshold = 5, fdr = 0.25)
run.CSIDE(myRCTD, X, barcodes)
for each sample one by one, and then using CSIDE.population.inference to obtain cell type specific DEG?This is a good point. I have updated the run.CSIDE.replicates
function to handle this case. You should set de_mode = 'general'
, explanatory.variable.replicates = NULL
, and X.replicates = X.replicates
. X.replicates
should be a list of the X matrices. Also, a warning that your parameters are for testing and the recommended default parameters for proper running.
@dmcable Hi Dylan, I get an error:
Error in run.CSIDE.replicates(myRCTD_reps, de_mode = "general", explanatory.variable.replicates = NULL, :
run.CSIDE.replicates: explanatory.variable.replicates must be a list of explanatory variable vectors for each replicate.
I guess it is due to this common in run.CSIDE.replicates function:
if (class(explanatory.variable.replicates) != "list")
stop("run.CSIDE.replicates: explanatory.variable.replicates must be a list of explanatory variable vectors for each replicate.")
Could you please fix this small bug when you are free?
Thanks for pointing this out. The error checking has been fixed now.
@dmcable Hi Dylan,
I modified a little bit of your code in the vignette:
library(spacexr)
library(ggplot2)
library(Matrix)
library(doParallel)
prefix <- "spacexr-"
output_prefix <- file.path(work_dir,prefix)
savedir <- 'RCTD_batch_results'
if(!dir.exists(savedir))
dir.create(savedir)
# load Reference object --------------------
refdir <- system.file("extdata",'Reference/Vignette',package = 'spacexr') # directory for the reference
counts <- read.csv(file.path(refdir,"dge.csv")) # load in counts matrix
rownames(counts) <- counts[,1]; counts[,1] <- NULL # Move first column to rownames
meta_data <- read.csv(file.path(refdir,"meta_data.csv")) # load in meta_data (barcodes, clusters, and nUMI)
cell_types <- meta_data$cluster; names(cell_types) <- meta_data$barcode # create cell_types named list
cell_types <- as.factor(cell_types) # convert to factor data type
nUMI <- meta_data$nUMI; names(nUMI) <- meta_data$barcode # create nUMI named list
reference <- Reference(counts, cell_types, nUMI)
# load SpatialRNA object and desing matrix --------------
spatial_replicates <- NULL
X_replicates <- NULL
samples <- c("Vignette","Vignette_rep2","Vignette_rep3","Vignette_rep4")
for(i in samples){
# i <- 1
datadir <- system.file("extdata",'SpatialRNA',i,package = 'spacexr')
puck <- read.SpatialRNA(datadir, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv')
spatial_replicates[[i]] <- puck
barcodes <- intersect(names(puck@nUMI), colnames(puck@counts))
X_design <- as.matrix(rep(1, length(barcodes)))
rownames(X_design) <- barcodes
X_replicates[[i]] <- X_design
}
group_ids <- c(1,1,1,1)
# run RCTD and CSIDE -----------------------------
myRCTD_reps <- create.RCTD.replicates(spatial_replicates, reference, samples, group_ids = group_ids, max_cores = 6)
myRCTD_reps <- run.RCTD.replicates(myRCTD_reps)
cell_types <- c('10','18')
myRCTD_reps <- run.CSIDE.replicates(myRCTD_reps, de_mode = 'general', explanatory.variable.replicates = NULL, X.replicates = X_replicates,cell_types=cell_types,
gene_threshold = .01, cell_type_threshold = 0, fdr = 0.25)
After running run.CSIDE.replicates, I still have two problems confusing me.
When cell_type_threshold=1
, which is larger than 0, it would throw an error below:
run.CSIDE.replicates: testing CSIDE for errors for replicate 1
Error in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
choose_cell_types: cell types: 10, 18 detected using aggregate_cell_types to have less than the minimum cell_type_threshold of 1. To fix this issue, please remove these cell types or reduce the cell_type_threshold
Warning message:
In run.CSIDE.general(myRCTD, X1, X2, barcodes, cell_types, cell_type_threshold = cell_type_threshold, :
Error in choose_cell_types(myRCTD, barcodes, doublet_mode, cell_type_threshold, :
choose_cell_types: cell types: 10, 18 detected using aggregate_cell_types to have less than the minimum cell_type_threshold of 1. To fix this issue, please remove these cell types or reduce the cell_type_threshold
If I set cell_type_threshold = 0
and keep the cell , myRCTD_reps@population_de_results is list()
although no error was throwed.
In this way, how do we design the group to define slice is case or control?
Your code is correct. There was a bug in run.CSIDE.replciates
which is fixed now.
cell_type_threshold
is the minimum that each cell type must appear from each cell type. Outside of debugging / testing, I recommend setting it to the default parameter. For the vignette data, it is set to 5. It is not recommended to set it to 1 or 0.
You should run it separately for cases vs controls. C-SIDE will not be aware of whether you are running it for cases or controls.
@dmcable Thank you for your reminder. I would use the default parameter on my real data. I have got the CSID results without any error, and the structure of myRCTD_reps is shown as :
> str(myRCTD_reps,max.level = 2)
Formal class 'RCTD.replicates' [package "spacexr"] with 5 slots
..@ RCTD.reps :List of 4
..@ population_de_results : list()
..@ population_sig_gene_list: list()
..@ population_sig_gene_df : list()
..@ group_ids : Named num [1:4] 1 1 1 1
.. ..- attr(*, "names")= chr [1:4] "Vignette" "Vignette_rep2" "Vignette_rep3" "Vignette_rep4"
However, how can I run differential analysis grouped by slices in detail? What do you think I can get started with? As you mentioned above, doing a Z-test on the means and standard errors?
You can obtain means and standard errors for each gene for each group, as documented within the package.
Then, you can do a Z-test for testing for gene expression differences across groups. The Z-test is directly computable from the C-SIDE output. The Z-statistic is computed is Z = (m1 - m2)/sqrt(s1^2+s2^2). Where m1,m2 are the estimated expression of each cell type, whereas s1,s2 are the standard errors. This can then be converted to a p-value using the normal distribution CDF.
@dmcable Sorry for asking the questions below after you closed this issue. I still can't obtain the estimated expression of each type.
myRCTD_reps@RCTD.reps[[4]]@de_results
is list(). You need to get the population-level results. This is documented in the spatial transcriptomics vignette within "CSIDE Results": https://raw.githack.com/dmcable/spacexr/master/vignettes/replicates.html
In this case, you want to use all genes including nonsignificant, found here: myRCTD.reps@population_de_results
.
Next time, please fully read the documentation before asking questions that are covered within the documentation.
Sorry for misleading you.
I have read and run this document before. I mean the run.CSIDE.replciates
seems not to return these results, and myRCTD.reps@population_de_results
and myRCTD_reps@RCTD.reps[[1]]@de_results
are empty.
Ok, thanks for pointing out this issue. There was a couple bugs in the run.CSIDE.replicates
, which have now fixed this issue.
Thank you for your prompt reply.
Now I can obtain the population inference, but myRCTD_reps@population_de_results[["10"]]
doesn't return all gene inference. It only has significant gene information as same as elements in myRCTD_reps@population_sig_gene_df
. Should it return all gene inferences?
> dim(myRCTD_reps@population_de_results[["10"]])
[1] 10 9
> dim(myRCTD_reps@population_sig_gene_df[["10"]])
[1] 10 17
In addition, I am not familiar with biostatistics, and still don't understand how to draw the z statistic value grouped by slices, even though we can get all mean and standard variance. I also read the one_ct_genes
and get_de_pop
functions. If I want to define the first two samples as a control group and other samples as a case group and estimate Z-statistic value, using the group_ids <- c(1,1,2,2)
and use.groups =T
can make it, is it? If yes, how can I pass the use.groups =T
to run.CSIDE.replicates
function?
head(myRCTD_reps@population_sig_gene_df[["10"]])
tau log_fc_est sd_est Z_est p_cross ct_prop expr q_val p
Plp1 0 -8.043942 0.32468262 -24.77479 1 1.0000000 2.029427e-05 1.676547e-135 1.676547e-135
Itm2b 0 -5.608008 0.12565190 -44.63130 1 1.0000000 4.975799e-04 0.000000e+00 0.000000e+00
Rora 0 -5.557170 0.14095045 -39.42640 1 1.0000000 2.325036e-03 0.000000e+00 0.000000e+00
Calm2 0 -4.994137 0.08651132 -57.72814 1 0.8599780 5.844512e-04 0.000000e+00 0.000000e+00
Lsamp 0 -4.743579 0.12002038 -39.52312 1 1.0000000 6.584614e-03 0.000000e+00 0.000000e+00
Calm1 0 -4.636973 0.08176442 -56.71138 1 0.8396981 1.895912e-04 0.000000e+00 0.000000e+00
mean 1 mean 2 mean 3 mean 4 sd 1 sd 2 sd 3 sd 4
Plp1 -8.043942 -8.043942 -8.043942 -8.043942 0.6493652 0.6493652 0.6493652 0.6493652
Itm2b -5.608008 -5.608008 -5.608008 -5.608008 0.2513038 0.2513038 0.2513038 0.2513038
Rora -5.557170 -5.557170 -5.557170 -5.557170 0.2819009 0.2819009 0.2819009 0.2819009
Calm2 -4.994137 -4.994137 -4.994137 -4.994137 0.1730226 0.1730226 0.1730226 0.1730226
Lsamp -4.743579 -4.743579 -4.743579 -4.743579 0.2400408 0.2400408 0.2400408 0.2400408
Calm1 -4.636973 -4.636973 -4.636973 -4.636973 0.1635288 0.1635288 0.1635288 0.1635288
There is no concern; it is possible that all genes were considered significant in this case, but that should be ignored. These are all the genes.
No, the use of group_ids is not applicable in your case. You should create two separate RCTD.replicates objects for each group of samples and then compare the mean estimates (log_fc variable in the table) using the Z test formula I provided.
On Thu, Sep 22, 2022 at 10:00 PM fengzhang @.***> wrote:
Thank you for your prompt reply. Now I can obtain the population inference, but @._de_results[["10"]] doesn't return all gene inference. It only has significant gene information as same as elements in @._sig_gene_df. Should it return all gene inferences?
@._de_results[["10"]]) [1] 10 9 @._sig_gene_df[["10"]]) [1] 10 17
In addition, I am not familiar with biostatistics, and still don't understand how to draw the z statistic value grouped by slices, even though we can get all mean and standard variance. I also read the one_ct_genes and get_de_pop functions. If I want to define the first two samples as a control group and other samples as a case group, using the group_ids <- c(1,1,2,2) and use.groups =T can make it, is it? If yes, how can I pass the use.groups =T to run.CSIDE.replicates function?
@.***_sig_gene_df[["10"]]) tau log_fc_est sd_est Z_est p_cross ct_prop expr q_val p Plp1 0 -8.043942 0.32468262 -24.77479 1 1.0000000 2.029427e-05 1.676547e-135 1.676547e-135 Itm2b 0 -5.608008 0.12565190 -44.63130 1 1.0000000 4.975799e-04 0.000000e+00 0.000000e+00 Rora 0 -5.557170 0.14095045 -39.42640 1 1.0000000 2.325036e-03 0.000000e+00 0.000000e+00 Calm2 0 -4.994137 0.08651132 -57.72814 1 0.8599780 5.844512e-04 0.000000e+00 0.000000e+00 Lsamp 0 -4.743579 0.12002038 -39.52312 1 1.0000000 6.584614e-03 0.000000e+00 0.000000e+00 Calm1 0 -4.636973 0.08176442 -56.71138 1 0.8396981 1.895912e-04 0.000000e+00 0.000000e+00 mean 1 mean 2 mean 3 mean 4 sd 1 sd 2 sd 3 sd 4 Plp1 -8.043942 -8.043942 -8.043942 -8.043942 0.6493652 0.6493652 0.6493652 0.6493652 Itm2b -5.608008 -5.608008 -5.608008 -5.608008 0.2513038 0.2513038 0.2513038 0.2513038 Rora -5.557170 -5.557170 -5.557170 -5.557170 0.2819009 0.2819009 0.2819009 0.2819009 Calm2 -4.994137 -4.994137 -4.994137 -4.994137 0.1730226 0.1730226 0.1730226 0.1730226 Lsamp -4.743579 -4.743579 -4.743579 -4.743579 0.2400408 0.2400408 0.2400408 0.2400408 Calm1 -4.636973 -4.636973 -4.636973 -4.636973 0.1635288 0.1635288 0.1635288 0.1635288
— Reply to this email directly, view it on GitHub https://github.com/dmcable/spacexr/issues/100#issuecomment-1255725110, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJUR66LMRVTW3ZUW473HLJLV7UFKRANCNFSM6AAAAAAQKGMCNY . You are receiving this because you were mentioned.Message ID: @.***>
Thank you. As your suggestion, I run the code on my visium data:
RCTD_controls <- create.RCTD.replicates(spatial_controls, reference, samples_controls, max_cores = 20)
RCTD_controls <- run.RCTD.replicates(RCTD_controls, doublet_mode = 'full')
cell_types <- as.character(unique(reference@cell_types))
CSIDE_controls <- run.CSIDE.replicates(RCTD_controls, de_mode = 'general', explanatory.variable.replicates = NULL, X.replicates = X_controls, cell_types=cell_types,
gene_threshold = 5e-05, cell_type_threshold = 5, fdr = 0.01)
When doublet_mode = "full"
to run RCTD analysis, the run.CSIDE.replicates
will throw error:
run.CSIDE.replicates: running CSIDE for replicate 1
Error in if (gene_threshold == 0.01 || fdr == 0.25 || cell_type_threshold == :
missing value where TRUE/FALSE needed
However, it is no problem using the default value of doublet_mode on example data from this .
Ok, I am happy to look into debugging the error. Can you please send me the script and input used to reproduce the error? Thanks.
Due to the large input file, I have no idea how to post, and thus change to use the example data from your vignette. But the error is different from last time:
run.CSIDE.replicates: running CSIDE for replicate 1
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'colSums': invalid character indexing
In addition: Warning message:
In run.CSIDE.general(myRCTD, X1, X2, barcodes, cell_types, cell_type_threshold = cell_type_threshold, :
run.CSIDE.general: some parameters are set to the CSIDE vignette values, which are intended for testing but not proper execution. For more accurate results, consider using the default parameters to this function.
The code used is here:
library(tidyverse)
library(spacexr)
library(ggplot2)
library(Matrix)
library(doParallel)
prefix <- "spacexr-"
output_prefix <- file.path(work_dir,prefix)
savedir <- 'RCTD_batch_results'
if(!dir.exists(savedir))
dir.create(savedir)
# load Reference object --------------------
refdir <- system.file("extdata",'Reference/Vignette',package = 'spacexr') # directory for the reference
counts <- read.csv(file.path(refdir,"dge.csv")) # load in counts matrix
rownames(counts) <- counts[,1]; counts[,1] <- NULL # Move first column to rownames
meta_data <- read.csv(file.path(refdir,"meta_data.csv")) # load in meta_data (barcodes, clusters, and nUMI)
cell_types <- meta_data$cluster; names(cell_types) <- meta_data$barcode # create cell_types named list
cell_types <- as.factor(cell_types) # convert to factor data type
nUMI <- meta_data$nUMI; names(nUMI) <- meta_data$barcode # create nUMI named list
reference <- Reference(counts, cell_types, nUMI)
# load SpatialRNA object and desing matrix --------------
# control samples
spatial_controls <- NULL
X_controls <- NULL
samples_controls <- c("Vignette","Vignette_rep2")
for(i in samples_controls){
# i <- samples_controls[2]
datadir <- system.file("extdata",'SpatialRNA',i,package = 'spacexr')
puck <- read.SpatialRNA(datadir, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv')
spatial_controls[[i]] <- puck
barcodes <- intersect(names(puck@nUMI), colnames(puck@counts))
X_design <- as.matrix(rep(1, length(barcodes)))
rownames(X_design) <- barcodes
X_controls[[i]] <- X_design
}
# treatment samples
spatial_treats <- NULL
X_treats <- NULL
samples_treats <- c("Vignette_rep3","Vignette_rep4")
for(i in samples_treats){
# i <- 1
datadir <- system.file("extdata",'SpatialRNA',i,package = 'spacexr')
puck <- read.SpatialRNA(datadir, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv')
spatial_treats[[i]] <- puck
barcodes <- intersect(names(puck@nUMI), colnames(puck@counts))
X_design <- as.matrix(rep(1, length(barcodes)))
rownames(X_design) <- barcodes
X_treats[[i]] <- X_design
}
# run RCTD and CSIDE for control and treatment group -----------------------------
RCTD_controls <- create.RCTD.replicates(spatial_controls, reference, samples_controls, max_cores = 6)
RCTD_controls <- run.RCTD.replicates(RCTD_controls,doublet_mode = "full")
cell_types <- c('10','18')
CSIDE_controls <- run.CSIDE.replicates(RCTD_controls, de_mode = 'general', explanatory.variable.replicates = NULL, X.replicates = X_controls, cell_types=cell_types,
gene_threshold = .01, cell_type_threshold = 5, fdr = 0.25,doublet_mode = F)
Hi @dmcable , I have pushed a test data and script on here, which can reproduce the error:
run.CSIDE.replicates: running CSIDE for replicate 1
Error in if (gene_threshold == 0.01 || fdr == 0.25 || cell_type_threshold == :
missing value where TRUE/FALSE needed
Thanks.
Thank you for discovering these errors. There were two separate bugs that are now fixed.
It still throws an error using the data on google drive:
CSIDE.population.inference: running population DE inference with use.groups=FALSE
Error in CSIDE.population.inference(RCTD.replicates, log_fc_thresh = log_fc_thresh) :
CSIDE.population.inference: minimum of three replicates required for population mode.
I remove a cell type to avoid the error caused by the minimal number of cells so that the code was modified a bit. The code on google drive is updtaed.
cell_types <- as.character(unique(reference@cell_types))[1:2]
CSIDE_controls <- run.CSIDE.replicates(RCTD_controls, de_mode = 'general', explanatory.variable.replicates = NULL, X.replicates = X_controls,
cell_types=cell_types,doublet_mode =F,
cell_type_threshold=5)
This is intended behavior of C-SIDE. Please refer to the error message above. The minimum is 3 replicates.
It is weird because we have 4 replicates for the control group.
length(RCTD_controls@RCTD.reps)
[1] 4
Interesting, I see. There was a bug in the code, but it is now fixed, thank you! I was able to run your code to completion in the population-DE mode.
@dmcable really appreciated. There aren't any errors now. As you know, I am newer in biostatistics. I wonder if could you confirm the process of calculating the p-value?
cell_type <- 'Neurons'
z <- (CSIDE_controls@population_de_results[[cell_type]]["log_fc_est"]-CSIDE_treats@population_de_results[[cell_type]]["log_fc_est"])/sqrt(CSIDE_controls@population_de_results[[cell_type]]["sd_est"]^2+CSIDE_treats@population_de_results[[cell_type]]["sd_est"]^2)
2*pnorm(-abs(z$log_fc_est))
Is the code get the right p-value?
Yes, that looks correct.
Dylan
On Sun, Sep 25, 2022 at 5:44 AM fengzhang @.***> wrote:
@dmcable https://github.com/dmcable really appreciated. There aren't any errors now. As you know, I am newer in biostatistics. I wonder if could you confirm the process of the p-value?
cell_type <- 'Neurons'
z <- @.**@*.**@*.**@._de_results[[cell_type]]["sd_est"]^2)
2*pnorm(-abs(z$log_fc_est))
Is the code get the right p-value?
— Reply to this email directly, view it on GitHub https://github.com/dmcable/spacexr/issues/100#issuecomment-1257157437, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJUR66IN63WHNSMJOEHTSETWAANI3ANCNFSM6AAAAAAQKGMCNY . You are receiving this because you were mentioned.Message ID: @.***>
Thank you a lot.
@dmcable really appreciated. There aren't any errors now. As you know, I am newer in biostatistics. I wonder if could you confirm the process of calculating the p-value?
cell_type <- 'Neurons' z <- (CSIDE_controls@population_de_results[[cell_type]]["log_fc_est"]-CSIDE_treats@population_de_results[[cell_type]]["log_fc_est"])/sqrt(CSIDE_controls@population_de_results[[cell_type]]["sd_est"]^2+CSIDE_treats@population_de_results[[cell_type]]["sd_est"]^2) 2*pnorm(-abs(z$log_fc_est))
Is the code get the right p-value?
Hi @dmcable @Feng-Zhang , Here, since the explanatory.variable.replicates = NULL in run.CSIDE.replicates,
Thanks in advance! Guanglin
log_fc_est
represents the parameter of interest that is estimated by the model. In this case, it would be the intercept term, representing average expression of the cell type. CT.PROP
parameter to CSIDE.population.inference
with default value 0.5. This parameter is the minimum ratio of gene expression within cell type compared to other cell types. To get more genes per cell type, it could be lowered, along with gene_threshold
. Hi Dylan and Feng @dmcable @Feng-Zhang ,
Following up on the last question, as you explained, "log_fc_est" is the intercept term, representing the average expression of the cell type. However, in my below dataset, they are all negative. Did you see this in your datasets? So how can I interpret the meaning?
Really appreciate your time and help! Guanglin
head(test@population_de_results$neuron)
tau log_fc_est sd_est Z_est p_cross ct_prop expr q_val p
Mrpl15 0 -10.050433 0.33313872 -30.16891 0.9073508 0.5178239 5.128992e-05 6.751968e-200 6.059863e-200
Tcea1 0 -9.413501 0.24295646 -38.74563 0.6440019 0.6286491 5.394797e-05 0.000000e+00 0.000000e+00
Atp6v1h 0 -8.202751 0.07948133 -103.20348 0.9702251 1.0000000 1.328114e-04 0.000000e+00 0.000000e+00
Rb1cc1 0 -8.802166 0.11677720 -75.37573 0.8318435 1.0000000 4.525399e-04 0.000000e+00 0.000000e+00
Pcmtd1 0 -9.206746 0.16736923 -55.00859 0.6969403 0.4847225 2.016156e-04 0.000000e+00 0.000000e+00
Vcpip1 0 -9.399355 0.18967900 -49.55401 0.3559265 1.0000000 1.153743e-04 0.000000e+00 0.000000e+00
Hey @glzhang90095 it is the average log expression, so it would be expected to be negative. If you take exp(log_fc_est) you will have units of 'counts per 1', multiply by 1,000,000 if you want counts per million etc.
Hey @glzhang90095 it is the average log expression, so it would be expected to be negative. If you take exp(log_fc_est) you will have units of 'counts per 1', multiply by 1,000,000 if you want counts per million etc.
How would you go about to get the counts for a library size (from nUMI) to compare the predicted spatial gene expression, decomposed over cell-types, to the observed gene expression in your @spatialRNA
object? I am curious to visualize how the count distributions looks like, using the cell type-specific expression estimates, but I can't seem to figure it out.
Hey @glzhang90095 it is the average log expression, so it would be expected to be negative. If you take exp(log_fc_est) you will have units of 'counts per 1', multiply by 1,000,000 if you want counts per million etc.
How would you go about to get the counts for a library size (from nUMI) to compare the predicted spatial gene expression, decomposed over cell-types, to the observed gene expression in your @spatialRNA
object? I am curious to visualize how the count distributions looks like, using the cell type-specific expression estimates, but I can't seem to figure it out.
That's a great question! What you are looking for are the functions predict_CSIDE_all
(summed across all cell types) and predict_CSIDE
(a single cell type) in CSIDE_plots.R to obtain the predicted counts for each pixel in the CSIDE model. Note that these functions aren't documented/exported, so you should load them into memory. Hope this helps. We used these functions in our paper to make plots comparing CSIDE predictions to observed counts. I think it's a useful analysis.
Within the make_all_de_plots
function that makes plots for CSIDE, the function make_de_plots_quant
should generate plots of this nature and place them in the 'de_plots_quant' folder.
Best, Dylan
Hi Dylan,
Thank you for the explanation, this was exactly what I was looking for and will be trying out!
Best, Koen
I am wondering whether we can run cell-type specific differential expression analysis among samples, rather than in one slice? Let's say we have 8 slices, of which 4 for control, and 4 for case. How to run differential expression analysis for same cell types between 4 control and 4 case? We finally can obtain differential expression genes in same cell types, not the differential expression genes in different cell types?