Closed pakiessling closed 6 months ago
Hey, thanks for reaching out.
The weird thing is that one cell type is consistently removed from fit$Beta
I assume you mean that when you look at the columns names of Beta, you get something like "Intercept", "patient2", "patient3", "celltype2", "celltype3", "celltype4", "celltype5", ..., correct?
This behavior is common for any package that works with a design matrix. We can quickly check this behavior:
df <- data.frame(patient = sample(paste0("pat_", 1:3), size = 30, replace = TRUE),
celltype = sample(paste0("ct", 1:5), size = 30, replace = TRUE))
design_matrix <- model.matrix(~ patient + celltype, data = df)
colnames(design_matrix)
#> [1] "(Intercept)" "patientpat_2" "patientpat_3" "celltypect2" "celltypect3"
#> [6] "celltypect4" "celltypect5"
Created on 2024-04-15 with reprex v2.1.0
There is a nice in-depth article on design matrices for gene expression studies on F1000.
The trick is that your "celltype1" becomes the default level and all other coefficients describe how much the fit deviates from the condition where patient = "pat_1" and celltype = "ct_1". As this can take some time to get used to, I build a small package a few years ago to illustrate this: https://github.com/const-ae/CoefExplainer.
I hope this helps.
Best, Constantin
Ah that makes sense. Thanks for the in-depth explanation. I was trying to do something like this:
for (subtype in rowData(sce)$cell_subtype) {
print(paste("Processing cell subtype:", subtype))
de_res <- test_de(fit, contrast = subtype)
file_name <- paste0("de_bulk_", subtype, ".csv")
write.table(de_res, file_name, sep = ",")
}
to do a one vs rest comparison for every cell type and it always crashed when looping over a certain cell type. Should probably be something like:
contrast <- makeContrasts(cell_subtypeA - (cell_subtypeB + cell_subtypeC + cell_subtypeD))
de_res <- test_de(fit, contrast = contrast )
instead, if I understand your link right . And do that for every comparison?
But I still don't get how to do that for cell_subtypeE
that has been chosen as default 😅
Ty!
The second approach is already very close. Take a look at the cond
function, it automatically creates the contrast vector for each condition. So you would need to write, something like
de_res <- test_de(fit, contrast = cond(cell_type = "A") - (cond(cell_type = "B") + cond(cell_type = "C) + cond(cell_type = "D") + cond(cell_type = "E")) / 5)
de_res <- test_de(fit, contrast = cond(cell_type = "B") - (cond(cell_type = "A") + cond(cell_type = "C) + cond(cell_type = "D") + cond(cell_type = "E")) / 5)
...
de_res <- test_de(fit, contrast = cond(cell_type = "E") - (cond(cell_type = "A") + cond(cell_type = "B") + cond(cell_type = "C) + cond(cell_type = "D")) / 5)
Thank you so much for educating me! Highly appreciate it!
Hi,
thanks for the tool, it looks super powerful!
I am trying it out on my scRNA dataset, with the goal of finding marker genes for cell types.
The weird thing is that one cell type is consistently removed from fit$Beta, even though the pseudobulks of this cell type have a good amount of cells and replicates.
Any idea what could cause this?
Thanks a lot!