Closed (f6v closed this issue 1 year ago)
Hi,
thanks for giving glmGamPoi a try.
Does the variable order inside vars make any difference? Do I specify the group_by param correctly?
No, it doesn't matter. It only affects the order of the columns in the resulting colData(aggr_sce).
Is pseudobulk_sce supposed to take a long time? It takes > 2 hrs to create the pseudobulk profiles.
That seems surprisingly long. I wonder if you are somehow creating many more pseudobulk samples than intended. You can check how many pseudobulk samples are created by running:
colData(sce) %>%
  as_tibble() %>%
  group_by(sample_name, dataset, celltype, age_group) %>%
  summarize(n_cells = n())
Each row of the resulting tibble is a unique combination of the four covariates, and n_cells tells you how many cells are combined for that specific pseudobulk sample.
Thanks @const-ae!
I looked deeper into it, and it turned out the counts were stored as dgTMatrix rather than dgCMatrix, which made everything too slow.
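For anyone hitting the same slowdown, here is a minimal sketch of the conversion, assuming the counts are a sparse matrix from the Matrix package. The toy matrix and the names counts_t and counts_c are made up for illustration; they are not from the dataset in this issue.

```r
library(Matrix)

# Toy count matrix (values and dimensions are invented for illustration).
counts_c0 <- sparseMatrix(
  i = c(1, 3, 2, 3),
  j = c(1, 1, 2, 3),
  x = c(5, 2, 7, 1),
  dims = c(3, 3)
)

# Force triplet storage to mimic the slow case (dgTMatrix).
counts_t <- as(counts_c0, "TsparseMatrix")
class(counts_t)

# Convert to compressed column-oriented storage (dgCMatrix).
counts_c <- as(counts_t, "CsparseMatrix")
class(counts_c)

# The values are unchanged; only the storage layout differs.
stopifnot(all(as.matrix(counts_t) == as.matrix(counts_c)))
```

On a SingleCellExperiment the same idea would presumably be `counts(sce) <- as(counts(sce), "CsparseMatrix")` before calling pseudobulk_sce, assuming the counts assay is the one being aggregated.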
Ah, great that you found a way to solve the problem :)
Thanks for developing the package!
I've got a couple of questions regarding the pseudobulk_sce function. My dataset has 14904 genes and 453154 cells. I want to fit a model with the following design:
So, I create pseudobulk profiles like so:
To clarify, counts for each sample_name and cell_type_L1 combination should be aggregated, but I also want to keep age_group and dataset as covariates.
My questions are:
Does the variable order inside vars make any difference? Do I specify the group_by param correctly?
Is pseudobulk_sce supposed to take a long time? It takes > 2 hrs to create the pseudobulk profiles.
Thanks!