Too many DEGs? - Githubissues

Hi,

I am a fresh user of glmGamPoi and thanks for providing this awesome tool! I tried to identify DEGs in scRNA and successfully run the following code:

load('/Volumes/kfsd/GSE200903_new_combined_cluster_FibroSamples_2022-01-14.RData')
combine_cluster <- as.SingleCellExperiment(combined_cluster)

set.seed(1)
colData(combine_cluster)
# Take highly expressed genes and proper cells:
combine_cluster_subset <- combine_cluster[rowSums(counts(combine_cluster)) > 100, 
                                          which( ! is.na(combine_cluster$CellType) & combine_cluster$CellType 
                                                 %in% c("fibroblasts"))]

# add a stim2 based on stim
colData(combine_cluster_subset)$stim2 <- colsplit(colData(combine_cluster_subset)$stim, 
                                                  split = "_", names = c('cell', 'trt','time'))$trt
# Convert counts to dense matrix
counts(combine_cluster_subset) <- as.matrix(counts(combine_cluster_subset))
# Remove empty levels because glm_gp() will complain otherwise
colData(combine_cluster_subset)$stim2 <- as.factor(colData(combine_cluster_subset)$stim2)
colData(combine_cluster_subset)$stim2 <- relevel(colData(combine_cluster_subset)$stim2, ref = "normal")

fit <- glm_gp(combine_cluster_subset, design = ~ stim2 ,
              reference_level = "normal")
summary(fit)
colnames(fit$Beta)

# The contrast argument specifies what we want to compare
# We test the expression difference of stimulated and control T-cells
#
# There is no sample label in the colData, so we create it on the fly
# from `stim` and `ind` columns in colData(fit$data).
de_res <- test_de(fit, contrast = `stim2PDAC`, 
                  pseudobulk_by = paste0(stim2, "-", rownames(colData(combine_cluster_subset)))) 

# The large `lfc` values come from groups were nearly all counts are 0
# Setting them to Inf makes the plots look nicer
de_res$lfc <- ifelse(abs(de_res$lfc) > 20, sign(de_res$lfc) * Inf, de_res$lfc)

# Most different genes
de_res<-de_res[order(de_res$pval), ]

However, I identified 11752 DEGs out of 14537 genes and I found many genes has df1=1.

head(de_res)
             name pval adj_pval f_statistic df1      df2        lfc
35           Rpl7    0        0    3244.666   1 20005.31 -0.5195890
44           Pi15    0        0    2180.603   1 20005.31 -1.6835958
124         Il1r1    0        0    3883.918   1 20005.31  2.2249821
132          Fhl2    0        0    5068.944   1 20005.31  4.8176423
158           Gls    0        0    1635.089   1 20005.31  0.9511179
164 1700019D03Rik    0        0    3380.809   1 20005.31 -2.7193859

I wondered how could I address this problem? Thanks in advance!

Best, Kun

const-ae / glmGamPoi

Too many DEGs? #35