const-ae / glmGamPoi

Fit Gamma-Poisson Generalized Linear Models Reliably
103 stars 14 forks source link

proper input for DE #10

Closed pchiang5 closed 3 years ago

pchiang5 commented 3 years ago

Hello,

I have a single-cell dataset with the structure below. If I would like to see the marker genes of the cluster2, how shall I get reasonable inputs into the following steps?

treatment | cell_type

sample 1 | cluster 2 sample 1 | cluster 1 sample 1 | cluster 1 sample 1 | cluster 3 sample 1 | cluster 2 sample 1 | cluster 3 sample 2 | cluster 3 sample 2 | cluster 3 sample 2 | cluster 2 sample 2 | cluster 1 sample 2 | cluster 2 sample 2 | cluster 1

  1. fit <- glm_gp(pbmcs, design = XXX) #shall I include the treatment, cell_type, or treatment+cell_type into the 'design' paramenter?

  2. res <- test_de(fit, contrast = cluster2- !cluster2, pseudobulk_by = treatment ) # If I have incorporated the 'treatment' in the glm_gp() above, do I still have to add the treatment into the 'pseudobulk_by' parameter?

Thank you.

const-ae commented 3 years ago

Hi pchiang5,

sorry for my delayed response.

If I would like to see the marker genes of the cluster2, how shall I get reasonable inputs into the following steps?

It depends what kind of markers we are taking about here.

  1. Do you want to find genes that are different between sample1 and sample2 for cluster 2?
  2. Or do you want to genes that are high in cluster 2 but not in any of the clusters?

In case 1), I would usually recommend to do pseudobulk, but you don't have any replicates for the treatment and control condition, so it won't work. You could run:

fit <- glm_gp(pbmcs[, pbmcs$cell_type == "cluster 2"], design = ~ treatment)
test_de(fit, contrast = `treatmentsample 1`)

However, the p-values will be unreliable because the cells are not independent replicates and the analysis is confounded by the sample effect.

In case 2), I would include the treatment information in the initial fit, to correct the batch effect due to the sample:

pbmcs$is_cluster_2 <- pbmcs$cell_type == "cluster 2"
fit <- glm_gp(pbmcs, design = ~ treatment + is_cluster_2)
test_de(fit, contrast = is_cluster_2TRUE)

Your idea to set contrasts = cluster2- !cluster2 is nice, but unfortunately does something different! To be honest, I am surprised that the syntax is valid at all. Your contrast is equivalent to contrasts = cluster2- (1 - cluster2), which would be testing if cluster2 is signficantly different from the sum of all other coefficient, which is not what you mean.

Best, Constantin

pchiang5 commented 3 years ago

Hi Constantin,

Thank you for the detailed explanation. Hope you do not mind with a more complex problem.

treatment | cell_type

A+B | cluster 1 A | cluster 1 B | cluster 1 0(None) | cluster 1 A+B | cluster 2 A | cluster 2 B | cluster 2 0 | cluster 2

In the scenario above, am I right to use the following way to identify the effect of treatment A on cluster 2?

fit <- glm_gp(pbmcs[, pbmcs$cell_type == "cluster 2"], design = ~ treatment) #Do I have to include the A+B and B groups here (probably this will reveal the A effect in the background of B)? 
test_de(fit, contrast = "treatmentA")

Further, if I would like to see the synergistic effects of A+B on cluster 2, instead of A or B alone, the following input seems to regard A+B as a single unique group, instead of the combination of A and B. Does the tool inherently compare A+B with A alone and B alone to reveal the interaction? Or shall I split the treatment as treatment1 (A and 0) and treatment2 (B and 0)?

fit <- glm_gp(pbmcs[, pbmcs$cell_type == "cluster 2"], design = ~ treatment) 
test_de(fit, contrast = "treatmentA+B")

Thanks again!

Po

const-ae commented 3 years ago

shall I split the treatment as treatment1 (A and 0) and treatment2 (B and 0)?

Yes, I think you should code the treatment as two separate columns:

treatmentA treatmentB cell_type
1 1 cluster_1
1 0 cluster_1
0 1 cluster_1
0 0 cluster_1
1 1 cluster_2
1 0 cluster_2
0 1 cluster_2
0 0 cluster_2

You can then fit the model including an interaction term:

fit <- glm_gp(pbmcs[, pbmcs$cell_type == "cluster 2"], 
                design = ~ treatmentA + treatmentB + treatmentA:treatmentB) 
# The column names tell you the identifiers that you can as contrasts
colnames(fit$Beta)
test_de(fit, contrast = `treatmentA:treatmentB`)
pchiang5 commented 3 years ago

Thank you, Constantin.