LTLA / scuttle

Clone of the Bioconductor repository for the scuttle package.
https://bioconductor.org/packages/devel/bioc/html/scuttle.html
9 stars 7 forks source link

How to recover gene expression used for scran::pairwiseTTests() #16

Open ewijaya opened 2 years ago

ewijaya commented 2 years ago

Hi Aaron,

Thank you for making this wonderful tool.

This is not an issue but a question for scran::pairwiseTTests(). And I also know this is a scuttle repository, but I can't find one match for scran. So I posted here.

I'm wondering if there's a way we can recover the mean gene expression used for calculating the fold change of scran::pairwiseTTests.

I tried the following code used for calculating fold change using scran::pairwiseTTests:

library(scuttle)
library(scran)
library(tidyverse)
sce <- mockSCE()
sce <- logNormCounts(sce)

cell_groupings <- colData(sce)$Mutation_Status 
names(cell_groupings) <- rownames(sce)

out <- pairwiseTTests(scuttle::logNormCounts(sce), groups=cell_groupings)
out$statistics[[1]] %>% # negative / positive
  as.data.frame() %>%
  rownames_to_column(var = "genes") %>%
  as_tibble() %>%
  arrange(desc(logFC)) 

It produces

genes     logFC  p.value   FDR
   <chr>     <dbl>    <dbl> <dbl>
 1 Gene_1740 1.20  0.000617 0.733
 2 Gene_0691 0.968 0.00435  0.733
 3 Gene_1755 0.932 0.00690  0.733
 4 Gene_1460 0.916 0.0107   0.733
 5 Gene_0894 0.890 0.0129   0.733
 6 Gene_1910 0.847 0.0208   0.761
 7 Gene_1839 0.844 0.0168   0.733
 8 Gene_1147 0.839 0.00304  0.733
 9 Gene_0477 0.832 0.0134   0.733
10 Gene_1201 0.816 0.00643  0.733

Notice that the logFC for Gene_1740 is 1.20 and Gene_0691 is 0.968.

But when I compute manually to get mean expression:

meta_data_df <- colData(sce) %>% 
  as.matrix() %>% 
  as.data.frame()   %>% 
  rownames_to_column(var = "barcodes") %>% 
  as_tibble()

scuttle::normalizeCounts(sce) %>% 
as.matrix() %>% 
as.data.frame() %>% 
rownames_to_column(var = "genes") %>% 
as_tibble() %>% 
pivot_longer(-genes, names_to = "barcodes", values_to = "gexp") %>% 
  left_join(meta_data_df, by = "barcodes") %>% 
  filter(genes %in% c("Gene_1740", "Gene_0691")) %>%
  group_by(genes, Mutation_Status) %>% 
  summarise(mean_gexp = mean(gexp)) %>% 
  pivot_wider(names_from = "Mutation_Status", values_from = "mean_gexp") %>% 
  mutate(logFC = log2(negative/positive)) %>% 
  arrange(desc(logFC))

I get

genes     negative positive logFC
  <chr>        <dbl>    <dbl> <dbl>
1 Gene_1740     4.74     3.54 0.421
2 Gene_0691     4.28     3.32 0.370

Notice the difference in logFC. Is there a way I can easily recover the actual expression so that the logFC match calculated with scran::pairwiseTTests?

Thanks and hope to hear from you again.

Edward

LTLA commented 2 years ago

I don't follow any of the pipes, but I will say that normalizeCounts will produce, by default, log2-normalized values. So your log-fold change calculation should not be log2(negative/positive), but instead, negative - positive, assuming that both negative and positive represent the mean log-expression value in their respective groups.