Method of normalisation for pcr_test?

davos-i commented 6 years ago

Hi, Thanks for the great package, I've found it very useful. I am new to RStudio and new to real-time PCR, so I have been trying to understand the maths behind your package. I am researching gene expression in animal tissue based on different dietary treatments. I have a standard curve for every run of my qPCR. Therefore, I have used the relative_curve method in the function pcr_analyze. From what I understand, if I use the "relative_curve" method it will normalize the data to a reference gene using "divide" (if separate tubes). However, if I then want to use pcr_test to run a linear model, it will use the same data but normalize using the "subtract" method as default. I don't know how to make changes as you mention in your contributing guidelines, but I have a working example to show you that the statistical results are different when I ran the pcr_test with a "divide" normalization. Can you please confirm with method is correct?

Other issues I've made work-arounds for in this example:

fixed errors bars for when there are 2 genes side by side
Because there are 3 treatment effects, the pcr_test does not do all comparisons, therefore it needs to be run twice with 2 "reference_genes" to get the third comparison. This example will do this and pull all unique comparisons into a new table.

Thanks

library(pcr)
library(dplyr)
library(ggplot2)
library(purrr)

#test data
df <- data.frame(control = c(16.14,16.01,15.99,15.99,16.18,16.42,16.25,15.66,15.86,16.14,16.16,16.18,16.39,16.37,16.42),
                 GOI_1 = c(24.5,24.5,23.1,23.9,23.8,23.1,23,23.1,23.1,22.8,22.7,23.9,24.1,24,23.4),
                 GOI_2 = c(15.86,22.44,19.81,21.97,19.54,16.29,21.56,21.97,13.71,18.58,20.36,17.89,22.64,18.22,21.43))

group_var1 <- c("Diet_1","Diet_1","Diet_1","Diet_1","Diet_1","Diet_2","Diet_2","Diet_2","Diet_2","Diet_2","Diet_3","Diet_3","Diet_3","Diet_3","Diet_3")

#intercept and slope:
intercept <- c(26.86229, 24.76040, 32.02007)

slope <- c(-3.661974, -3.357405, -3.250362)

#Analysis

plot <- pcr_analyze(df,
                    group_var = group_var1,
                    reference_gene = 'control',
                    reference_group = 'Diet_1',
                    intercept = intercept,
                    slope = slope,
                    method = 'relative_curve',
                    plot=T)

#to fix error bars:
plot$layers[[2]] <- NULL
plot <-
  plot + geom_errorbar(
    aes_string(ymin = "lower", ymax = "upper"),
    size = .3,
    width = .25,
    position = position_dodge(.9)
  )

plot

#To statistically test results using lm
#default method:
Diet_1_default <- pcr_test(
  df,
  group_var = group_var1,
  reference_gene = 'control',
  reference_group = 'Diet_1',
  test = 'lm'
) %>% data.frame(ComparisonGroup = "Diet_1")

Diet_3_default <- pcr_test(
  df,
  group_var = group_var1,
  reference_gene = 'control',
  reference_group = 'Diet_3',
  test = 'lm'
) %>% data.frame(ComparisonGroup = "Diet_3")

#combine results and display only unique comparisons (matched on p_value)
Combined <- rbind(Diet_1_default, Diet_3_default) %>% format(digits = 3)
Combined <- Combined[!duplicated(Combined$p_value), ]

stats_table_default <- Combined[order(Combined$gene, Combined$ComparisonGroup),] %>% as.data.frame(row.names = seq(1,length(Combined$gene)))

###Method modification - create new functions and re-run statistics using the normalisation = "divide"

pcr_lm_2 <-
  function (df,
            group_var,
            reference_gene,
            reference_group,
            model_matrix = NULL,
            ...)
  {
    norm <-
      pcr:::.pcr_normalize(df, reference_gene = reference_gene, mode = "divide") #added mode = "divide", as default for .pcr_normalize is mode = "substract"
    if (is.null(model_matrix)) {
      group_var <- relevel(factor(group_var), ref = reference_group)
    }
    tst <- map(norm, function(x) {
      if (is.null(model_matrix)) {
        linear_model <- lm(x ~ group_var, ...)
        conf_int <- confint(linear_model, ...)
      }
      else {
        linear_model <- lm(x ~ model_matrix + 0, ...)
        conf_int <- confint(linear_model, ...)
      }
      data_frame(
        term = names(linear_model$coefficients)[-1],
        estimate = unname(linear_model$coefficients)[-1],
        p_value = summary(linear_model)$coefficients[-1,
                                                     4],
        lower = conf_int[-1, 1],
        upper = conf_int[-1,
                         2]
      )
    })
    tst <- bind_rows(tst, .id = "gene")
    return(tst)
  }

#modify pcr_test
pcr_test_2 <- function (df, test = "t.test", ...)
{
  switch(
    test,
    t.test = pcr_ttest(df, ...),
    wilcox.test = pcr_wilcox(df, ...),
    lm = pcr_lm_2(df, ...)
  )
}

#run modified method:
Diet_1_modified <- pcr_test_2(
  df,
  group_var = group_var1,
  reference_gene = 'control',
  reference_group = 'Diet_1',
  test = 'lm'
) %>% data.frame(ComparisonGroup = "Diet_1")

Diet_3_modified <- pcr_test_2(
  df,
  group_var = group_var1,
  reference_gene = 'control',
  reference_group = 'Diet_3',
  test = 'lm'
) %>% data.frame(ComparisonGroup = "Diet_3")

#combine results and display only unique comparisons (matched on p_value)
Combined_2 <- rbind(Diet_1_modified, Diet_3_modified) %>% format(digits = 3)
Combined_2 <- Combined_2[!duplicated(Combined_2$p_value), ]

stats_table_modified <- Combined_2[order(Combined_2$gene, Combined_2$ComparisonGroup), ] %>% as.data.frame(row.names = seq(1, length(Combined_2$gene)))

print(stats_table_default)
print(stats_table_modified)

MahShaaban commented 6 years ago

Hi @davos-i

Thank you for opening this issue. I just made some changes to fix two of the issues you raised.

Now pcr_lm() takes an argument mode which you can set to "subtract" (default) or "divide". This will be passed to the internal normalization function.
I added position = 'dodge' to the geom_errorbar() function when the input data.frame has more than one gene. So the plot = TRUE should work fine now.
As for your question about the comparisons in pcr_test. I actually use the base R lm to run the model and this is its default behavior. It compares all the groups to the control. I think if you want the pairwise comparisons, then you need to run ttest individually for the comparisons you are interested in. You can, however, specify a contrast which will be passed to lm. Alternatively, you can construct a model.matrix and pass it to pcr_test and used to run the model.

davos-i commented 6 years ago

Hi @MahShaaban Thank you for your help, that will be great.

MahShaaban / pcr

Method of normalisation for pcr_test? #5