MahShaaban / pcr

Quality assessing, analyzing and testing the statistical significance of real-time quantitative PCR data
https://CRAN.R-project.org/package=pcr
GNU General Public License v3.0
27 stars 8 forks source link

How to calculate the R2 for targets that have different number of replicates? #26

Closed CamillaMaciel closed 3 years ago

CamillaMaciel commented 3 years ago

Hi Mahmoud, Thank you for this very useful package. Now I am trying to use for a bigger dataset where I have several Assays (genes) but the standard curves have different number of replicates, because I had to apply filters and some of them got removed. This way the 'amount' does not represent all Assays (genes) and therefore the R2 cannot be calculated. I tried to create an amount list for each one of the genes with the precise amount of replicates for each. I am not sure how to run a loop within the function 'pcr_assess' and get the results for all the Assays.

## Here an example file to mimic the issue:
df <- data.frame(Sample.Name = c(1E-01,1E-01,1E-01,1E-02,1E-02,1E-02,1E-03,1E-03,1E-03,1E-04,1E-04,1E-04,
  1E-05,1E-05,1E-05,1E-06,1E-06,1E-01,1E-01,1E-01,1E-02,1E-02,1E-02,1E-03,1E-03,1E-03,1E-04,1E-04,1E-04,1E-05,1E-05,
  1E-05,1E-06,1E-06,1E-01,1E-01,1E-01,1E-02,1E-02,1E-02,1E-03,1E-03,1E-03,1E-04,1E-04,1E-04,1E-05,1E-05,1E-05,1E-06,
  1E-06,1E-06), CT = c(13.43761,13.47637,13.43361,16.94466,16.95963,17.00617,20.36111,20.47487,20.50990,23.73218,
  23.80931,23.69554,27.26124,27.16503,27.28794,31.09467,31.74184,14.43796,14.41361,14.35675,17.92048,17.93056,
  17.94622,21.44288,21.41265,21.47235,24.84470,24.84674,24.84562,28.36251,27.98210,28.43939,31.89748,31.98399,
  13.84862,13.82174,13.80726,17.15890,17.13154,17.13531,20.58301,20.61107,20.55656,23.94169,23.91247,23.84745,
  27.34220,27.45825,27.21272,30.91330,30.64840,31.09571), Assay.Name = c("Assay01","Assay01","Assay01","Assay01",
"Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay03","Assay03", "Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03",
  "Assay03","Assay03","Assay03","Assay03"))

## Format df to enter pcr_assess function - CT
subtest <- df %>% 
  group_by(Assay.Name) %>% 
  dplyr::mutate(grouped_id = row_number()) %>% 
  spread(Assay.Name, CT) %>% 
  select(-grouped_id, -Sample.Name) %>% 
  arrange(Assay01)   

## Make a vector of DNA  amounts  
amount <- df[c(1,3)] %>% 
  group_by(Assay.Name) %>% 
  dplyr::mutate(grouped_id = row_number()) %>%  
  spread(Assay.Name, Sample.Name) %>% 
  select(-grouped_id) %>% 
  rename_all( ~ paste0("Amount_", .x))

## Retain curve information: calculate intercept, slope, r_squared and efficiency
  res <- pcr_assess(subtest[3],amount = Amount_Assay03,method = 'standard_curve')
  names(res)[1] <- "Assay"
  res$efficiency <- NA   
  res$efficiency <- (10^(-1/res$slope)-1)*100
MahShaaban commented 3 years ago

Hi @CamillaMaciel I don't think you need to do much beyond the looping, since the calculation are applied for each assay (gene) separately. How about splitting the original data df into smaller subsets based on Assay.Name and use map to call pcr_assess

# load libraries
library(pcr)
library(tidyverse)

# apply pcr_assess on each assay separately
df %>%
  group_split(Assay.Name) %>%
  setNames(unique(df$Assay.Name)) %>%
  map(~{
    pcr_assess(select(.x, CT),                  # use select to return a data.frame
               amount = pull(.x, Sample.Name))  # use pull to return a vector
  })
#> $Assay01
#>   gene intercept     slope r_squared
#> 1   CT  9.864613 -3.520934 0.9983931
#> 
#> $Assay02
#>   gene intercept     slope r_squared
#> 1   CT  10.94083 -3.483191 0.9996576
#> 
#> $Assay03
#>   gene intercept    slope r_squared
#> 1   CT  10.35883 -3.40583 0.9996625

Created on 2021-07-22 by the reprex package (v2.0.0)

Please let me know if you have other questions

CamillaMaciel commented 3 years ago

Hi Mahmoud. Thank you so much for your reply. It works super fine :)

I also added these lines for calculating the efficiency:

# apply pcr_assess on each assay separately
res<- df %>%
  group_split(Assay.Name) %>%
  setNames(unique(std_bQC$Assay.Name)) %>%
  map(~{
    pcr_assess(select(.x, CT),                  # use select to return a data.frame
               amount = pull(.x, Sample.Name))  # use pull to return a vector
  })

# Calculate efficiency
df2<- lapply(df, function(x) 
  cbind(x, efficiency = (10^(-1/x$slope)-1)*100))
MahShaaban commented 3 years ago

I am glad it worked. There is a way to calculate the efficiency in the pcr package. You only need to add method == 'efficiency' along with the name of the reference_gene in pcr_assess.