How to calculate the R2 for targets that have different number of replicates?

CamillaMaciel commented 3 years ago

Hi Mahmoud, Thank you for this very useful package. Now I am trying to use for a bigger dataset where I have several Assays (genes) but the standard curves have different number of replicates, because I had to apply filters and some of them got removed. This way the 'amount' does not represent all Assays (genes) and therefore the R2 cannot be calculated. I tried to create an amount list for each one of the genes with the precise amount of replicates for each. I am not sure how to run a loop within the function 'pcr_assess' and get the results for all the Assays.

## Here an example file to mimic the issue:
df <- data.frame(Sample.Name = c(1E-01,1E-01,1E-01,1E-02,1E-02,1E-02,1E-03,1E-03,1E-03,1E-04,1E-04,1E-04,
  1E-05,1E-05,1E-05,1E-06,1E-06,1E-01,1E-01,1E-01,1E-02,1E-02,1E-02,1E-03,1E-03,1E-03,1E-04,1E-04,1E-04,1E-05,1E-05,
  1E-05,1E-06,1E-06,1E-01,1E-01,1E-01,1E-02,1E-02,1E-02,1E-03,1E-03,1E-03,1E-04,1E-04,1E-04,1E-05,1E-05,1E-05,1E-06,
  1E-06,1E-06), CT = c(13.43761,13.47637,13.43361,16.94466,16.95963,17.00617,20.36111,20.47487,20.50990,23.73218,
  23.80931,23.69554,27.26124,27.16503,27.28794,31.09467,31.74184,14.43796,14.41361,14.35675,17.92048,17.93056,
  17.94622,21.44288,21.41265,21.47235,24.84470,24.84674,24.84562,28.36251,27.98210,28.43939,31.89748,31.98399,
  13.84862,13.82174,13.80726,17.15890,17.13154,17.13531,20.58301,20.61107,20.55656,23.94169,23.91247,23.84745,
  27.34220,27.45825,27.21272,30.91330,30.64840,31.09571), Assay.Name = c("Assay01","Assay01","Assay01","Assay01",
"Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay01","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay02","Assay03","Assay03", "Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03","Assay03",
  "Assay03","Assay03","Assay03","Assay03"))

## Format df to enter pcr_assess function - CT
subtest <- df %>% 
  group_by(Assay.Name) %>% 
  dplyr::mutate(grouped_id = row_number()) %>% 
  spread(Assay.Name, CT) %>% 
  select(-grouped_id, -Sample.Name) %>% 
  arrange(Assay01)   

## Make a vector of DNA  amounts  
amount <- df[c(1,3)] %>% 
  group_by(Assay.Name) %>% 
  dplyr::mutate(grouped_id = row_number()) %>%  
  spread(Assay.Name, Sample.Name) %>% 
  select(-grouped_id) %>% 
  rename_all( ~ paste0("Amount_", .x))

## Retain curve information: calculate intercept, slope, r_squared and efficiency
  res <- pcr_assess(subtest[3],amount = Amount_Assay03,method = 'standard_curve')
  names(res)[1] <- "Assay"
  res$efficiency <- NA   
  res$efficiency <- (10^(-1/res$slope)-1)*100

MahShaaban commented 3 years ago

Hi @CamillaMaciel I don't think you need to do much beyond the looping, since the calculation are applied for each assay (gene) separately. How about splitting the original data df into smaller subsets based on Assay.Name and use map to call pcr_assess

# load libraries
library(pcr)
library(tidyverse)

# apply pcr_assess on each assay separately
df %>%
  group_split(Assay.Name) %>%
  setNames(unique(df$Assay.Name)) %>%
  map(~{
    pcr_assess(select(.x, CT),                  # use select to return a data.frame
               amount = pull(.x, Sample.Name))  # use pull to return a vector
  })
#> $Assay01
#>   gene intercept     slope r_squared
#> 1   CT  9.864613 -3.520934 0.9983931
#> 
#> $Assay02
#>   gene intercept     slope r_squared
#> 1   CT  10.94083 -3.483191 0.9996576
#> 
#> $Assay03
#>   gene intercept    slope r_squared
#> 1   CT  10.35883 -3.40583 0.9996625

^{Created on 2021-07-22 by the reprex package (v2.0.0)}

Please let me know if you have other questions

CamillaMaciel commented 3 years ago

Hi Mahmoud. Thank you so much for your reply. It works super fine :)

I also added these lines for calculating the efficiency:

# apply pcr_assess on each assay separately
res<- df %>%
  group_split(Assay.Name) %>%
  setNames(unique(std_bQC$Assay.Name)) %>%
  map(~{
    pcr_assess(select(.x, CT),                  # use select to return a data.frame
               amount = pull(.x, Sample.Name))  # use pull to return a vector
  })

# Calculate efficiency
df2<- lapply(df, function(x) 
  cbind(x, efficiency = (10^(-1/x$slope)-1)*100))

MahShaaban commented 3 years ago

I am glad it worked. There is a way to calculate the efficiency in the pcr package. You only need to add method == 'efficiency' along with the name of the reference_gene in pcr_assess.

MahShaaban / pcr

How to calculate the R2 for targets that have different number of replicates? #26