MahShaaban / pcr

Quality assessing, analyzing and testing the statistical significance of real-time quantitative PCR data
https://CRAN.R-project.org/package=pcr
GNU General Public License v3.0
27 stars 8 forks source link

Rsquared not calculated #23

Closed katefgit closed 1 year ago

katefgit commented 3 years ago

Hi @MahShaaban I started using this package and I am managed to use pcr_assess for "standard curve", but for all my assays the rsquared is NA. could you maybe help me with that?

MahShaaban commented 3 years ago

Hi @katefgit Thanks for reaching out.

The help page of pcr_assess contains examples, that might be helpful. Here is one of them.

# load library
library(pcr)

# locate and read file
fl <- system.file('extdata', 'ct3.csv', package = 'pcr')
ct3 <- read.csv(fl)

# make amount/dilution variable
amount <- rep(c(1, .5, .2, .1, .05, .02, .01), each = 3)

# calculate the standard curve
pcr_assess(ct3,
           amount = amount,
           method = 'standard_curve')
#>    gene intercept     slope r_squared
#> 1 c_myc  25.69669 -3.388095 0.9965504
#> 2 GAPDH  22.68221 -3.414551 0.9990278

Created on 2021-02-10 by the reprex package (v1.0.0)

Would you please try to provide a similar code that shows the problem? If you can't show the actual data, you can write a simple example to reproduce the issue.

Also, make sure you are working with the latest version of the pcr package.

katefgit commented 3 years ago

I am following up on that... I figured out that the problem is that I have assays/genes with varying number of points in the st.curve (e.g. 1 gene/assay has 15 points, another one 18, but I use as "amount" the max. e.g. amount<-rep(c(0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001), each = 3)

image

If I calculate it separately for each gene I get around it. Is there any "easy" way to fix it, when I have a df with many genes?

MahShaaban commented 3 years ago

I don't think there is an easy way to do that. The function treats every column individually and assumes that the amount variable is valid for all, which is not the case in your example

Neurarian commented 1 year ago

I ran into the same issue today, as my highest dilution for one gene was below the detection limit and therefore wasn't detected, resulting in NAs at that dilution. This isn't handled by the pcr_rsquared() function or, rather, the base cor() function. This minor modification to the pcr_rsquared() helper function could fix the issue by skipping the missing point(s) of the curve, not affecting curve calculation for the other genes/columns without NAs:

pcr_rsquared <- function(vec, var) {
  if(anyNA(vec)){
    warning(paste0(sum(is.na(vec)), sep = " ",
                   "NAs detected. Ensure samples are still within the dynamic range"))
  }  
  res <- cor(vec, log10(var), use = "complete.obs")^2
  return(res)
}

Using this as an example:

var = c(.1, .01, .001, .0001, .00001)
vec = c(25, 28.4, 31.6, 35, NA)

The modified function skips any dilution which is paired with NA by utilizing the use argument of the cor() function. I'd personally like to be informed about any NAs in this data. Therefore, it also warns the user that it did.

MahShaaban commented 1 year ago

Hi, @Neurarian. Your solution looks great. Would you like to open a pull request and modify pcr_rsquared as you suggest? I would be happy to review and merge the code. Otherwise, I can do it myself.

Neurarian commented 1 year ago

Hi, @MahShaaban. Thank you! I would be happy to make a contribution. I've just opened one.