ScienceParkStudyGroup / studyGroup

Gather together a group to skill-share, co-work, and create community
https://www.scienceparkstudygroup.info

Make my qPCR data (.csv) 'tidy' ? #13

Closed JihedC closed 6 years ago

JihedC commented 6 years ago

Hello everyone,

I would like to write an R script to run my qPCR analysis. I've found some scripts on Bioconductor, but they are far too complex for what I want to do. I am already stuck at the very beginning :P

From the 7500 machine, I get a csv document

df_raw <- read.csv(file = ..., skip = 27, header = TRUE) # skip the first 27 rows to reach the data for this machine
df_raw$Ct <- as.numeric(as.character(df_raw$Ct)) # convert 'Ct' to numeric, whether it was read as a factor or as character
df_raw[, 7:15] <- list(NULL) # remove the unused columns
head(df_raw)

[Screenshot, 2017-11-22 at 16:40:21]

I have 4 different primer pairs here (24 wells each). The Detector variable identifies the primers: Detector == '1' is the first primer pair, Detector == '10' is the second primer pair, ...

I am looking for a way to transform the data frame so that the different primers become variables for the different samples (I don't know if I am being clear here ...).

For now I tried to filter into a new data frame using the detector value, for each primer.

EF1A <- df_raw %>%
  filter(Detector == '10')
glimpse(EF1A)

Then I combined the 4 data frames by column. As expected, I get a big data frame of 24 observations and 32 variables, with most of the columns being useless copies of one another. I think this is wrong, and it cannot be adapted automatically to different primers.

I would like to use the function spread() (from the tidyr package), but I don't really understand it.

If anyone has any suggestions to help me with this first step, I would be really happy.

Thanks,

Jihed

ajongbloets commented 6 years ago

Hi Jihed,

Thanks for posting this question to the studygroup!

So what I understand from your post is that you want to make a new data.frame with one variable per primer. Each observation will then have a variable with the Ct (?) value for primer pair 1, a variable for primer pair 2, etc.

Using spread to do this with a simplified example:

A <- data.frame( Name = c("Kees", "Leaf", "Kees", "Leaf"), Detector = c(1, 1, 10, 10), Ct = c(1,2,3,4))
spread(A, Detector, Ct)

This will generate a data.frame of 2 observations and 3 variables (Name, 1 and 10). The first argument is the data.frame you want to transform, the second argument is the variable whose values become the new column names (the key), and the third argument is the variable the values are taken from. Note that spread only works if each combination of the untouched columns and the key occurs at most once; it fails if several rows share the same combination.
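For readers on tidyr 1.0.0 or later, where spread() is superseded by pivot_wider(), the same reshaping can be sketched as follows. This is only an illustrative equivalent of the toy example above, not a change to the approach; the object name A_wide is made up for this sketch.

```r
library(tidyr)

# Same toy data as in the spread() example; names and values are illustrative
A <- data.frame(
  Name     = c("Kees", "Leaf", "Kees", "Leaf"),
  Detector = c(1, 1, 10, 10),
  Ct       = c(1, 2, 3, 4)
)

# names_from plays the role of spread()'s key argument,
# values_from the role of its value argument
A_wide <- pivot_wider(A, names_from = Detector, values_from = Ct)
print(A_wide)
```

The result has one row per Name and one column per Detector value (`1` and `10`), just like the spread() call.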

So, the following will not work:

A <- data.frame( Name = c("Kees", "Kees", "Leaf", "Leaf", "Kees", "Leaf"), Detector = c(1, 1, 1, 1, 10, 10), Ct = c(1, 2, 3, 4, 5, 6))
spread(A, Detector, Ct)

This is the case with the data you provided, so you cannot use spread directly: there is no way to know which "Ct" value of primer pair 1 belongs to which observation named "Kees" (there are multiple observations for "Kees" with primer pair 1).
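One possible workaround, sketched here under the assumption that replicates can simply be numbered in row order within each Name and Detector, is to add a replicate index before spreading. The column name Replicate is hypothetical and not part of the original data.

```r
library(dplyr)
library(tidyr)

# Toy data with duplicated Name/Detector combinations, as in the failing example
A <- data.frame(
  Name     = c("Kees", "Kees", "Leaf", "Leaf", "Kees", "Leaf"),
  Detector = c(1, 1, 1, 1, 10, 10),
  Ct       = c(1, 2, 3, 4, 5, 6)
)

A_wide <- A %>%
  group_by(Name, Detector) %>%
  mutate(Replicate = row_number()) %>%  # hypothetical index: 1, 2, ... within each group
  ungroup() %>%
  spread(Detector, Ct)                  # Name + Replicate now identify each row uniquely

print(A_wide)
```

Combinations that have no measurement (here, a second replicate for Detector 10) are filled with NA. Note that the index pairs replicates across primers purely by row order, which may not be meaningful for technical replicates; averaging them first is often the simpler route.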

Also, in most cases R prefers a data.frame in long format (as you already have it) rather than wide format (what you are trying to create). Maybe you can explain to us what you want to do after this step, so we can advise you on the appropriate steps to reach your goal?

I hope this helps you a bit.

Joeri

JihedC commented 6 years ago

Hi Joeri,

Thanks for such a quick reply! I figured out just after posting that spread() is not going to work that easily.

I have 12 cDNA samples on a 96-well plate for qPCR. For example, the sample 'Kees' has 2 technical replicates for each set of primers, so in this first qPCR the 'Kees' sample appears 8 times. It is the same sample each time, but run with different primers.

My goal here would be to :

I would like to do this because I will have many more samples and replicates of the samples (observations) to run, and also many more genes to test. It would be much easier if I had a script that automatically does the calculations and makes the plot.

There are several scripts already written in R for this, but with my basic level I don't really know whether they can be adapted to the type of data we get from our machine, nor how to make them work.

My idea was to convert the data into a tidy format so that the calculations can easily be done, but I might be wrong. Any input would be a great help, and if you want I can show you what I get when I spread the data.

Thanks for your help,

Jihed

mgalland commented 6 years ago

Hey Jihed, I hope I understand all your problems correctly. To me, you have multiple problems in one here. To go from qPCR results to a plot, you need to do the following:

  1. You need to average technical replicates (=same biological sample, same primer pair).
  2. Then, you need to compute the 2^(DeltaTargetGene - DeltaReferenceGene) for each sample and each target gene (the reference gene is common to all target genes).
  3. Then you need to compute the average and sd for each gene

I would first try to solve the technical replicate problem. This will give you ideas and hints for the rest. Make a dummy data frame with only one primer pair first. For instance:

Well  Sample   Detector  Task     Ct    Std
A1    Sample1  GeneA     Unknown  25.6  0.2
A2    Sample2  GeneA     Unknown  26.6  0.2
A3    Sample1  GeneB     Unknown  29.9  0.2
A4    Sample2  GeneB     Unknown  31.0  0.2

Average the technical replicates for GeneA and for GeneB (using dplyr).
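A minimal sketch of that averaging step, assuming a dummy data frame with two technical replicates per sample and gene. The Ct values and the names qpcr, averaged, mean_Ct, sd_Ct and n_reps are invented for illustration only.

```r
library(dplyr)

# Hypothetical dummy data: 2 technical replicates per Sample/Detector pair
qpcr <- data.frame(
  Well     = c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"),
  Sample   = rep(c("Sample1", "Sample1", "Sample2", "Sample2"), times = 2),
  Detector = rep(c("GeneA", "GeneB"), each = 4),
  Ct       = c(25.6, 25.8, 26.6, 26.4, 29.9, 30.1, 31.0, 31.2)
)

# One row per biological sample and primer pair,
# with the mean and spread of its technical replicates
averaged <- qpcr %>%
  group_by(Sample, Detector) %>%
  summarise(mean_Ct = mean(Ct), sd_Ct = sd(Ct), n_reps = n()) %>%
  ungroup()

print(averaged)
```

Because the data stay in long format, the same code works unchanged when more samples or primer pairs are added.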

If you can do that, then the rest should be easier. You'll need to add extra columns to identify your reference gene (add a column with "target" and "reference" in there).
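As a rough sketch of how such a "target"/"reference" column could be used: the example below assumes GeneA is the reference gene and uses the common convention delta Ct = Ct(target) - Ct(reference) with relative expression 2^(-delta Ct). Both are assumptions made for illustration and may not match the exact formula intended in this thread; the column names Role, ref_Ct, delta_Ct and rel_expr are hypothetical.

```r
library(dplyr)

# Hypothetical averaged Ct values: one row per sample and gene
averaged <- data.frame(
  Sample   = c("Sample1", "Sample2", "Sample1", "Sample2"),
  Detector = c("GeneA", "GeneA", "GeneB", "GeneB"),
  mean_Ct  = c(25.7, 26.5, 30.0, 31.1)
)

# Assumption: GeneA is the reference gene, everything else is a target
labelled <- averaged %>%
  mutate(Role = ifelse(Detector == "GeneA", "reference", "target"))

# Reference Ct per sample, to be matched against every target gene
reference <- labelled %>%
  filter(Role == "reference") %>%
  select(Sample, ref_Ct = mean_Ct)

relative <- labelled %>%
  filter(Role == "target") %>%
  inner_join(reference, by = "Sample") %>%
  mutate(
    delta_Ct = mean_Ct - ref_Ct,  # assumed convention: target minus reference
    rel_expr = 2^(-delta_Ct)      # assumes perfect doubling per cycle
  )

print(relative)
```

From here, the per-gene average and standard deviation could be obtained with another group_by(Detector) followed by summarise().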

Let me know how it goes,

Marc

PS: thanks for posting this issue here!