Closed dmsoanes closed 1 month ago
Initial notes:
code section:
passQC<-QCSum$Basename[as.logical(QCSum[,"passQCS2"])]
snpCor<-snpCor[passQC, passQC]
if(max(snpCor, na.rm = TRUE) > 0.8){
# pull out samples that are genetically identical
predictedduplicates<-vector(length = ncol(snpCor))
for(i in 1:ncol(snpCor)){
predictedduplicates[i]<-paste(sort(names(which(snpCor[,i] > 0.8))), sep = "|", collapse = "|")
}
} else {
predictedduplicates<-colnames(snpCor)
}
QCmetrics$predictedduplicates<-predictedduplicates
The max(snpCor)
for Darren's data gives 1, so the if statement is running.
The number of rows in predictedduplicates
is not coming out as the same number of rows as in QCmetrics
. I have checked and snpCor
has the same number of column as QCmetrics
initially.
Problem lies here:
snpCor<-snpCor[passQC, passQC]
This bit here can make it so that snpCor
has less columns. Causing a there to be too few columns in snpCor
(now less than there are rows in QCmetrics
).
This is covered for in the if statement with
predictedduplicates<-vector(length = ncol(snpCor))
But unfortunately, this line comes too late. The number of columns in snpCor
has already changed by this point.
I do not know for sure what the intention is behind this block. My guess is this:
You want to filter
snpCor
to only those that pass the QC. Then you want to use these to get predicted duplicates (those with a very high correlation). For all other rows, you would just want an empty string as there are no predicted duplicates.
If this is the case. Do we want there to be NA
s (or something similar) in predictedduplicates
for the rows that did not pass QC? Or do we want to subset QCmetrics on passQC
as well to get these variables to have the same length?
Maybe something else?
Also this if statement:
if(max(snpCor, na.rm = TRUE) > 0.8){
...
}
will always run as snpCor
's diagonal is 1s. This conditional logic can be removed no?
Contact Details
d.m.soanes@exeter.ac.uk
What happened?
Microarray data was not cell sorted The first part of the QC pipeline seemed to work fine: 1: Column names were correct 2: raw.gds and rawNorm.gds files were produced 3: calcQCMetrics.r worked (files attached) - although there was a warning: Sentrix position model failed, skipping not sure if that is important 4: The markdown script QC.rmd failed - SexMismatches.csv and PlotsWithinDNAmMismatches.pdf files produced but contained no data.
How can the bug be reproduced?
Uploaded config, output, log and sample sheet files - other files can be found at /lustre/projects/Research_Project-MRC190311/DNAm/Darren/OHSU_006PLT_JN config.r.txt config.txt PassQCStatusAllSamples.csv QCDNAdata_2559367.err.txt QCDNAdata_2559367.log QCMetricsPostSampleCheck.csv
Relevant log output