Closed dmsoanes closed 1 month ago
Another peculiar thing:
> str(QCmetrics)
'data.frame': 96 obs. of 61 variables:
$ Plate : int 6 6 6 6 6 6 6 6 6 6 ...
$ Chip_ID : int 61 61 61 61 61 61 61 61 62 62 ...
$ Chip_Location : chr "A01" "B01" "C01" "D01" ...
$ Individual_ID : int 109131 109131 109131 109521 109521 109521 105512 105512 107911 107911 ...
$ family : int 10913 10913 10913 10952 10952 10952 10551 10551 10791 10791 ...
$ Sex : chr "M" "M" "M" "F" ...
And yet:
if(class(QCmetrics[,item]) != "factor"){
print(paste("Converting technical variable", item, "to factor."))
tmpVar<-as.factor(QCmetrics[,item])
}
Doesn't appear to be running either (as I can't find evidence of this message in any log/error file in this directory)
Regardless I believe the error is this:
The "Plate" column has length(levels())
of 1. so in the following logic:
if(item %in% colnames(QCmetrics)){
QCmetrics[which(QCmetrics[,item] == ""),item]<-NA
if(class(QCmetrics[,item]) != "factor"){
print(paste("Converting technical variable", item, "to factor."))
tmpVar<-as.factor(QCmetrics[,item])
} else {
tmpVar<-QCmetrics[,item]
}
numVar<-length(levels(tmpVar))
if(numVar == nrow(QCmetrics)){
uniqueVar<-c(uniqueVar, item)
} else {
if(sum(is.na(tmpVar)) == length(tmpVar)){
missingVar<-c(missingVar,item)
} else {
colVar<-rainbow(numVar)[tmpVar] ### convert to colours for plotting purposes
plotCols<-data.frame(plotCols, colVar, stringsAsFactors = FALSE)
legendParams[[item]]<-cbind(levels(tmpVar),rainbow(numVar))
}
}
} else {
print(paste("Column", item, "not found. Please check config file.")) ### change to hard stop
#quit(save = "no")
}
Neither missingVar
nor uniqueVar
are set, and so "Plate" is added to plotCols
and is kept in techVar
as a result. This is not the intended behaviour. If the length(levels())
of an item
is 1, then contrasts will fail as it has the condition:
! contrasts can be applied only to factors with 2 or more levels
Solution:
Add a check for the length(levels(tmpVar)) == 1
So if I removed Plate from the list of technical variables in the config file, would that also solve the problem?
Provided that there are no other columns that have length(levels())
equal to one yes.
EDIT: I've checked and this is not the case
> for (col in colnames(QCmetrics)) print(paste(col, length(levels(as.factor(QCmetrics[[col]])))))
[1] "Plate 1"
[1] "Chip_ID 12"
[1] "Chip_Location 96"
[1] "Individual_ID 34"
[1] "family 34"
[1] "Sex 2"
[1] "Age 96"
[1] "Sample_Collection_Date 95"
[1] "Sample_Concentration_Status 2"
[1] "GPSR.isolation.ID 96"
[1] "Sentrix_Position 8"
[1] "Sentrix_ID 12"
[1] "Basename 96"
[1] "M.median 94"
[1] "U.median 95"
[1] "intens.ratio 96"
[1] "intensPASS 1"
[1] "bisulfCon 96"
[1] "PC1_cp 96"
[1] "PC2_cp 96"
[1] "PC3_cp 96"
[1] "PC4_cp 96"
[1] "PC1_betas 96"
[1] "PC2_betas 96"
[1] "PC3_betas 96"
[1] "PC4_betas 96"
[1] "PC5_betas 96"
[1] "PC6_betas 96"
[1] "PC7_betas 96"
[1] "PC8_betas 96"
[1] "PC9_betas 96"
[1] "PC10_betas 96"
[1] "PC11_betas 96"
[1] "PC12_betas 96"
[1] "PC13_betas 96"
[1] "PC14_betas 96"
[1] "PC15_betas 96"
[1] "PC16_betas 96"
[1] "PC17_betas 96"
[1] "PC18_betas 96"
[1] "PC19_betas 96"
[1] "PC20_betas 96"
[1] "PC21_betas 96"
[1] "PC22_betas 96"
[1] "PC23_betas 96"
[1] "PC24_betas 96"
[1] "pFilter 1"
[1] "DNAmAge 96"
[1] "horvath.n_missing 1"
[1] "CCDNAmAge 96"
[1] "rmsd 96"
[1] "sdd 96"
[1] "sadd 96"
[1] "srms 96"
[1] "nNAs 22"
[1] "nNAsPer 22"
[1] "x.cp 96"
[1] "y.cp 96"
[1] "predSex.x 2"
[1] "predSex.y 2"
[1] "predSex 2"
Contact Details
d.m.soanes@exeter.ac.uk
What happened?
Microarray data was not cell sorted The first part of the QC pipeline seemed to work fine: 1: Column names were correct 2: raw.gds and rawNorm.gds files were produced 3: calcQCMetrics.r worked (files attached) - although there was a warning: Sentrix position model failed, skipping not sure if that is important 4: The markdown script QC.rmd failed:
How can the bug be reproduced?
Uploaded config, output, log and sample sheet files - other files can be found at /lustre/projects/Research_Project-MRC190311/DNAm/Darren/OHSU_006PLT_JN config.r.txt config.txt PassQCStatusAllSamples.csv QCDNAdata_2558375.err.txt QCDNAdata_2558375.log QCMetricsPostSampleCheck.csv sampleSheet.csv
Relevant log output