Closed Chris1221 closed 8 years ago
Reading in with sqldf does not work. It is slower by at least a factor of 10.
library(sqldf)
for (k in 1:5) {
  # path, i and j come from the enclosing code (not shown here)
  f <- file(paste0(path, "chr1_block_", i, "_perm_", j, "_k_", k, ".controls.gen"))
  # .gen files are space separated and have no header row
  chunk <- sqldf("select * from f", dbname = tempfile(),
                 file.format = list(header = FALSE, row.names = FALSE, sep = " "))
  if (k == 1) {
    gen <- chunk
  } else {
    # join on the five SNP metadata columns V1..V5
    gen <- merge(gen, chunk, by = paste0("V", 1:5), suffixes = c("", paste0(".k", k)))
  }
}
readLines was at least 100 times slower.
library(data.table)
library(foreach)
inputFile <- "../inst/extdata/toy.gen"
system.time({
  con <- file(inputFile, open = "r")
  out <- data.table(ID = 1:1000)  # assumes 1000 samples per line
  while (length(oneLine <- readLines(con, n = 1, warn = FALSE)) > 0) {
    myVector <- unlist(strsplit(oneLine, " "))
    # fields 1-5 are SNP metadata; the rest are probability triples, one per sample
    vec <- foreach(i = seq(6, length(myVector) - 2, by = 3), .combine = c) %do% {
      one <- as.numeric(myVector[i])
      two <- as.numeric(myVector[i + 1])
      three <- as.numeric(myVector[i + 2])
      # hard-call the genotype whose probability exceeds 0.9, else NA
      if (one > 0.9) {
        0
      } else if (two > 0.9) {
        1
      } else if (three > 0.9) {
        2
      } else {
        NA
      }
    }
    out[, (myVector[3]) := vec]  # one new column per line, named by the third field
    message(ncol(out))
  }
  close(con)
})
The above was also slower when run in parallel:
library(doParallel)
makeCluster(8)
and then swapping %do% for %dopar%.
Reading lines one at a time with coRge::gen2R was really bad.
Chaining rows together with %:% was equally disastrous. Don't go down this path.
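For comparison with the line-at-a-time reads above, here is a minimal sketch of parsing a whole file in one call with data.table::fread. This was not tried in this thread and no timings are claimed for it. The toy file and its layout (five metadata columns, then three probabilities per sample) are made up for illustration.

```r
library(data.table)

# Hypothetical two-SNP, two-sample file in the layout assumed here:
# five metadata columns followed by three genotype probabilities per sample.
toy <- tempfile(fileext = ".gen")
writeLines(c(
  "1 rs1 1000 A G 1 0 0 0 1 0",
  "1 rs2 2000 C T 0 0 1 0.5 0.5 0"
), toy)

# One call parses every line, so there is no per-line readLines() loop.
gen <- fread(toy, sep = " ", header = FALSE)

dim(gen)  # rows = SNPs, columns = 5 + 3 * number of samples
```

The columns come back as V1, V2, ... so the V1..V5 naming used with sqldf above carries over.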
foreach with .combine = 'rbind()' and .combine = 'c' was just insanely slow.
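The per-sample conversion itself does not need foreach at all. A rough sketch of the same 0.9 cut-off applied to one line with plain matrix operations follows. The example line is invented, and it assumes each triple sums to at most 1, so that no more than one probability per sample can exceed 0.9.

```r
# One hypothetical .gen line: 5 metadata fields, then 3 probabilities per sample.
line <- "1 rs1 1000 A G 1 0 0 0 0.95 0.05 0 0 1 0.4 0.3 0.3"
fields <- strsplit(line, " ", fixed = TRUE)[[1]]

# Reshape the probabilities: one row per sample, columns = P(0), P(1), P(2).
probs <- matrix(as.numeric(fields[-(1:5)]), ncol = 3, byrow = TRUE)

# Same rule as the if/else chain: call the genotype whose probability
# exceeds 0.9, otherwise leave NA.
calls <- rep(NA_integer_, nrow(probs))
hit <- which(probs > 0.9, arr.ind = TRUE)
calls[hit[, "row"]] <- hit[, "col"] - 1L
calls  # 0 1 2 NA
```

This replaces one foreach iteration per sample with a single comparison over the whole matrix.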
Issue #19 might do it
Attempts at speeding up the reading and converting of .gen files.
This thread exists as a warning and reminder to myself of how truly awful I am at programming.