kokyriakidis opened this issue 5 years ago
Hi Konstantinos,
Thanks for reaching out. This is in fact typical: certain cancer types take substantially longer to compute than others, especially on the "Finding nonparametric adjustments" step. If you are able to let the script sit, and have provided sufficient memory, it should eventually complete. If not, let me know and I can see if I still have the processed data files lying around from when I worked on this project.
Sorry for the hassle; solutions are being considered to reduce this unnecessary computation for the user in future versions of MethylMix.
Best,
Lucas
Hello Lucas,
If you could share the LIHC files, I would be grateful.
Best,
Konstantinos
Konstantinos,
I will keep you posted.
Best,
Lucas
Hello, did you check it? I noticed that the problem is that even though I use doParallel, MethylMix uses only one core on my PC. I used
cl <- makeCluster(20)
registerDoParallel(cl)
but it still uses only one core.
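A quick way to confirm that the backend registration itself worked is foreach::getDoParWorkers(); if it reports 20 but MethylMix still runs on one core, that would suggest the slow step is simply not written with %dopar% (which is what the patch further down this thread addresses). A minimal sketch:
library(doParallel)            # attaches foreach and parallel
cl <- makeCluster(20)
registerDoParallel(cl)
foreach::getDoParWorkers()     # should print 20 if the backend registered correctly
stopCluster(cl)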
Hello, I am using the MethylMix package for gastric cancer and I am facing the same problem. I am also using the doParallel package, but it still only uses one core. It has been more than 2 days and the job is still running.
Hi, I am experiencing the same problem: the script has been running for 66+ hours for one cancer type, and I was planning to analyze multiple cancer types. Is there any recommended workaround?
Same issue here for Rectal Cancer.
cancerSite <- "READ" # TCGA Code for rectal cancer targetDirectory <- paste0(getwd(), "/") GetData(cancerSite, targetDirectory)
I tried running the above three lines and it's forever stuck at 'Finding nonparametric adjustments' in my console.
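For reference, the MethylMix documentation suggests registering a doParallel cluster around these calls; a minimal sketch of that pattern, with an arbitrary worker count (note that commenters here report the nonparametric adjustment step stays on one core regardless):
library(MethylMix)
library(doParallel)

cl <- makeCluster(5)           # arbitrary worker count
registerDoParallel(cl)

cancerSite <- "READ"           # TCGA code for rectal cancer
targetDirectory <- paste0(getwd(), "/")
GetData(cancerSite, targetDirectory)

stopCluster(cl)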
Will this issue be fixed at some point?
I ran this for 9 days straight and it did not complete. Please fix your package.
So, regarding the parallelization of the non-parametric adjustments: I added a foreach loop in ComBat_NoFiles.R. It starts a thread for each batch of your data, so 16 batches = 16 threads. A better approach would probably be to parallelize the Monte Carlo implementation directly, but I am no expert, so I'll leave that to someone else. If this finds approval, something like it will hopefully make it into the Bioconductor git.
# Parallel replacement for the per-batch loop in ComBat_NoFiles.R:
# one worker adjusts each batch, and the returned g.star/d.star rows
# are stacked with rbind. (gamma.star and delta.star are assumed to be
# initialized to NULL earlier, as in the original ComBat code.)
final_temp <- foreach::foreach(i = 1:n.batch, .combine = rbind, .export = c("int.eprior")) %dopar% {
  temp <- int.eprior(as.matrix(s.data[, batches[[i]]]), gamma.hat[i, ], delta.hat[i, ])
  temp
}
# Split the combined result back into gamma.star and delta.star by row name.
for (i in 1:nrow(final_temp)) {
  if (rownames(final_temp)[i] == "g.star") {
    gamma.star <- rbind(gamma.star, final_temp[i, ])
  } else {
    delta.star <- rbind(delta.star, final_temp[i, ])
  }
}
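For %dopar% to actually run in parallel, a backend has to be registered first; otherwise foreach falls back to sequential execution with a warning. A minimal sketch, assuming the doParallel backend (the worker count is an example):
library(doParallel)
cl <- makeCluster(4)   # example worker count; match it to your hardware
registerDoParallel(cl)
# ... run the patched ComBat_NoFiles.R / MethylMix pipeline here ...
stopCluster(cl)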
Alright so I think I figured a way to parallelize the Monte Carlo function. I think this should do the trick:
int.eprior <- function(sdat, g.hat, d.hat) {
  # ComBat's nonparametric empirical Bayes adjustments (Monte Carlo
  # integration), with the per-probe loop parallelized via foreach.
  r <- nrow(sdat)
  print(paste0(r, '\n'))
  adj <- foreach::foreach(i = 1:r, .combine = cbind) %dopar% {
    # Leave-one-out: integrate over the estimates of all other probes.
    g <- g.hat[-i]
    d <- d.hat[-i]
    x <- sdat[i, !is.na(sdat[i, ])]
    n <- length(x)
    j <- numeric(n) + 1
    dat <- matrix(as.numeric(x), length(g), n, byrow = TRUE)
    resid2 <- (dat - g)^2
    sum2 <- resid2 %*% j
    # Gaussian likelihood of probe i's data under each other probe's (g, d).
    LH <- 1 / (2 * pi * d)^(n / 2) * exp(-sum2 / (2 * d))
    LH[is.nan(LH)] <- 0
    g.star <- sum(g * LH) / sum(LH)
    d.star <- sum(d * LH) / sum(LH)
    # if (i %% 1000 == 0) { print(paste0(i, '\n')) }
    list(g.star, d.star)
  }
  # cbind over the 2-element lists yields a 2 x r list-matrix; flatten it.
  adjust <- rbind(unlist(adj[1, ]), unlist(adj[2, ]))
  rownames(adjust) <- c("g.star", "d.star")
  adjust
}
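A quick smoke test on synthetic data (all sizes and values below are made up) to check that the parallel version returns the expected 2 x r matrix of adjustments:
library(doParallel)                   # attaches foreach, so %dopar% is available
cl <- makeCluster(2)
registerDoParallel(cl)

set.seed(1)
sdat  <- matrix(rnorm(50), nrow = 5)  # 5 probes x 10 samples, synthetic
g.hat <- rnorm(5)                     # one gamma estimate per probe
d.hat <- runif(5, 0.5, 1.5)           # one positive delta estimate per probe

adjust <- int.eprior(sdat, g.hat, d.hat)
stopifnot(nrow(adjust) == 2, ncol(adjust) == nrow(sdat))

stopCluster(cl)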
Hi, I am experiencing the same problem. Will this problem be solved?
Any update on this problem? It must have worked before since there are multiple publications using this package.
Hello, I have tried the parallelized int.eprior method above, but an error comes out:
Error in unserialize(socklist[[n]]): error reading from connection
or
Error in serialize(data, node$con, xdr = FALSE): error writing to connection
The 27k dataset was preprocessed fine, but this error appears when the 450k dataset is halfway through preprocessing. I have tried 2-3 times. Please help me.
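For what it's worth, with R's socket clusters these serialize/unserialize connection errors usually mean a worker process died mid-run, often from running out of memory, which would fit the much larger 450k array failing where the 27k array succeeded. A hedged sketch of a lower-memory retry (worker count is an example):
library(doParallel)
# Fewer workers means fewer simultaneous copies of the large 450k matrix
# in memory, at the cost of speed; 2 is just an example.
cl <- makeCluster(2)
registerDoParallel(cl)
# ... rerun the preprocessing / patched int.eprior here ...
stopCluster(cl)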
Hello, MethylMix has been stuck at the 'Finding nonparametric adjustments' step for about 30+ hours. Is there any problem? My code is: