Non-Contradiction / JuliaCall

Embed Julia in R
https://non-contradiction.github.io/JuliaCall/index.html

Computations are slowing down as iteration goes on. #139

Open mberat opened 4 years ago

mberat commented 4 years ago

Hi,

I'm running a series of computations using R and Julia. My setup is such that I modify and play with data in R, and I use JuliaCall to run an optimization model in Julia via JuMP. The main body of my code is a for loop, and the main problem is that as I iterate through the loop, the part where I run the optimization (the things happening in Julia) keeps getting slower. Do you have any idea why that might be happening?

Thanks for any help, Best

Non-Contradiction commented 4 years ago

Thank you for the feedback! There can be multiple reasons for the issue. For example, if memory usage keeps going up, then a memory leak might be the cause, and it can happen at the level of JuliaCall, JuMP, or the specific solver used in JuMP. There used to be a JuliaCall memory-leak issue a long time ago... One thing that might be helpful is to run the whole thing in Julia to see if the problem still exists, or to try different solvers for JuMP to see if the issue is related to one specific solver.
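A minimal way to check whether memory usage keeps going up is to poll the Julia heap from R between iterations. This is only a hedged sketch: it assumes Base.gc_live_bytes() is available in the embedded Julia (it is unexported, so this depends on the Julia version):

```r
library(JuliaCall)
julia_setup()

for (i in 1:100) {
  # ... run one optimization iteration here ...

  # Report how much live memory the embedded Julia holds after this iteration;
  # steady growth across iterations suggests a leak somewhere in the stack.
  julia_bytes <- julia_eval("Base.gc_live_bytes()")
  cat(sprintf("iteration %d: Julia live heap ~ %.1f MB\n", i, julia_bytes / 1e6))
}
```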

mberat commented 4 years ago

Hi, thanks for the reply! I think the reason is related to a memory leak. Is there a solution for the memory-leak problem you are referring to in JuliaCall?

Non-Contradiction commented 4 years ago

The problem I referred to should already be fixed, but maybe it still exists in some corner cases. So I think one of the first things we can do is to get a minimal working example to see where the problem really is. But before that, let us check some particular things to pay attention to.

One thing to pay attention to is that when solving a large optimization model multiple times, we need to make sure the memory is cleaned up. So if the previous models are of no use, make sure they can no longer be referred to from the R side; otherwise JuliaCall will tell Julia to keep them alive. To make things clearer: functions like julia_eval or julia_call will sometimes return a JuliaObject, which is a proxy for an object on the Julia side. As long as the JuliaObject persists in R, Julia will keep the actual Julia object it refers to. This can look like a memory leak, but it is actually not one. If you only use julia_command or julia_assign (neither has a return value), or if you only use julia_eval and julia_call to get simple numerical scalars and vectors/matrices back into R, then you don't need to worry about this, and there is no way for JuliaCall to cause a memory leak.
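For illustration, a minimal sketch of the distinction; the need_return = "Julia" argument to julia_eval forces a JuliaObject proxy, so the retention behavior is easy to see:

```r
library(JuliaCall)
julia_setup()

# Plain R copy: the value is converted on return, so Julia keeps nothing.
x <- julia_eval("sum(rand(10))")

# JuliaObject proxy: Julia must keep the underlying ~800 MB array alive
# for as long as `big` is reachable from R.
big <- julia_eval("rand(10_000, 10_000)", need_return = "Julia")

# Dropping the R reference (and letting R's garbage collector run)
# allows Julia to reclaim the memory on its next collection.
rm(big)
invisible(gc())
```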

I also did a little searching on the JuMP side and found a possibly related issue: https://github.com/jump-dev/JuMP.jl/issues/1494.

If these tips don't solve the problem, then we need to spend more time on it. I think we can:

1. Replicate things on the Julia side to see if the problem persists. If so, this is probably an issue with JuMP or a particular solver.
2. Try different solvers in JuMP to see if the problem persists (see the sketch below). If the problem only appears for one specific solver, then I guess the problem is with that solver rather than with JuliaCall or JuMP.
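As a hedged sketch of the second suggestion: swapping the solver is essentially a one-line change in the MathProgBase-era JuMP syntax used in this thread (Cbc is chosen here purely as an example of an alternative solver):

```r
library(JuliaCall)
julia_command("using JuMP")
julia_command("using Cbc")  # an open-source MIP solver, as an alternative to Gurobi

# Build the model against CbcSolver instead of GurobiSolver; everything
# else (variables, constraints, objective, solve) stays the same.
julia_command("newModel = Model(solver=CbcSolver());")
# ...then re-run the experiment loop; if the slowdown disappears, the issue
# is likely specific to the original solver.
```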

mberat commented 4 years ago

Thanks again for the quick reply. I will run a working example in Julia and see if the slowdown persists there. As for the contents of the code: indeed, I'm using julia_eval to receive the output of the optimization in R, but as you mentioned, I'm using it to get a numerical scalar back, so I'm not sure this is the real trouble. Anyway, I'm working on the running example now and will post it here once I'm done. As for the solver, I'm using the Gurobi solver, which is a well-known and well-established commercial solver.

mberat commented 4 years ago

OK, here is the reproducible example I can share, created in R. The resulting plot of how long each computation takes is below:

[plot: time spent per experiment number; as the iterations go on, some runs take a large amount of time compared to the beginning]

The main body of the code is as follows:

```r
library(data.table)
library(geosphere)
library(Rfast)
library(JuliaCall)

julia_command("using JuMP")
julia_command("using Gurobi")
```

```r
# Made-up input data preparation
samSize <- 10000
set.seed(1)
disturbance1_norm <- (trunc(rnorm(samSize) * 500)) / 10^4
disturbance2_norm <- (trunc(rnorm(samSize) * 500)) / 10^4
time_LL <- trunc(runif(samSize) * 1440)
lats  <- 40.444851 + disturbance1_norm
longs <- -79.953359 + disturbance2_norm
dt_Large <- data.table(lat = lats, long = longs, time = time_LL)

input2 <- data.table(ID = 1:12,
                     lat  = runif(12, max = 40.47,  min = 40.37),
                     long = runif(12, max = -79.89, min = -80.04))
```

```r
numOfExperiments <- 1000
time_dt <- data.table(expNum = rep(0, numOfExperiments),
                      timeSpent = rep(0, numOfExperiments))
for (i in 1:numOfExperiments) {
  # Preparing the input data
  set.seed(i)
  subSaple <- sample(1:10000, 100)
  dt_S <- dt_Large[subSaple, ]
  setkey(dt_S, time)
  p_ij_30_9 <- p_Late(dt_S, input2)
  t_Mat <- p_Time(dt_S, input2)
  o_iij <- o_iij_calc(dt_S, input2, t_Mat)

  # The important function
  start_time <- Sys.time()
  opt_out <- OptimalDispatchOffline(t(p_ij_30_9), o_iij, t_Mat)
  end_time <- Sys.time()
  time_dt[i, timeSpent := end_time - start_time]
  time_dt[i, expNum := i]
}
```

The important function here is OptimalDispatchOffline(). Its code is as follows:

```r
OptimalDispatchOffline <- function(pij, oiij, timeMat) {
  julia_assign("pij", pij)
  julia_assign("oiij", oiij)
  julia_command("sizeCall = size(pij,2);")
  julia_command("sizeEMS = size(pij,1);")
  julia_command("newModel = Model(solver=GurobiSolver());")
  julia_command("@variable(newModel, m[1:sizeEMS,1:sizeCall], Bin);")
  julia_command("@constraint(newModel, [j=1:sizeCall], sum(m[i,j] for i in 1:sizeEMS) == 1);")
  julia_command("@constraint(newModel, [j=1:sizeEMS,i=1:sizeCall,iP=(i+1):sizeCall,i!=iP], oiij[i,iP,j]*(m[j,i]+m[j,iP]) <= 1);")
  julia_command("@objective(newModel, Min, sum(pij[i,j]*m[i,j] for i in 1:sizeEMS, j in 1:sizeCall));")
  julia_command("solve(newModel);")
  julia_command("outputCM = getvalue(m);")
  julia_command("getobjectivevalue(newModel);")

  outputOpt <- julia_eval("outputCM")
  penaltyOrNo <- apply(pij * outputOpt, 2, sum)  # pij is the transposed penalty matrix
  ambDispAr <- apply(outputOpt, 2, which.max)
  timeItTook <- apply(t(timeMat) * outputOpt, 2, sum)
  output_dt <- data.table(penaltyOrNo, timeItTook, ambDispAr)
  return(output_dt)
}
```

Lastly, the three small functions needed for the data preparation in the for loop are:

```r
p_Late <- function(dt_sample, ems_DC, Avg_Speed = 30, timeT = 9) {
  tempMat <- distm(dt_sample[, .(long, lat)], ems_DC[, .(long, lat)]) / 1000 / Avg_Speed * 60
  out <- 1 * (tempMat > timeT)
  return(out)
}
```

```r
p_Time <- function(dt_sample, ems_DC, Avg_Speed = 30, timeT = 9) {
  out <- distm(dt_sample[, .(long, lat)], ems_DC[, .(long, lat)]) / 1000 / Avg_Speed * 60
  return(out)
}
```

```r
o_iij_calc <- function(dt_sample, ems_DC, timeMat, timePost = 20) {
  sSize <- dim(timeMat)[1]
  dSize <- dim(timeMat)[2]
  o_iij <- array(0, dim = c(sSize, sSize, dSize))
  for (i in 1:dSize) {
    for (j in 1:sSize) {
      timeStampLower <- dt_sample[j, time]
      timeStampUpper <- timeMat[j, i] + dt_sample[j, time] + timePost
      tempAr <- 1 * dt_sample[, time >= timeStampLower & time <= timeStampUpper]
      o_iij[j, j:sSize, i] <- tempAr[j:sSize]
    }
    o_iij[, , i] <- lower_tri.assign(o_iij[, , i], lower_tri(t(o_iij[, , i])))
  }
  return(o_iij)
}
```

mberat commented 4 years ago

Update: I have run a similar optimization in Julia and there is no slowdown there. So I think the problem really is somewhere in JuliaCall.

Non-Contradiction commented 4 years ago

Thanks for the code! It's really helpful. The code largely looks good to me, and there seems to be no creation of a JuliaObject by JuliaCall, so there should be no way for a JuliaObject to cause a memory leak. The only thing I'm concerned about is that things like julia_assign and julia_command all operate in the Julia global scope, and there might be some performance penalty for that. What makes things more complicated is the repeated (re)creation of the JuMP model and (re)adding of variables/constraints to a model with the same name in global scope, and I'm not sure how JuMP deals with such scenarios.

Does the similar optimization in Julia also involve everything in global scope?

I personally also use JuMP together with JuliaCall sometimes. The numOfExperiments I have is five or six hundred at the largest, the matrices involved in the calculation are a little smaller than yours, and I have not noticed any obvious slowdown. I tend not to have much calculation in global scope. For example, I will have a Julia function in a separate Julia file like this:

```julia
using JuMP
using Gurobi

function OptimalDispatchOffline(pij, oiij, timeMat)
    sizeCall = size(pij, 2)
    sizeEMS = size(pij, 1)
    newModel = Model(solver=GurobiSolver())
    @variable(newModel, m[1:sizeEMS, 1:sizeCall], Bin)
    @constraint(newModel, [j=1:sizeCall], sum(m[i,j] for i in 1:sizeEMS) == 1)
    @constraint(newModel, [j=1:sizeEMS, i=1:sizeCall, iP=(i+1):sizeCall, i!=iP], oiij[i,iP,j]*(m[j,i]+m[j,iP]) <= 1)
    @objective(newModel, Min, sum(pij[i,j]*m[i,j] for i in 1:sizeEMS, j in 1:sizeCall))
    solve(newModel)
    outputCM = getvalue(m)
    getobjectivevalue(newModel)
    outputCM
end
```

And I will julia_source() the Julia file and do things like outputOPT <- julia_call("OptimalDispatchOffline", pij, oiij, timeMat) back in R. In this workflow, the Julia code gets a smoother development process (highlighting, autocomplete, and debugging in a preferred IDE), and the things kept in global scope are minimal.
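Putting the two pieces together, the R side of this workflow looks roughly like the sketch below (the file name is hypothetical):

```r
library(JuliaCall)
julia_setup()

# Load the Julia file containing the function definition above.
julia_source("OptimalDispatchOffline.jl")

# Call the function directly: the arguments are converted on the way in,
# and only a plain numeric matrix comes back, so no JuliaObject proxies
# (and no global-scope model) persist between iterations.
outputOPT <- julia_call("OptimalDispatchOffline", pij, oiij, timeMat)
```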

I will try to run the example as it is, maybe tomorrow afternoon, and see where the problem is. I hope the previous tips work for you!

mberat commented 4 years ago

Absolutely, thanks for taking the time to look at it! I just finished implementing things the way you described and ran a test experiment. It worked wonders! I can say that with your implementation the problem is resolved, so the problem is related to the global definition of the model and its reuse. Let me use this opportunity to thank you again for the JuliaCall package. As an eager R user and an Operations Management PhD student who needs to solve optimization models, I'm loving it! Thanks for your work and for your time helping with my problem!