Open kongdd opened 5 years ago
This is a good question. I'm not sure about it. It may depend on what kind of parallelism you want. However, I guess, in general, the answer is no. What do you mean by "parallel scripts"?
I am trying to do something similar but have yet to succeed. I have been running R version 3.6.2 on multi-core x86_64 workstations under Ubuntu 16.04.6 LTS. I frequently do parallel processing in R using mclapply, which spins off separate processes and collects their results. Recently I have been working on an R program which in each run performs one of several related tasks. One of these tasks requires Julia (currently version 1.3.1 installed from binaries downloaded from julialang.org). I recently discovered JuliaCall_0.17.1 and decided to try to use it to get to Julia from my R program. My initial attempts failed with error messages like:
> library(JuliaCall)
> julia_setup()
Julia version 1.3.1 at location /usr/local/julia-1.3.1/bin will be used.
Error in dyn.load(.julia$dll_file) : unable to load shared object '/usr/local/julia-1.3.1/bin/../lib/libjulia.so.1':
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found (required by /usr/local/julia-1.3.1/bin/../lib/julia/libLLVM-6.0.so)
This libstdc++.so.6 incompatibility issue has been reported in various postings on the Web.
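One quick way to check whether the system libstdc++ is the culprit is to list the GLIBCXX symbol versions it exports. The path below is the usual Ubuntu x86_64 location and is only an example; adjust it for your system.

```shell
# List the GLIBCXX versions provided by the system libstdc++
# (path is typical for Ubuntu x86_64; adjust as needed).
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep '^GLIBCXX' | sort -u | tail -n 5
# If GLIBCXX_3.4.22 is missing from the output, this libstdc++ is too old
# for the libLLVM bundled with the official Julia binaries.
```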
After some fiddling around I finally got julia_setup() working by setting:
export R_LD_LIBRARY_PATH=/usr/local/lib/R/lib:/usr/local/julia-1.3.1/lib/julia
After that I was able to use julia_assign, julia_command, julia_eval, julia_setup, and julia_source successfully. One complication of getting all this to work is that I am running on offline machines, so I can't run anything that tries to access the Internet directly, but I can transfer data between my online and offline machines using removable media.
Now my R-to-Julia interface worked fine as long as I avoided parallel processing. My mclapply calls worked OK for those tasks that didn't require Julia, but when I tried to use Julia within the spun-off processes I started to get lots of error messages indicating catastrophic failures. Here is a small sample:
signal (11): Segmentation fault
in expression starting at none:0
unknown function (ip: 0x7ff53b89cc98)
Invalid instruction at 0x7ff53b895f80: 0x60, 0x49, 0x8d, 0x86, 0x40, 0xe0, 0xff, 0xff, 0x48, 0xbf, 0x40, 0xd0, 0x55, 0x42, 0xf5
signal (4): Illegal instruction
in expression starting at none:0
unknown function (ip: 0x7ff53b895f7f)
I tried doing julia_setup again (with force = T) in each parallel R process but that didn't help. I also tried to rebuild Julia from source, but that ran into lots of problems that I have yet to resolve.
Except for this problem with mclapply, JuliaCall seems to be a good package. I would really like to get R, Julia, JuliaCall, and mclapply to all play nicely together, but so far no luck. Perhaps the problem is due to the libstdc++.so.6 version difference, or maybe it's related to interference between R's and Julia's parallel-processing facilities, although I am not (consciously, anyway) trying to use the latter. I'll repost here if I succeed in getting it all working. Meanwhile, good luck with your own efforts.
@dslate1 Thank you for the feedback! After some investigation, the problem seems to be in the data transfer process between R and Julia, and some tricks seem to mitigate the problem a little. For example:
library(parallel)
library(JuliaCall)

mclapply(1:10, function(i) i)

julia_setup()

mclapply(1:10, function(i) {
  julia_setup()
  JuliaCall:::.julia$cmd("using RCall")
  r <- julia_call("sqrt", i)
  r
})
This partly works for me on my university's cluster (which uses Red Hat, I think), except that some parallel workers randomly error out and do not return the expected results.
Note that the julia_setup() calls before and within the parallel part, and the JuliaCall:::.julia$cmd("using RCall") within the parallel part, are critical. I need to change a little how julia_setup() works so that this JuliaCall:::.julia$cmd trick will not be necessary in the future.
The JuliaCall:::.julia$cmd function is the most direct way for JuliaCall to send commands to the embedded Julia, and there seems to be no problem with it. The only issue is that it cannot handle data transfer between R and Julia, which is why JuliaCall:::.julia$cmd("using RCall") is needed.
The random errors with this method still seem to come from the data transfer process between R and Julia; maybe some memory leak?
In sum, the safe way seems to be to use JuliaCall:::.julia$cmd to execute string commands in the parallel part, but this method cannot handle data transfer between R and Julia. So one option is to save the result in some format from within the Julia command and then retrieve the saved result later in R. The first example illustrates the unsafe way, and I need to investigate it further.
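As a rough sketch of that safe pattern: each worker sends Julia a pure string command that writes its result to a per-worker file, and the parent collects the files afterwards. The file paths, the use of Julia's DelimitedFiles, and the collection step are my own illustration, not part of JuliaCall; this assumes a working Julia installation.

```r
library(parallel)
library(JuliaCall)

julia_setup()

res_files <- mclapply(1:4, function(i) {
  julia_setup()  # re-initialize Julia inside the forked child
  out <- sprintf("/tmp/julia_out_%d.txt", i)  # per-worker file (illustrative)
  # A pure string command: no R <-> Julia data transfer is involved.
  JuliaCall:::.julia$cmd(sprintf(
    'using DelimitedFiles; writedlm("%s", sqrt(%d))', out, i))
  out
}, mc.cores = 2)

# Back in the parent process, collect the saved results with plain R I/O.
results <- vapply(unlist(res_files),
                  function(f) scan(f, quiet = TRUE), numeric(1))
```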
Thanks Non-Contradiction for responding to my inquiry. I may try some of your suggestions, but it sounds like there is still work to be done in JuliaCall to fix whatever is going wrong. When you think you have a fix, I'll be happy to test it with my application.
One more observation about the errors I am getting trying to use JuliaCall with mclapply: in my experience, mclapply spins off child processes pretty much like the Unix/Linux fork() operation. Although they inherit the memory of the parent process, they are mostly independent of it (and the other children), and any changes they make to the variables inherited from the parent are private to themselves. So for the most part, what the child processes do, including calling julia, shouldn't really interfere with each other or the parent. The exceptions I've seen are primarily those packages that rely on communication with an external server for their functionality, in which case child processes may conflict when they access the same server that the parent process used and claim the same client identity. There can also be problems if the child processes try to access files with the same names.
However, as far as I know JuliaCall links R and Julia together in a single process -- there is no client/server communication involved. Also, my Julia code doesn't write to files whose names might conflict with files used by sibling processes spun off by mclapply. What I'm getting at is that I don't understand what mechanism could be causing the child processes to crash the way they do. Note that I use julia_source to read some Julia code into my program, julia_assign to transfer data from R to Julia, julia_command (with show_value = F) to execute code within Julia, and julia_eval to retrieve data from Julia back to R.
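The fork semantics described above can be seen in a pure-R sketch (no Julia involved; note that mclapply forking only works on Unix-alikes, not Windows): each child gets a copy-on-write copy of the parent's variables, and its writes stay private to itself.

```r
library(parallel)

x <- 1
children <- mclapply(1:2, function(i) {
  x <<- x + 100 * i  # modifies only this child's private copy of x
  x
}, mc.cores = 2)

unlist(children)  # each child returns its own modified copy: 101, 201
x                 # the parent's x is unchanged: still 1
```

So whatever breaks the Julia-calling children, it is not ordinary variable sharing between the forked processes.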
I believe I'm also running into this issue. I've tried to use the diffeqr package to solve multiple SDEs in parallel using foreach and %dopar%. The script works when run on macOS but fails on Linux. I'm busy testing on Ubuntu, and the error I obtain is almost exactly the same as the error when I try to run the above code example. I've also tried using julia_call("sqrt", list(i), need_return = "None"), but I get the same signal (11): Segmentation fault error.
I've tried 3 different versions of Julia as well (1.0.5, 1.1.1, and 1.4.2), with similar segmentation errors, though not the same signal.
@Non-Contradiction Is there something I can do to look into this issue?
So... I may have convinced my research group to spend a lot of money on a beefy computer so we could process things quickly (rather than using a Mac laptop), only to come across this issue after we installed Ubuntu on the new machine (CentOS 7 won't work either). I can confirm, as GantZA said above, that there are no such problems on a Mac, which is strange to me since both are a kind of Unix, so perhaps this is a library issue?
I would rather fix the JuliaCall library than hack around the issue by saving outputs to disk, and I have had some experience digging into *nix internals. I would be happy to look into things if you could give a hint where to look. How does the data transfer process work under the hood?
Update: Here's the code I use to reproduce the issue
library(JuliaCall)
library(doParallel)

numCores <- 4
numRuns <- 8

#### Setup JuliaCall ####
if (is.na(Sys.getenv("IsJuliaSetupComplete", NA))) {
  if (.Platform$OS.type == "windows") { # Untested
    JULIA_HOME <- system("WHERE julia", intern = TRUE)
    JULIA_HOME <- stringr::str_replace(JULIA_HOME, "julia.exe", "")
  } else {
    # Assumes julia is in your PATH, otherwise add this to ~/.bashrc
    # PATH=$PATH:/path/to/julia/bin
    JULIA_HOME <- system("ls -g `which julia` | rev | cut -d' ' -f1 | rev | xargs dirname", intern = TRUE)
  }
  JuliaCall::julia_setup(JULIA_HOME = JULIA_HOME)
  diffeqr::diffeq_setup()
  Sys.setenv(IsJuliaSetupComplete = TRUE)
}

#### Run some Julia code in parallel ####
registerDoParallel(numCores)
foreach(r = 1:numRuns) %dopar% {
  JuliaCall::julia_setup(JULIA_HOME = JULIA_HOME) # Add this or don't add this, it doesn't seem to matter.
  JuliaCall::julia_assign("arg", r) # THIS BREAKS
}
to which I get an error starting with
rsession: /buildworker/worker/package_linux64/build/src/debuginfo.cpp:1612: void register_eh_frames(uint8_t*, size_t): Assertion `end_ip != 0' failed.
signal (6): Aborted
Specifically, I noted the issue occurring when julia_assign is called inside the %dopar% loop; normal R code placed before it seems to execute just fine. So it is the R -> Julia communication that isn't working in this case.
Our current recourse is to run our R script with a sequential for loop over a chunk of, say, N = 10000 runs, but to run the R scripts themselves in parallel in a worker fashion and accumulate the results manually. After I get that working, I'll try gathering up the equivalent Julia code that would be run and writing it to disk as a Julia file. Then I'll execute that in Julia (in parallel) and save the output to disk. Then I will read the cached results back into R and continue processing. Yikes! Wish me luck.
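A rough sketch of that worker pattern under GNU parallel, assuming a hypothetical worker.R that takes a chunk index on the command line, runs its share sequentially, and saves results_<i>.rds (the script name and file naming are my own illustration):

```shell
# Launch 4 sequential R workers in parallel; each handles one chunk.
# worker.R and the results_<i>.rds naming scheme are hypothetical.
parallel -j 4 Rscript worker.R {} ::: 1 2 3 4

# A final sequential R step then combines the cached chunks, e.g.:
# Rscript -e 'res <- lapply(sprintf("results_%d.rds", 1:4), readRDS)'
```

Since each worker is a fresh R process with its own embedded Julia, none of the fork-related crashes apply.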
I'm wondering if there's a simple way to get this to work by better understanding how the dmg file works (and how it differs from just using the Linux binary).
I'm not proud of it, but I got our script to run in parallel using GNU parallel and converting the script to a worker script (SIMD style). If someone is desperate and would like to do this, they can ping me. I tried to reverse-engineer the Julia dmg to figure out what magic was happening, but this was difficult without actually owning a Mac.
Work is moving on my side and the hack is functional so I am going to leave this issue for now. Good luck going forward.
I'm a total newbie at Julia, but I came here because I am considering using JuliaCall for speed within parallel R code. Noting the above, and recalling that there was an instability issue with mclapply run under RStudio (that seems to be patched now: https://github.com/rstudio/rstudio/issues/2597 ), I wondered whether the future.apply package would work. So I tried this under R 4.0.3, Julia 1.5, and JuliaCall 0.17.2:
library(JuliaCall)
library(future.apply)
x <- 0:1000
y0 <- lapply(1:length(x), function(i) sqrt(x[i]) )
plan( multisession, workers = 10 )
y1 <- future_lapply(1:length(x), function(i) sqrt(x[i]), future.seed = 1 )
plan( multisession, workers = 10 )
y2 <- future_lapply(1:length(x), function(i) julia_call( "sqrt", x[i] ), future.seed = 1) # including a seed suppresses warning about unreliable random numbers
Of course this is a trivial example, but it seems to run ok:
> all.equal(y0, y1)
[1] TRUE
> all.equal(y0, y2)
[1] TRUE
@jpkrooney Thank you very much for looking into this! I tested your code snippet, and it works for me both on my Windows 10 laptop (I cannot believe it is working on Windows!) and my university's Red Hat Enterprise Linux 7 cluster (with RStudio server or R REPL directly).
And the mechanism seems quite stable for me. The only possible issue is that in newer versions of Julia, precompilation of Julia packages happens in parallel, which can interfere with the parallel R code. So make sure that all the needed Julia packages (including the Julia dependencies of JuliaCall) are already precompiled before launching the parallel Julia task.
I will further try to document these somewhere.
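One way to force that precompilation up front is to run Julia's standard Pkg.precompile() from the main R process before any workers are launched (the exact packages to load are of course application-specific; this assumes a working Julia installation):

```r
library(JuliaCall)
julia_setup()

# Precompile all installed Julia packages (including JuliaCall's Julia-side
# dependencies) once, in the main process, before starting parallel workers.
julia_command("import Pkg; Pkg.precompile()")
julia_command("using RCall")  # loading once here also caches its compilation
```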
Thanks again to @jpkrooney for the information and the wonderful future.apply!
Welcome! Yeah, I think the future packages were designed with consistency across systems in mind - check out some other future packages here: https://github.com/HenrikBengtsson!
Hello and sorry for reviving this issue.
@jpkrooney your proposed solution works, but only with a SOCK cluster. That is, multicore, or a cluster created with parallel::makeForkCluster, does not work.
library(future)
library(future.apply)
library(parallel)
library(JuliaCall)
x <- 0:1000
plan( multicore, workers = 2)
y2 <- future_lapply(1:length(x), function(i) julia_call( "sqrt", x[i] ), future.seed = 1) # does not work
cl <- makeForkCluster(2)
plan(cluster, workers = cl)
y2 <- future_lapply(1:length(x), function(i) julia_call( "sqrt", x[i] ), future.seed = 1) # does not work
cl <- makePSOCKcluster(2)
clusterEvalQ(cl, library(JuliaCall))
clusterExport(cl, "x")
y <- clusterApply(cl, 1:length(x), function(i) JuliaCall::julia_call("sqrt", x[i])) # this works
I guess that in JuliaCall the different parallel processes use the same memory pointers for each other's data. I don't really know how that works, though.
It is a pity, because my workload has large memory consumption and I cannot work with a SOCK cluster.
I am curious whether I can use JuliaCall in parallel R scripts. Should I initialize Julia via julia_setup in each parallel R process?