Closed phillc73 closed 3 years ago
Thank you very much for the feedback!
The error message is about parsing, so I suspect the issue is the formatting of the julia command. Actually, it works for me if we define the query in a separate string and then pass the string to the function like this:
query <- "
x = @from i in mtcars begin
@where i.disp > 200
@select {i.mpg, i.cyl}
@collect DataFrame
end
"
library(microbenchmark)
library(dplyr)
library(data.table)
# Make a data.table
mtcars_data_table <- data.table(mtcars)
test_bench <- microbenchmark(times=500,
# Queryjl
Queryjl = {julia_command(query)},
# data.table library
data_table = {mtcars_data_table[disp >= 200, c("mpg", "cyl"),]},
# dplyr library
dplyr = {mtcars %>%
dplyr::filter(disp >= 200) %>%
dplyr::select(mpg,cyl)}
)
And I suggest this approach whenever we nest some multi-line julia command string in R code.
And I see that you are trying to benchmark the code.
Actually, julia_command is quite inefficient and is designed for interactive use. Whatever command it evaluates, it works like eval(parse(text = "....")) in R. Another thing is that julia_command executes in the global scope, and avoiding the global scope is the first thing the Julia performance tips tell us to pay attention to: https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-tips
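To make the eval(parse(text = ...)) comparison concrete, here is a minimal base R sketch of the two routes. It does not touch JuliaCall at all; square and x are made-up names used only for illustration.

```r
# Base R only: compare re-parsing a command string on every call
# with calling an already-defined function directly.
square <- function(v) v^2   # hypothetical helper, defined once

x <- 1:10

# String route: the text is parsed and then evaluated each time, and the
# names are looked up in the calling environment (the global one at top level).
via_string <- eval(parse(text = "square(x)"))

# Direct route: no parsing step, the function is simply called.
via_call <- square(x)

# Both routes give the same answer; only the overhead differs.
identical(via_string, via_call)
```

The string route is convenient interactively, but the parsing work is repeated on every call, which is the kind of overhead that shows up in a benchmark loop.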
Thanks! I've managed to find a way to make it work.
This works:
query_a <- "
x = @from i in mtcars begin
 @where i.disp > 200
 @select {i.mpg, i.cyl}
 @collect DataFrame
end
"
julia_command(query_a)
However, this does not work:
query_b <- "
 x = @from i in mtcars begin
 @where i.disp > 200
 @select {i.mpg, i.cyl}
 @collect DataFrame
 end
 "
julia_command(query_b)
It seems like white space is somehow very important here when parsing strings. If one prints the two query strings the difference is clear, but I'm still not sure why parsing one works and not the other.
r$> query_a
[1] "\nx = @from i in mtcars begin\n @where i.disp > 200\n @select {i.mpg, i.cyl}\n @collect DataFrame\nend\n"
r$> query_b
[1] "\n x = @from i in mtcars begin\n @where i.disp > 200\n @select {i.mpg, i.cyl}\n @collect DataFrame\n end\n "
If the problem is known, and specifically perhaps to do with the white space at the end of query_b, which is a guess due to the extra token at end of message error, perhaps this could be dealt with by JuliaCall when parsing strings to Julia?
The issue is the trailing white space.
This works:
library(stringi)
query <- stri_trim_both("
x = @from i in mtcars begin
@where i.disp > 200
@select {i.mpg, i.cyl}
@collect DataFrame
end
")
julia_eval(query)
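For anyone who would rather avoid the extra stringi dependency, base R's trimws() should do the same job here. This is a sketch that assumes the only problem is whitespace at the two ends of the string; the query text is copied from the printed query_b above.

```r
# query_b as printed in the thread: leading spaces plus a trailing "\n "
query_b <- "\n x = @from i in mtcars begin\n @where i.disp > 200\n @select {i.mpg, i.cyl}\n @collect DataFrame\n end\n "

# trimws() is base R; by default it strips spaces, tabs and newlines
# from both ends of the string and leaves the interior untouched.
query_trimmed <- trimws(query_b)

# The trimmed string now starts at the assignment and stops at the final "end".
startsWith(query_trimmed, "x = @from")
endsWith(query_trimmed, "end")
```

The trimmed string can then be passed to julia_command or julia_eval in the same way as the stri_trim_both() result.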
I have a way forward to not have to worry too much about this now. However, it still might be useful for others if JuliaCall could handle this string parsing issue.
On a related note, is there a faster way to do this than using julia_command? I tried julia_eval but results were pretty much identical.
Thank you very much for exploring this!
I will trim the string before julia_command sends it to Julia for evaluation.
julia_eval is almost identical to julia_command. The only difference is whether the evaluation result is transferred from Julia to R, and whether the Julia display mechanism is invoked.
From the above link to the performance tips, you can see that the Julia performance tips really emphasize writing functions, which is also the case for JuliaCall.
julia_command("
function queryjl(mtcars)
x = @from i in mtcars begin
@where i.disp > 200
@select {i.mpg, i.cyl}
@collect DataFrame
x
end
end")
When you evaluate the above, you should see queryjl (generic function with 1 method), which says the function was defined successfully on the Julia side. The function is then ready to use, for example with julia_command("queryjl(mtcars)").
However, this only deals with the second point I mentioned, that code execution in the global scope is slow. julia_command still needs to invoke eval and parse in Julia. To avoid that, there is the julia_call interface, which calls a Julia function directly instead of evaluating and parsing a string command. In this case, you can just use julia_call("queryjl", mtcars). Note that this way we do not need to run julia_assign("mtcars", mtcars) first, because JuliaCall will look for the R object mtcars, try to convert it to a Julia object, and then call the Julia function.
Similar to mtcars_data_table <- data.table(mtcars), we could further consider using JuliaObject to do the R->Julia conversion beforehand:
# Make a JuliaObject for JuliaCall to use in julia_call function
## without type conversion over and over again
mtcars_julia_object <- JuliaObject(mtcars)
JuliaObject does the R->Julia conversion, and the result is a wrapper on the R side which points to the object on the Julia side. When JuliaCall sees a JuliaObject, it knows that the conversion is already done and just grabs the actual Julia object it points to. We can now use it like this: julia_call("queryjl", mtcars_julia_object).
In summary, we can benchmark the different methods roughly like this:
test_bench <- microbenchmark(times=500,
# Queryjl
Queryjlcommand = {julia_command("queryjl(mtcars);")},
Queryjlcall_withoutconversionbefore = {julia_call("queryjl", mtcars)},
Queryjlcall = {julia_call("queryjl", mtcars_julia_object)},
# data.table library
data_table = {mtcars_data_table[disp >= 200, c("mpg", "cyl"),]},
# dplyr library
dplyr = {mtcars %>%
dplyr::filter(disp >= 200) %>%
dplyr::select(mpg,cyl)}
)
There are also some notes on the different methods. julia_command will not convert the result on the Julia side back into R, but julia_call and julia_eval will (by default). Both julia_eval and julia_call have an argument called need_return, which controls whether and how the result is returned to R.
Hope the information is helpful. I think I will further write this information out like a vignette or something.
Thanks! That's a tonne of good information. I really appreciate it.
Everything looks good, apart from one thing.
This sequence does not work:
julia_command("
function queryjl(mtcars)
x = @from i in mtcars begin
@where i.disp > 200
@select {i.mpg, i.cyl}
@collect DataFrame
x
end
end")
julia_command("queryjl(mtcars);")
Results in:
UndefVarError: mtcars not defined
Stacktrace:
[1] top-level scope at none:1
[2] eval(::Module, ::Any) at ./boot.jl:331
[3] eval_string(::String) at /home/phillc/R/x86_64-pc-linux-gnu-library/4.0/JuliaCall/julia/setup.jl:203
[4] docall(::Ptr{Nothing}) at /home/phillc/R/x86_64-pc-linux-gnu-library/4.0/JuliaCall/julia/setup.jl:176
However, both of these do return the correct result:
julia_call("queryjl", mtcars)
mtcars_julia_object <- JuliaObject(mtcars)
julia_call("queryjl", mtcars_julia_object)
In julia_command("queryjl(mtcars);"), mtcars refers to a variable on the Julia side, so we need to run julia_assign("mtcars", mtcars) first, as in your original code.
In both julia_call("queryjl", mtcars_julia_object) and julia_call("queryjl", mtcars), mtcars_julia_object and mtcars refer to variables on the R side, so we don't need any julia_assign step.
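The two routes can be put side by side in a short sketch. It assumes JuliaCall is installed, a Julia session has already been set up with Query and DataFrames loaded, and queryjl has been defined as earlier in the thread. The wrapper name run_queryjl_both_ways is made up for illustration.

```r
# Sketch only: needs JuliaCall, a working Julia installation, the Julia
# packages Query and DataFrames, and the queryjl function from this thread.
run_queryjl_both_ways <- function() {   # hypothetical wrapper name
  # String route: "mtcars" inside the command is a Julia variable,
  # so the R data frame has to be copied over to Julia first.
  JuliaCall::julia_assign("mtcars", mtcars)
  JuliaCall::julia_command("queryjl(mtcars);")

  # Function route: mtcars here is the R object; JuliaCall converts it
  # on the way in, so no julia_assign step is needed.
  JuliaCall::julia_call("queryjl", mtcars)
}
```

Calling run_queryjl_both_ways() in a prepared session should return the result of the julia_call route, since that is the last expression in the function.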
I was trying to benchmark some JuliaCall code against native R code. The JuliaCall standalone code works, but when trying to include this in a microbenchmark an error is returned.
This works:
This fails:
Error is:
R sessionInfo()