Hmm. I can change `userimg.jl` to something like

```julia
Base.reinit_stdio()
include("/home/lkuper/.julia/v0.4/ParallelAccelerator/src/ParallelAccelerator.jl")
using ParallelAccelerator
@acc tmp_f(A,B) = begin runStencil((a, b) -> a[0,0] = b[0,0], A, B, 1, :oob_skip); A.*B.+2 end
tmp_f([1,2],[3,4])
```

and this runs fine, but it doesn't actually appear to speed up compilation. What exactly needs to be in `userimg.jl` for it to work?
I spent a while trying to sort this issue out today, using the following code:

```julia
include("/home/lkuper/.julia/v0.4/ParallelAccelerator/src/ParallelAccelerator.jl")
ParallelAccelerator.set_debug_level(3)
tmp_f(A,B) = begin runStencil((a::Array{Float64,1}, b::Array{Float64,1}) -> a[0,0] = b[0,0], A, B, 1, :oob_skip); A.*B.+2 end
ParallelAccelerator.accelerate(tmp_f, (Array{Float64,1}, Array{Float64,1},))
```

The error message is:
```
ERROR: LoadError: AssertionError: CGen: variable GenSym(2) cannot have Any (unresolved) type
 in from_lambda at /home/lkuper/.julia/v0.4/ParallelAccelerator/src/cgen.jl:446
 in from_expr at /home/lkuper/.julia/v0.4/ParallelAccelerator/src/cgen.jl:1974
 in from_root_entry at /home/lkuper/.julia/v0.4/ParallelAccelerator/src/cgen.jl:2410
 in toCGen at /home/lkuper/.julia/v0.4/ParallelAccelerator/src/driver.jl:210
 in accelerate at /home/lkuper/.julia/v0.4/ParallelAccelerator/src/driver.jl:409
 in accelerate at /home/lkuper/.julia/v0.4/ParallelAccelerator/src/driver.jl:363
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
while loading /home/lkuper/.julia/v0.4/ParallelAccelerator/example.jl, in expression starting on line 24
```
This gets as far as `toFlatParfors`, the "flattened code" stage. At that stage, `GenSym(2)` is:

```julia
GenSym(2) = $(Expr(:lambda, Any[:(a::Any),:(b::Any)], Any[Any[Any[:a,Any,0],Any[:b,Any,0]],Any[],1,Any[]], :(begin
        (Main.getindex)(b,0,0)
        (Main.setindex!)(a,GenSym(0),0,0)
        return GenSym(0)
    end)))
```

I guess the `Any[:(a::Any),:(b::Any)]` is indicative of the problem.
This is pretty clearly the generated code for the `(a, b) -> a[0,0] = b[0,0]` function that's the first argument to `runStencil`. Adding type annotations on `a` and `b` in the source means that some type assertions also appear in the generated code:
```julia
GenSym(2) = $(Expr(:lambda, Any[:(a::(top(apply_type))(Array,Float64,1)),:(b::(top(apply_type))(Array,Float64,1))], Any[Any[Any[:a,Any,18],Any[:b,Any,18]],Any[],1,Any[]], :(begin
        (top(typeassert))(a,(top(apply_type))(Main.Array,Main.Float64,1))
        (top(typeassert))(b,(top(apply_type))(Main.Array,Main.Float64,1))
        (Main.getindex)(b,0,0)
        (Main.setindex!)(a,GenSym(0),0,0)
        return GenSym(0)
    end)))
```
But doing so results in the same `Any` error as the other version does.

The code works fine with `@acc` instead of `accelerate`. (In the `@acc` version, there's nothing that's the equivalent of `GenSym(2)`. The `@acc` version won't go through ParallelAccelerator unless we actually call the function with arguments, and in that case, the generated code is quite different -- the only occurrence of `lambda` in the generated code is the top-level `$(Expr(:lambda, ...))`.)
By the way, using the `function tmp_f(A,B) ... end` syntax instead of the `tmp_f(A,B) = ...` shorthand doesn't change anything substantial. Neither does using the `do`-block syntax for `runStencil` instead of the `->` lambda syntax. Both variants are sketched below.
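For concreteness, the two variants tried were along these lines (a sketch; both behave the same as the one-line form above):

```julia
# `function` syntax instead of the `=` shorthand:
@acc function tmp_f(A, B)
    runStencil((a, b) -> a[0,0] = b[0,0], A, B, 1, :oob_skip)
    A .* B .+ 2
end

# `do`-block syntax for runStencil instead of the `->` lambda
# (the do-block becomes the first argument to runStencil):
@acc tmp_f2(A, B) = begin
    runStencil(A, B, 1, :oob_skip) do a, b
        a[0,0] = b[0,0]
    end
    A .* B .+ 2
end
```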
There are really two problems here: making `embed` work again (whether it uses `accelerate` or not, I don't care), and figuring out what's going wrong with `accelerate`. I've been focusing on the latter, but I'm about ready to just give up and see if we can make `embed` work using just `@acc`.
Trying again using `@acc`: if the contents of `userimg.jl` are

```julia
Base.reinit_stdio()
include("/home/lkuper/.julia/v0.4/ParallelAccelerator/src/ParallelAccelerator.jl")
using ParallelAccelerator
@acc tmp_f(A,B) = begin runStencil((a, b) -> a[0,0] = b[0,0], A, B, 1, :oob_skip); A.*B.+2 end
tmp_f([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```
then `embed()` seems to run fine, but if I quit and run my recompiled Julia again, I get results like this for black-scholes, for example:

```
iterations = 10000000
SELFPRIMED 16.835934927
checksum: 2.0954821257116848e8
rate = 1.6082537124446222e8 opts/sec
SELFTIMED 0.062179244
```

which show that it's not working (`SELFPRIMED` should be under a second).
Also, if I try to include, say, `black-scholes.jl` in the same REPL session right after the `embed` call, then I get various warnings and errors to do with precompilation.

Further note: the `userimg.jl` is certainly running, because `tmp_f` is defined in a fresh REPL session, and furthermore, it runs with no compilation pause. However, `@acc` isn't defined until we run `using ParallelAccelerator`.
@ehsantn @ninegua Any ideas what to try here?
@JeffBezanson Following up on our discussion yesterday: if I just put `using ParallelAccelerator` in `userimg.jl`, and the lines

```julia
using ParallelAccelerator
@acc tmp_f(A,B) = begin runStencil((a, b) -> a[0,0] = b[0,0], A, B, 1, :oob_skip); A.*B.+2 end
tmp_f([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

in the `ParallelAccelerator.jl` file itself right after the `ParallelAccelerator` module definition, then Julia compile time seems to take about 4 seconds longer when the `userimg.jl` file is present than when it is not. Over a couple of runs:

```
Without: real 2m11.072s  user 2m9.978s   sys 0m2.210s
With:    real 2m15.581s  user 2m14.409s  sys 0m2.346s
Without: real 2m11.336s  user 2m10.254s  sys 0m2.238s
With:    real 2m16.904s  user 2m15.831s  sys 0m2.170s
```
However, this doesn't actually seem to help:

```
julia> using ParallelAccelerator

julia> @acc tmp_f(A,B) = begin runStencil((a, b) -> a[0,0] = b[0,0], A, B, 1, :oob_skip); A.*B.+2 end
tmp_f (generic function with 1 method)

julia> @time tmp_f([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
 17.113510 seconds (25.17 M allocations: 1.146 GB, 1.22% gc time)
4-element Array{Float64,1}:
  3.0
  6.0
 11.0
 18.0

julia> @time tmp_f([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
  0.001106 seconds (60 allocations: 2.422 KB)
4-element Array{Float64,1}:
  3.0
  6.0
 11.0
 18.0
```

and I wouldn't expect it to, since 4 seconds is probably not enough time to compile ParallelAccelerator (we'd expect it to take more like 20 seconds).
If I change the contents of `userimg.jl` to

```julia
include("/home/lkuper/.julia/v0.4/ParallelAccelerator/src/ParallelAccelerator.jl")
```

then that does seem to make a bigger difference to compile time:

```
real 2m19.778s  user 2m18.697s  sys 0m2.218s
```

and so I was hopeful, but nope, calling `tmp_f` is still slow on the first run.
What is interesting is that if I build a Julia that doesn't have the package pre-included, then running `using ParallelAccelerator` at the REPL is slow, as we'd expect because of the `tmp_f` stuff now in the actual file. If I have a Julia that does pre-include the package, then `using ParallelAccelerator` is instantaneous, but actually calling an accelerated function is slow the first time. If those are my two options, then I guess I want the former, because I'd rather have `using ParallelAccelerator` take 20 seconds than have the first call to an accelerated function be slow and have users suspect that ParallelAccelerator is making their code slower. So my plan for now is to just leave things as they are and stop encouraging people to use the `embed` functionality.
The order in which things are being computed is relevant here. Is it the fact that `@acc` is a macro that causes this not to work?
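A toy illustration of the macro-timing question (the `@traced` macro here is invented purely for illustration): the macro itself expands while `userimg.jl` is being compiled into the sysimage, but any work that its expansion defers to the first call still happens at runtime, after the build.

```julia
# Hypothetical macro, just to show when each phase runs.
macro traced(fdef)
    println("expanded at load/sysimage-build time")  # runs during macro expansion
    esc(fdef)                                        # return the definition unchanged
end

@traced f(x) = x + 1   # prints while this file is being compiled
f(1)                   # any call-time compilation still happens here
```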
https://github.com/IntelLabs/ParallelAccelerator.jl/commit/3c1f8a862f90099f56241e50925c01bfaf1ce56d moves the code that defines and runs `tmp_f` into the `ParallelAccelerator.jl` package file itself. This means that the delay is actually at package load time (at the time that `using ParallelAccelerator` is run). Using `embed` (which now just inserts `using ParallelAccelerator` into `userimg.jl`, as sketched below) will make `using ParallelAccelerator` instantaneous, but just puts off the long delay until the first time the function is called. That, to me, is actually inferior to just having the delay be at package load time. So I updated the docs to reflect that we don't recommend that most users use `embed`. I think we can close this issue now.
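For reference, the `userimg.jl` that the updated `embed()` writes should now be essentially just:

```julia
using ParallelAccelerator
```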
The previous approach caused problems in distributed mode, so #69 tries going back to the original `accelerate`-based approach with some tweaks. Going to close this for now, since #69 seems to fix it, modulo the issue discussed in that PR -- which hopefully won't be a problem for most people.
The code inserted into `userimg.jl` by `embed()` seems to have stopped working recently. It still uses `accelerate` -- is there any reason why it needs to? Could we change it to something that uses `@acc`? I no longer trust `accelerate`...