Closed ehsantn closed 8 years ago
Range expressions like N-1+1
are not simplified. Maybe this is the cause.
Another issue: array dist
is accessed by getindex
in this parfor instead of unsafe_arrayref
. Any idea why? @ninegua @DrTodd13
HPAT always assumes arrays of a parfor are accessed by unsafe_arrayref
and getindex
is for serial code.
$(Expr(:parfor, PIR Body: GenSym(56) = (ParallelAccelerator.API.getindex)(dist::Array{Array{Float64,1},1},parfor_index_1_50::Int64)::Array{Float64,1} GenSym(57) = (ParallelAccelerator.API.indmin)(GenSym(56))::Int64 parallel_ir_array_temp_GenSym(97)_53_2 = GenSym(57) (Base.unsafe_arrayset)(GenSym(5),parallel_ir_array_temp_GenSym(97)_53_2::Int64,parfor_index_1_50::Int64)::Array{Int64,1} Loop Nests: ParallelAccelerator.ParallelIR.PIRLoopNest(:(parfor_index_1_50::Int64),1,:(parallel_ir_save_array_len_1_50::Int64),1) Poststatements: 0 ))
copy propagation and arraysize replacement are also broken :(
The Expr simplifier has been fixed to handle :box
correctly, which should fix the fusion issue.
The residual getindex
comes from the definition of indmin
. In this API.Lib
module, getindex
is not being overloaded, so I don't really know why it is not turned into arrayref
by Julia's own compiler. But in any case, it is not supposed to be unsafe_arrayref
because bounds checking is necessary to ensure correct error report.
Let me know if you have more questions.
I take another look, actually the getindex
comes from k-means.jl
source in this expression: indmin(dist[i])
. The @noinline
trick we did for operator capture prevents it from turning into arrayref
. But it has always been handled like this.
What do you mean "it has always been handled like this"? The tagged version of ParallelAccelerator generates unsafe_arrayref
. Otherwise, HPAT would never work for this example.
I thought parfors require unsafe_arrayref
; otherwise, parallelism isn't safe.
I just checked again; still not fused.
Expr simplification seems to work now though. The fusion problem is somewhere else.
The problem seems to be that the incoming AST is different than before (e.g., x = ...;y = x;
instead of just y = ...
), which prevents maxFusion
from pushing two domain nodes to adjacent position, which in turn prevents fusion. I've fixed it in the latest commit.
After fusion, the getindex
is simply eliminated, which was also why you didn't have it before. But the use of getindex
of arrayref
is perfectly legal in parfor nodes. This is because the source code was written using explicit indexing, and by default we should still bounds check when we can not infer the indexing is safe. In this case, dist
is indexed by i
within a comprehension, and if we do not know i
is within the bounds of dist
, we cannot replace it with unsafe_arrayref
.
After all, there is no pass that actively replace getindex
or arrayref
in the incoming AST with unsafe_arrayref
. We only produce the no-bounds-checking unsafe_arrayref
internally when we are sure things are in bound.
Due to the way fusion works, the final fused result replaces this getindex
with an intermediate value from previous parfor loop. It should be seen as a happy accident :P
Thanks. The fusion problem is now resolved. However, the HPAT version crashes because there is an uninitialized variable GenSym(70)
used in the AST. This is probably a copy propagation issue. I'm taking a look now.
Problem doesn't seem to be copy propagation. It has something to do with inner lambdas and replacement of variables in them probably.
The problem is that GenSym(22)
is imported to inner lambda as a symbol correlation, but it is not replaced properly. I commented out setEscCorrelations
until the root problem is fixed.
Julia 0.5 adds more trouble. Julia 0.4 translates comprehension using range such as 1:N
to operations that computes the length
on the range object. But Julia 0.5 desugars it further, and eventually produces select_value(1 <= N, N, 0)
. So we can no longer infer the length is equivalent to N
(which only holds true for non-negative N
). Probably needs some ad-hoc solution just to get around this problem.
The fusion issue with 0.5 is now fixed, see https://travis-ci.org/IntelLabs/ParallelAccelerator.jl/jobs/146482407
Sounds good. HPAT generated code needs to makes sure isNonNeg
works for the generated variables but this is probably very easy.
For
kmeans
example, the compiler used to fuse first and second loops (dist
andlabels
calculations) in the last tag release. Now they don't fuse. Any idea why?