IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
BSD 2-Clause "Simplified" License
294 stars 32 forks source link

kmeans fusion bug #100

Closed ehsantn closed 8 years ago

ehsantn commented 8 years ago

For kmeans example, the compiler used to fuse first and second loops (dist and labels calculations) in the last tag release. Now they don't fuse. Any idea why?

ehsantn commented 8 years ago

Range expressions like N-1+1 are not simplified. Maybe this is the cause.

ehsantn commented 8 years ago

Another issue: array dist is accessed by getindex in this parfor instead of unsafe_arrayref. Any idea why? @ninegua @DrTodd13

HPAT always assumes arrays of a parfor are accessed by unsafe_arrayref and getindex is for serial code.

$(Expr(:parfor, PIR Body: GenSym(56) = (ParallelAccelerator.API.getindex)(dist::Array{Array{Float64,1},1},parfor_index_1_50::Int64)::Array{Float64,1} GenSym(57) = (ParallelAccelerator.API.indmin)(GenSym(56))::Int64 parallel_ir_array_temp_GenSym(97)_53_2 = GenSym(57) (Base.unsafe_arrayset)(GenSym(5),parallel_ir_array_temp_GenSym(97)_53_2::Int64,parfor_index_1_50::Int64)::Array{Int64,1} Loop Nests: ParallelAccelerator.ParallelIR.PIRLoopNest(:(parfor_index_1_50::Int64),1,:(parallel_ir_save_array_len_1_50::Int64),1) Poststatements: 0 ))

ehsantn commented 8 years ago

copy propagation and arraysize replacement are also broken :(

ninegua commented 8 years ago

The Expr simplifier has been fixed to handle :box correctly, which should fix the fusion issue.

The residual getindex comes from the definition of indmin. In this API.Lib module, getindex is not being overloaded, so I don't really know why it is not turned into arrayref by Julia's own compiler. But in any case, it is not supposed to be unsafe_arrayref because bounds checking is necessary to ensure correct error report.

Let me know if you have more questions.

ninegua commented 8 years ago

I take another look, actually the getindex comes from k-means.jl source in this expression: indmin(dist[i]). The @noinline trick we did for operator capture prevents it from turning into arrayref. But it has always been handled like this.

ehsantn commented 8 years ago

What do you mean "it has always been handled like this"? The tagged version of ParallelAccelerator generates unsafe_arrayref. Otherwise, HPAT would never work for this example.

I thought parfors require unsafe_arrayref; otherwise, parallelism isn't safe.

ehsantn commented 8 years ago

I just checked again; still not fused.

ehsantn commented 8 years ago

Expr simplification seems to work now though. The fusion problem is somewhere else.

ninegua commented 8 years ago

The problem seems to be that the incoming AST is different than before (e.g., x = ...;y = x; instead of just y = ...), which prevents maxFusion from pushing two domain nodes to adjacent position, which in turn prevents fusion. I've fixed it in the latest commit.

After fusion, the getindex is simply eliminated, which was also why you didn't have it before. But the use of getindex of arrayref is perfectly legal in parfor nodes. This is because the source code was written using explicit indexing, and by default we should still bounds check when we can not infer the indexing is safe. In this case, dist is indexed by i within a comprehension, and if we do not know i is within the bounds of dist, we cannot replace it with unsafe_arrayref.

After all, there is no pass that actively replace getindex or arrayref in the incoming AST with unsafe_arrayref. We only produce the no-bounds-checking unsafe_arrayref internally when we are sure things are in bound.

Due to the way fusion works, the final fused result replaces this getindex with an intermediate value from previous parfor loop. It should be seen as a happy accident :P

ehsantn commented 8 years ago

Thanks. The fusion problem is now resolved. However, the HPAT version crashes because there is an uninitialized variable GenSym(70) used in the AST. This is probably a copy propagation issue. I'm taking a look now.

ehsantn commented 8 years ago

Problem doesn't seem to be copy propagation. It has something to do with inner lambdas and replacement of variables in them probably.

ehsantn commented 8 years ago

The problem is that GenSym(22) is imported to inner lambda as a symbol correlation, but it is not replaced properly. I commented out setEscCorrelations until the root problem is fixed.

ninegua commented 8 years ago

Julia 0.5 adds more trouble. Julia 0.4 translates comprehension using range such as 1:N to operations that computes the length on the range object. But Julia 0.5 desugars it further, and eventually produces select_value(1 <= N, N, 0). So we can no longer infer the length is equivalent to N (which only holds true for non-negative N). Probably needs some ad-hoc solution just to get around this problem.

ninegua commented 8 years ago

The fusion issue with 0.5 is now fixed, see https://travis-ci.org/IntelLabs/ParallelAccelerator.jl/jobs/146482407

ehsantn commented 8 years ago

Sounds good. HPAT generated code needs to makes sure isNonNeg works for the generated variables but this is probably very easy.