IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
BSD 2-Clause "Simplified" License

Calculate Pi example #43

Closed ehsantn closed 8 years ago

ehsantn commented 8 years ago

The simple program below for calculating Pi fails. CGen chokes on allocations for some reason.

More importantly, the AST coming from ParallelIR is not optimized properly. It has array allocations but no array should be allocated here. Also, the loops are not fused.

using ParallelAccelerator

@acc function calcPi(n::Int64)
    x = rand(n) .* 2 .- 1
    y = rand(n) .* 2 .- 1
    return 4.0*sum(x.*x .+ y.*y .< 1)/n
end

calcPi(1000)

ehsantn commented 8 years ago

A lot of the issues for this example seem to be related to #44. After making the constants explicit float values, it works.
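For reference, a sketch of what the reproducer might look like with the constants written as explicit floats. Which literals were actually changed is not stated in this thread, so converting every `2` and `1` is an assumption, and the snippet assumes a Julia version that ParallelAccelerator supports.

```julia
using ParallelAccelerator

# Same computation as the original reproducer, but with every numeric
# literal written as a Float64 (assumed reading of "explicit float values").
@acc function calcPi(n::Int64)
    x = rand(n) .* 2.0 .- 1.0
    y = rand(n) .* 2.0 .- 1.0
    return 4.0*sum(x.*x .+ y.*y .< 1.0)/n
end

println(calcPi(1000))
```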

However, the extra allocations are still there.

ninegua commented 8 years ago

The allocations come from rand(n). The allocated array is an input to the in-place mmap and is no longer needed after fusion. @DrTodd13, do you think it is possible to get rid of this allocation?
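To make the goal concrete, here is a hand-written sketch of what a fully fused, allocation-free version of the computation would look like in plain Julia. It is not ParallelIR output; the name `calcPiFused` is hypothetical and is only meant to show that one loop with scalar temporaries is enough.

```julia
# Hand-fused equivalent of calcPi: a single loop, no intermediate arrays.
function calcPiFused(n::Int64)
    hits = 0
    for i in 1:n
        # Draw a point uniformly from the square [-1, 1) x [-1, 1).
        x = rand() * 2.0 - 1.0
        y = rand() * 2.0 - 1.0
        # Count the point if it falls inside the unit circle.
        if x*x + y*y < 1.0
            hits += 1
        end
    end
    return 4.0 * hits / n
end

println(calcPiFused(1_000_000))
```

Each element of `x` and `y` is used exactly once in the original, so the `rand(n)` arrays only exist as inputs to the element-wise operations and can disappear once everything is fused into the reduction.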

ehsantn commented 8 years ago

I think we could add a pass that removes allocations after fusion and allocation hoisting are done in top_level_from_exprs(). Any thoughts?

ehsantn commented 8 years ago

Removing allocations is implemented now.