Logistic regression error

IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs

BSD 2-Clause "Simplified" License

Logistic regression error #41

Closed (ehsantn closed this issue 8 years ago)

ehsantn commented 8 years ago

This logistic regression program fails. CGen gives the error below, but I suspect there are many more issues we need to fix before it compiles and parallelizes properly.

using ParallelAccelerator
iter = 15

@acc function main(iterations::Int64)
    D = 10  # Number of dimensions
    N = 100
    w::Array{Float64,1} = 2.0.*rand(D)-1.0
    labels = rand(N)
    points = rand(N,D)
    for i in 1:iterations
       w -= squeeze(((1.0./(1.0.+exp(-labels.*(points*w))).-1.0).*labels)'*points,1)
    end
    w
end

W = main(iter)
println(W)

ERROR: LoadError: AssertionError: CGen: Strings are not supported
 in from_lambda at /home/etotoni/.julia/v0.4/ParallelAccelerator/src/cgen.jl:451
 in from_expr at /home/etotoni/.julia/v0.4/ParallelAccelerator/src/cgen.jl:2121
 in from_root at /home/etotoni/.julia/v0.4/ParallelAccelerator/src/cgen.jl:2502
 in from_worklist at /home/etotoni/.julia/v0.4/ParallelAccelerator/src/cgen.jl:2686
 in from_root at /home/etotoni/.julia/v0.4/ParallelAccelerator/src/cgen.jl:2599
 in from_worklist at /home/etotoni/.julia/v0.4/ParallelAccelerator/src/cgen.jl:2686
 in from_root at /home/etotoni/.julia/v0.4/ParallelAccelerator/src/cgen.jl:2599
 in from_root at /home/etotoni/.julia/v0.4/ParallelAccelerator/src/cgen.jl:2469
 in toCGen at /home/etotoni/.julia/v0.4/ParallelAccelerator/src/driver.jl:178
 in processFuncCall at /home/etotoni/.julia/v0.4/CompilerTools/src/OptFramework.jl:338
 in main at /home/etotoni/.julia/v0.4/CompilerTools/src/OptFramework.jl:400
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 in process_options at ./client.jl:280
 in _start at ./client.jl:378

ninegua commented 8 years ago

squeeze is not a supported function in Domain IR at the moment. CGen translation fails because the body of squeeze contains strings and calls throw. squeeze is essentially a reshape operation, and CGen has trouble translating that too.

I think reshape is common enough that CGen should handle it, implemented as a J2C array function.
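
To illustrate the point that squeeze is essentially a reshape, here is a minimal sketch in plain Julia, without @acc. It assumes Julia 1.x, where squeeze(A, 1) from Julia 0.4 is spelled dropdims(A; dims=1); the variable names are only for illustration.

```julia
# Julia 1.x sketch (assumption: no ParallelAccelerator involved).
# squeeze(A, 1) in Julia 0.4 corresponds to dropdims(A; dims=1) here.
A = reshape(collect(1.0:4.0), 1, 4)   # 1 x 4 row, shaped like the result of v' * points

s = dropdims(A; dims=1)               # drop the singleton first dimension
r = reshape(A, size(A, 2))            # the same result written as a plain reshape

println(s == r)
println(size(r))
```

Both forms return a 1-D array over the same elements, which is why handling reshape in CGen would also cover this use of squeeze.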

ninegua commented 8 years ago

The following code now works (I've verified its correctness against the program above):


@acc begin

function multmv(a::Array{Float64,2}, b::Array{Float64,1})
   Float64[ sum(reshape(a[i,:], size(a,2)) .* b) for i in 1:size(a,1) ]
end

function multvm(a::Array{Float64,1}, b::Array{Float64,2})
   Float64[ sum(a .* b[:,i]) for i in 1:size(b,2) ]
end

function main(iterations::Int64)
    D = 10  # Number of dimensions
    N = 100
    w = 2.0.*rand(D)-1.0
    labels = rand(N)
    points = rand(N,D)
    for i in 1:iterations
       w -= multvm((1.0./(1.0.+exp(-labels.*multmv(points,w))).-1.0).*labels,points)
    end
    w
end

end

However, when I add @inline to multmv and multvm, the program fails to compile. This appears to be a Julia inlining issue: even though @inline is added to the two functions, they are not inlined into main. Instead, a number of calls inside multmv and multvm are inlined (e.g., ParallelAccelerator.API.sum), even though we have explicitly marked them with @noinline.
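
As a sanity check on the two helpers, the sketch below compares them with the built-in matrix products in plain Julia. It assumes Julia 1.x and no @acc, so it says nothing about what ParallelAccelerator generates; the test data and the name v are made up for illustration.

```julia
# Julia 1.x sketch (assumption: run without @acc).

# Matrix-vector product as a comprehension over rows, as in the comment above.
function multmv(a::Array{Float64,2}, b::Array{Float64,1})
    Float64[sum(reshape(a[i, :], size(a, 2)) .* b) for i in 1:size(a, 1)]
end

# Vector-matrix product as a comprehension over columns.
function multvm(a::Array{Float64,1}, b::Array{Float64,2})
    Float64[sum(a .* b[:, i]) for i in 1:size(b, 2)]
end

points = rand(100, 10)
w = rand(10)
v = rand(100)

println(multmv(points, w) ≈ points * w)    # rows of points dotted with w
println(multvm(v, points) ≈ points' * v)   # v dotted with columns of points
```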

ehsantn commented 8 years ago

I rewrote the example since the previous implementation had some issues. The code is below. I will try to replace GEMM calls to see what happens to the reductions throughout the pipeline.

using ParallelAccelerator

iter = 15

@acc function main(iterations::Int64)
    D = 3  # Number of dimensions
    N = 10

    labels = reshape(rand(N),1,N)
    points = rand(D,N)
    w = reshape(2.0.*rand(D)-1.0,1,D)

    for i in 1:iterations
       w -= ((1.0./(1.0.+exp(-labels.*(w*points))).-1.0).*labels)*points'
    end
    w
end

W = main(iter)
println(W)
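
The rewritten program stores w and labels as rows and points as D x N, so one update step should match the column-vector form of the first program on transposed data. A minimal sketch of that check, assuming Julia 1.x and no @acc; the function names factor, step_col and step_row are hypothetical.

```julia
# Julia 1.x sketch (assumption: no @acc; function names are hypothetical).

# Shared elementwise factor: (1 / (1 + exp(-labels * z)) - 1) * labels
factor(labels, z) = (1.0 ./ (1.0 .+ exp.(-labels .* z)) .- 1.0) .* labels

# Column layout from the first program: w is D, labels is N, points is N x D.
step_col(w, labels, points) = w .- points' * factor(labels, points * w)

# Row layout from the rewrite: w is 1 x D, labels is 1 x N, points is D x N.
step_row(w, labels, points) = w .- factor(labels, w * points) * points'

D, N = 3, 10
points = rand(N, D)
labels = rand(N)
w = 2.0 .* rand(D) .- 1.0

w_col = step_col(w, labels, points)
w_row = step_row(reshape(w, 1, D), reshape(labels, 1, N), permutedims(points))

println(vec(w_row) ≈ w_col)
```

In the row layout both products are plain matrix-matrix multiplies, which is what makes the GEMM calls visible to the compiler.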

ehsantn commented 8 years ago

There are multiple issues in ParallelAccelerator:

Allocations inside the loop are not hoisted out of it. The extra arrays that get created are not freed either, which blows up memory. Fixing this doesn't seem trivial, since earlier compilation stages have problems too (the equivalence classes of arrays, at least).
The outer matrix-vector multiply is translated to a transpose followed by a gemm, when it should be a single gemm with the transpose passed as an argument.
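
The difference between the two translations can be sketched with Julia's BLAS wrapper. This assumes Julia 1.x and the LinearAlgebra standard library; it shows the two call patterns by hand and is not the code ParallelAccelerator emits. The names g, pt, r_copy and r_flag are illustrative.

```julia
# Julia 1.x sketch (assumption: hand-written BLAS calls, not generated code).
using LinearAlgebra

g = rand(1, 10)        # 1 x N row, like the elementwise factor in the loop
points = rand(3, 10)   # D x N

# transpose + gemm: materialize an N x D copy first, then multiply.
pt = permutedims(points)
r_copy = BLAS.gemm('N', 'N', 1.0, g, pt)

# single gemm: pass the transpose as a flag, no intermediate array.
r_flag = BLAS.gemm('N', 'T', 1.0, g, points)

println(r_copy ≈ r_flag)
```

The second call avoids the N x D temporary, which matters here because the multiply runs on every iteration.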

ehsantn commented 8 years ago

With the recent ParallelIR optimizations I checked in, the allocation hoisting and transpose+gemm issues are resolved. The first gemm could be fused with the array operations but this is not a top priority right now.
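
For reference, this is roughly what the loop looks like with the allocations hoisted by hand. It is a sketch assuming Julia 1.x and the LinearAlgebra standard library, not the output of the ParallelIR optimizations; the function name train! and the buffer names z and grad are hypothetical.

```julia
# Julia 1.x sketch (assumption: manual hoisting, not ParallelIR output).
using LinearAlgebra

function train!(w, labels, points, iterations)
    z = similar(labels)      # 1 x N buffer, allocated once outside the loop
    grad = similar(w)        # 1 x D buffer, allocated once outside the loop
    for _ in 1:iterations
        mul!(z, w, points)                                    # z = w * points
        @. z = (1.0 / (1.0 + exp(-labels * z)) - 1.0) * labels
        mul!(grad, z, points')                                # transpose passed lazily, no copy
        w .-= grad
    end
    return w
end

D, N = 3, 10
labels = rand(1, N)
points = rand(D, N)
w0 = 2.0 .* rand(1, D) .- 1.0

W = train!(copy(w0), labels, points, 15)
println(W)
```

The first mul! and the elementwise update are still separate passes over z, which corresponds to the fusion opportunity mentioned above.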