IntelLabs / ParallelAccelerator.jl

The ParallelAccelerator package, part of the High Performance Scripting project at Intel Labs
BSD 2-Clause "Simplified" License
294 stars 32 forks source link

CGen for HPAT broken #80

Closed ehsantn closed 8 years ago

ehsantn commented 8 years ago

CGen changes of commit 08216e87e276a42172d0a35d728a19592925781e break HPAT.

ninegua commented 8 years ago

Can we have some test cases to guard against future breakage?

ehsantn commented 8 years ago

Ideally, we should test both HPAT and Latte. We can set up travis to do that.

ehsantn commented 8 years ago

The problem was that HPAT generates macro-level math operators like (:*) that were commented out. This is probably not very good but helped fast development and debugging of generated code. We should have HPAT generate proper backend code.

I commented the operators back in for now. Does this cause any issue for our applications?

ninegua commented 8 years ago

Operators like * are overloaded a lot in base library, and it is wrong to just translate it literally into * in C. In commit 08216e8, CGen will try to do the correct thing trying to translate the corresponding method of * based on the types of its arguments. If that is not working for you, please file a detail ticket instead of putting wrong code back. Thanks.

ehsantn commented 8 years ago

sure.

ehsantn commented 8 years ago

The operator code is back out. HPAT doesn't depend on the operators anymore. However, CGen is not able to translate even a simple integer multiply with *. This might not be needed for most codes since operators are usually lowered in code_typed().

ninegua commented 8 years ago

Here is how I tested it:

function times(x,y)
  x * y
end

code=code_typed(times,(Int,Int))
opr = code[1].args[3].args[2].args[1].args[3].args[1]
println("Changing ", opr, " :: ", typeof(opr), " to ", "*")
code[1].args[3].args[2].args[1].args[3].args[1] = GlobalRef(Base, :*)
println("Remove box call")
code[1].args[3].args[2].args[1] = code[1].args[3].args[2].args[1].args[3]
println(code)

using ParallelAccelerator
ParallelAccelerator.CGen.set_debug_level(3)
ParallelAccelerator.CGen.from_root_entry(code[1], "times")

I see no error, and the generated program looks correct.

ninegua commented 8 years ago

Now, if we change from Int to Array{Float64,2} and use the following program, it would produce wrong code:

@noinline function times(x,y)
  x * y
end

function f(x)
  times(x,x)
end

code=code_typed(f,(Array{Float64,2},))
code[1].args[3].args[2].args[1].args[1]=GlobalRef(Base, :*)
println(code)

using ParallelAccelerator.CGen
CGen.set_debug_level(3)
fname=CGen.writec(CGen.from_root_entry(code[1], "f"))

The generated C code contains this function, where the last line is wrong:

 j2c_array< double >  _Base_star( j2c_array< double > &  A,  j2c_array< double > &  B)
{
TypeFloat64 TS;
TupleInt64Int64 GenSym0;
 j2c_array< double >  GenSym1;
TS = {};
GenSym0 = {A.ARRAYSIZE(1), B.ARRAYSIZE(2)};
GenSym1 = j2c_array<double>::new_j2c_array_2d(NULL, GenSym0.f0, GenSym0.f1);
return GenSym1; cgen_cblas_dgemm(false, false, A.ARRAYSIZE(1),B.ARRAYSIZE(2),A.ARRAYSIZE(2), A.data, A.ARRAYSIZE(1), B.data, B.ARRAYSIZE(1), GenSym1.data, A.ARRAYSIZE(1));

}

It may appear that the problem is in cgen-patern-match.jl:168, and changing it to produce sensible nested expression would solve the problem. It might work for the above program, but the real issue is deeper.

In particular, one occasion that I ran into was that some function call takes an argument that is a jl_new_array call. So nested translation would put j2c_array<double>::new_j2c_array_2d in its place. But actually compiling this kind of C++ code would result in C++ compiler complaining that it expects a L-value in the argument position because the callee has j2c_array<double> &x in its parameter position.

So this is what I reported in #74. That is, CGen needs to normalize its input AST before it can translate things correctly. So there was indeed a long story behind what might seemed like a relatively few lines change in commit 08216e8. It was only a first step towards fixing a long standing issue (#74).