Closed ds283 closed 11 years ago
This could be a limitation of OpenCL implementation. You could try to use tagged terminals for x
, M_phi
, M_chi
, __Mp
in order to reduce the resulting expression tree size and number of input parameters. This should also improve performance, so I would recommend doing this anyway:
#define phi (vex::tag<0>(x(0)))
#define chi (vex::tag<1>(x(1)))
#define __dphi (vex::tag<2>(x(2)))
#define __dchi (vex::tag<3>(x(3)))
auto t_phi = vex::tag<4>(M_phi);
auto t_chi = vex::tag<5>(M_chi);
auto t_p = vex::tag<6>(__Mp);
auto one = vex::tag<7>(1.0);
auto three = vex::tag<8>(3.0);
auto fifty = vex::tag<9>(50.0);
dxdt(3) = three*( three*sin(one/(t_chi*t_chi*t_chi)*(chi*chi*chi))*((t_p*t_p)*
(t_p*t_p))/(t_chi*t_chi*t_chi)*(chi*chi)+-100.0*(t_chi*t_chi)*chi)*
(t_p*t_p)/( fifty*(t_phi*t_phi)*(phi*phi)+fifty*(__dphi*__dphi)+fifty*
(__dchi*__dchi)+((t_p*t_p)*(t_p*t_p))*cos(one/(t_chi*t_chi*t_chi)*
(chi*chi*chi))+fifty*(t_chi*t_chi)*(chi*chi))+three*__dchi*( one/( 6.0*( fifty*
(t_phi*t_phi)*(phi*phi)+((t_p*t_p)*(t_p*t_p))*cos(one/(t_chi*t_chi*t_chi)*
(chi*chi*chi))+fifty*(t_chi*t_chi)*(chi*chi))*(t_p*t_p)/( fifty*(t_phi*t_phi)*
(phi*phi)+fifty*(__dphi*__dphi)+fifty*(__dchi*__dchi)+((t_p*t_p)*
(t_p*t_p))*cos(one/(t_chi*t_chi*t_chi)*(chi*chi*chi))+fifty*(t_chi*t_chi)*
(chi*chi))+(__dphi*__dphi)+(__dchi*__dchi))*( (__dphi*__dphi)+(__dchi*__dchi))-one);
You can check the effect of this by defining VEXCL_SHOW_KERNELS
macro and comparing the generated sources for the OpenCL kernel in question.
I could not reproduce the issue on my Linux machine for either of the variants. But I noticed that generated kernel name may become quite long:
kernel void plus_divides_multiplies_multiplies_term_plus_multiplies_divides_multiplies_multiplies_term_sin_multiplies_divides_term_multiplies_multiplies_term_term_term_multiplies_multiplies_term_term_term_multiplies_multiplies_term_term_multiplies_term_term_multiplies_multiplies_term_term_term_multiplies_term_term_multiplies_multiplies_term_multiplies_term_term_term_multiplies_term_term_plus_plus_plus_plus_multiplies_multiplies_term_multiplies_term_term_multiplies_term_term_multiplies_term_multiplies_term_term_multiplies_term_multiplies_term_term_multiplies_multiplies_multiplies_term_term_multiplies_term_term_cos_multiplies_divides_term_multiplies_multiplies_term_term_term_multiplies_multiplies_term_term_term_multiplies_multiplies_term_multiplies_term_term_multiplies_term_term_multiplies_multiplies_term_term_minus_multiplies_divides_term_plus_plus_divides_multiplies_multiplies_term_plus_plus_multiplies_multiplies_term_multiplies_term_term_multiplies_term_term_multiplies_multiplies_multiplies_term_term_multiplies_term_term_cos_multiplies_divides_term_multiplies_multiplies_term_term_term_multiplies_multiplies_term_term_term_multiplies_multiplies_term_multiplies_term_term_multiplies_term_term_multiplies_term_term_plus_plus_plus_plus_multiplies_multiplies_term_multiplies_term_term_multiplies_term_term_multiplies_term_multiplies_term_term_multiplies_term_multiplies_term_term_multiplies_multiplies_multiplies_term_term_multiplies_term_term_cos_multiplies_divides_term_multiplies_multiplies_term_term_term_multiplies_multiplies_term_term_term_multiplies_multiplies_term_multiplies_term_term_multiplies_term_term_multiplies_term_term_multiplies_term_term_plus_multiplies_term_term_multiplies_term_term_term_(
ulong n,
global double * prm_1_1,
double prm_1_2,
double prm_1_4,
double prm_1_5,
global double * prm_1_8,
double prm_1_11,
double prm_1_20,
double prm_1_26,
double prm_1_27,
global double * prm_1_29,
global double * prm_1_32,
global double * prm_1_35,
double prm_1_56
)
{
for(size_t idx = get_global_id(0); idx < n; idx += get_global_size(0)) {
prm_1_1[idx] = ( ( ( ( prm_1_2 * ( ( ( ( ( prm_1_2 * sin( ( ( prm_1_4 / ( ( prm_1_5 * prm_1_5 ) * prm_1_5 ) ) * ( ( prm_1_8[idx] * prm_1_8[idx] ) * prm_1_8[idx] ) ) ) ) * ( ( prm_1_11 * prm_1_11 ) * ( prm_1_11 * prm_1_11 ) ) ) / ( ( prm_1_5 * prm_1_5 ) * prm_1_5 ) ) * ( prm_1_8[idx] * prm_1_8[idx] ) ) + ( ( prm_1_20 * ( prm_1_5 * prm_1_5 ) ) * prm_1_8[idx] ) ) ) * ( prm_1_11 * prm_1_11 ) ) / ( ( ( ( ( ( prm_1_26 * ( prm_1_27 * prm_1_27 ) ) * ( prm_1_29[idx] * prm_1_29[idx] ) ) + ( prm_1_26 * ( prm_1_32[idx] * prm_1_32[idx] ) ) ) + ( prm_1_26 * ( prm_1_35[idx] * prm_1_35[idx] ) ) ) + ( ( ( prm_1_11 * prm_1_11 ) * ( prm_1_11 * prm_1_11 ) ) * cos( ( ( prm_1_4 / ( ( prm_1_5 * prm_1_5 ) * prm_1_5 ) ) * ( ( prm_1_8[idx] * prm_1_8[idx] ) * prm_1_8[idx] ) ) ) ) ) + ( ( prm_1_26 * ( prm_1_5 * prm_1_5 ) ) * ( prm_1_8[idx] * prm_1_8[idx] ) ) ) ) + ( ( prm_1_2 * prm_1_35[idx] ) * ( ( ( prm_1_4 / ( ( ( ( ( prm_1_56 * ( ( ( ( prm_1_26 * ( prm_1_27 * prm_1_27 ) ) * ( prm_1_29[idx] * prm_1_29[idx] ) ) + ( ( ( prm_1_11 * prm_1_11 ) * ( prm_1_11 * prm_1_11 ) ) * cos( ( ( prm_1_4 / ( ( prm_1_5 * prm_1_5 ) * prm_1_5 ) ) * ( ( prm_1_8[idx] * prm_1_8[idx] ) * prm_1_8[idx] ) ) ) ) ) + ( ( prm_1_26 * ( prm_1_5 * prm_1_5 ) ) * ( prm_1_8[idx] * prm_1_8[idx] ) ) ) ) * ( prm_1_11 * prm_1_11 ) ) / ( ( ( ( ( ( prm_1_26 * ( prm_1_27 * prm_1_27 ) ) * ( prm_1_29[idx] * prm_1_29[idx] ) ) + ( prm_1_26 * ( prm_1_32[idx] * prm_1_32[idx] ) ) ) + ( prm_1_26 * ( prm_1_35[idx] * prm_1_35[idx] ) ) ) + ( ( ( prm_1_11 * prm_1_11 ) * ( prm_1_11 * prm_1_11 ) ) * cos( ( ( prm_1_4 / ( ( prm_1_5 * prm_1_5 ) * prm_1_5 ) ) * ( ( prm_1_8[idx] * prm_1_8[idx] ) * prm_1_8[idx] ) ) ) ) ) + ( ( prm_1_26 * ( prm_1_5 * prm_1_5 ) ) * ( prm_1_8[idx] * prm_1_8[idx] ) ) ) ) + ( prm_1_32[idx] * prm_1_32[idx] ) ) + ( prm_1_35[idx] * prm_1_35[idx] ) ) ) * ( ( prm_1_32[idx] * prm_1_32[idx] ) + ( prm_1_35[idx] * prm_1_35[idx] ) ) ) - prm_1_4 ) ) );
}
}
I wonder if this could be the reason for the crash with Apple implementation. In case terminal tagging does not help, could you please try 547299a? It assigns same name to all autogenerated kernels.
Thanks for the very helpful responses.
Adding tags did not help, although it did reduce the size of the generated kernels. But switching to a fixed kernel name apparently solves the problem - I now get correct output.
I've pushed the constant names commit to the master branch. I'll do some cleanup later today, but it should be usable right away.
Apologies if this is a dumb question.
I am trying to use VexCL with odeint to solve a set of ODEs for different parameter values. The ODEs are produced by another program which takes an algebraic expression as input, performs some symbolic manipulations, and writes out the ODEs as C++ which is then compiled. Because they are produced automatically, the symbolic expressions specifying the system of ODEs can become lengthy.
The specific example causing trouble is (all one expression)
where
phi
,chi
,__dphi
andd_chi
are macros resolving to the components of a multivectorx
of typevex::multivector<double>
representing the current state, andM_phi
,M_chi
and__Mp
aredouble
quantities held on the host.This compiles fine, but the OpenCL kernel representing the right-hand side fails when executed
I am using Apple LLVM 4.2 on OS X and an up-to-date version of VexCL.
If I change this by breaking the right-hand side in two (but making no other changes), to give
then the resulting code works as expected and produces a correct result.
I presume that there is a limit to the size of expressions which can be handled correctly by the
vex::multivector
expression templates. Is this correct? If so, is there any way to predict in advance what is the longest expression which can be handled?