dataflow_subfunc_ocl example and dataflow pragma

I was checking this example, and after a HW_EMU build I noticed that the initiation interval (II) for the whole kernel is 137 clocks. That is far more than the interval of the same kernel WITHOUT the dataflow pragma.

Link to the kernel file

If the purpose of this example is to show how the dataflow pragma can decrease the II, it might be useful to add the dataflow pragma to the top function of the kernel itself:

__attribute__ ((reqd_work_group_size(1, 1, 1)))
__attribute__ ((xcl_dataflow))
void adder(__global int *in, __global int *out, int inc, int size)
{
    run_subfunc(in, out, inc, size);
}

With the dataflow pragma on the top function of the kernel, I get: Latency=137, Interval=2

Without the dataflow pragma on the top function of the kernel (master branch, no edits to the kernel), I get: Latency=138, Interval=138

Without the dataflow pragma on any of the top or sub functions, I get: Latency=6, Interval=6
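To make the three results easier to compare, here is a rough sketch in plain C (not kernel code) that applies the usual pipeline estimate, total cycles = latency + (n - 1) * interval, to the numbers reported above. The function name `total_cycles` is my own and is not part of the example, and the estimate assumes the kernel can actually be started back to back at the reported interval, which may not hold for host-side enqueues.

```c
#include <stdio.h>

/* Estimated cycles for n back-to-back invocations of a kernel, given the
 * latency and initiation interval (II) from the HW_EMU report.
 * Assumption: a new invocation can start every `interval` cycles. */
unsigned long total_cycles(unsigned long latency,
                           unsigned long interval,
                           unsigned long n)
{
    if (n == 0) {
        return 0; /* nothing to run */
    }
    /* The first invocation takes the full latency; every further one
     * adds only one initiation interval. */
    return latency + (n - 1) * interval;
}

/* Print the estimate for one configuration (hypothetical helper). */
void print_estimate(const char *label, unsigned long latency,
                    unsigned long interval, unsigned long n)
{
    printf("%s: %lu cycles for %lu invocations\n",
           label, total_cycles(latency, interval, n), n);
}
```

For example, `total_cycles(137, 2, 100)` gives 335 for the version with dataflow on the top function, `total_cycles(138, 138, 100)` gives 13800 for the master branch, and `total_cycles(6, 6, 100)` gives 600 with no dataflow at all. Under this estimate the top-level dataflow version only pulls ahead of the no-dataflow version when many invocations overlap; for a single invocation the 6-cycle version is still faster.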

Xilinx / SDAccel_Examples
