code with celero::DoNotOptimizeAway can be partly optimized away

xuyuan commented 10 years ago

First of all, thanks for the great project.

celero::DoNotOptimizeAway only tricks the compiler by calling putchar on the first char of the data. But the compiler (at least GCC 4.7) is smart enough to keep the computation of that first char and optimize the rest away.

Example: Vector3 u, v;

celero::DoNotOptimizeAway(u + v);

The compiler will only calculate u[0] + v[0] and skip u[1] + v[1] and u[2] + v[2].

This can be checked in the generated asm code.
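To make the problem easier to reproduce without Celero, here is a minimal sketch. It assumes the helper has the shape described above (getpid guard, putchar on the first byte only); `Vec3`, `DoNotOptimizeAwayFirstByte`, `benchmark_body` and `self_check` are hypothetical names used for illustration, not Celero's API. It is POSIX-only as written.

```cpp
#include <cassert>
#include <cstdio>
#include <unistd.h>  // getpid (POSIX); on Windows the thread uses _getpid instead

// Hypothetical stand-in for the Vector3 type mentioned above.
struct Vec3 {
    float x, y, z;
};

inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Assumed shape of the helper described in this report: only the first
// byte of the object is ever handed to putchar.
template <class T>
void DoNotOptimizeAwayFirstByte(T&& datum) {
    // Almost never true for a benchmark process, but the compiler cannot
    // prove that, so it keeps whatever the branch body reads.
    if (getpid() == 1) {
        const void* p = &datum;
        std::putchar(*static_cast<const char*>(p));
    }
}

// Compile this function with optimizations and inspect the assembly:
// since only the first byte can be observed, an optimizer is free to
// drop the y and z additions.
void benchmark_body(const Vec3& u, const Vec3& v) {
    DoNotOptimizeAwayFirstByte(u + v);
}

// Source-level sanity check that operator+ itself computes all lanes.
void self_check() {
    const Vec3 r = Vec3{1.0f, 2.0f, 3.0f} + Vec3{4.0f, 5.0f, 6.0f};
    assert(r.x == 5.0f && r.y == 7.0f && r.z == 9.0f);
}
```

Whether the y and z additions survive in `benchmark_body` depends on the compiler and flags, so the assembly needs to be checked for each toolchain.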

I have a dirty fix:

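// Write every byte of datum to stdout so that no part of it can be discarded.
// The const casts let this also accept const objects.
template<class T> void _dump_to_std(T&& datum) {
    const char* p = static_cast<const char*>(static_cast<const void*>(&datum));
    for (size_t i=0; i<sizeof(T)/sizeof(char); ++i) {
        putchar(*p);
        p++;
    }
}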

///
/// \func DoNotOptimizeAway
///
/// \author Andrei Alexandrescu
///
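template<class T> void DoNotOptimizeAway(T&& datum)
{
    // The process ID is not expected to be 1 for a benchmark, but the compiler
    // cannot prove that, so it must keep datum ready for the call below.
    #ifdef WIN32
    if(_getpid() == 1)
    #else
    if(getpid() == 1)
    #endif
    {
        _dump_to_std(datum);
    }
}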

DigitalInBlue commented 10 years ago

Can you demonstrate this bug on a POD or STL type such that it can be duplicated?

DoNotOptimizeAway takes a reference. In this case, it should take a reference to the result of the "u+v" operation and should not at all change the results or the way "u+v" is computed.

xuyuan commented 10 years ago

I think STL is too complicated for the compiler, but for POD the compiler can do a great job. I created a small example:

#include <celero/Celero.h>
#include <eigen3/Eigen/Eigen>

CELERO_MAIN;

Eigen::Vector3f u, v;
struct Vec {
  float x, y, z;
};
Vec a, b;

Vec add(const Vec& a, const Vec& b) {
  Vec c;
  c.x = a.x + b.x;
  c.y = a.y + b.y;
  c.z = a.z + b.z;
  return c;
}

BASELINE(DemoSimple, Baseline, 0, 7100000)
{
  asm("# test eigen begin");
  celero::DoNotOptimizeAway(Eigen::Vector3f(u + v));
  asm("# test eigen end");

  asm("# test POD begin");
  celero::DoNotOptimizeAway(add(a, b));
  asm("# test POD end");
}

The assembly I got from GCC 4.7 is:

# 22 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test eigen begin
# 0 "" 2
#NO_APP
    movss   u(%rip), %xmm0
    addss   v(%rip), %xmm0
    movss   %xmm0, (%rsp)
    call    getpid
    cmpl    $1, %eax
    je  .L68
.L65:
#APP
# 24 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test eigen end
# 0 "" 2
# 26 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test POD begin
# 0 "" 2
#NO_APP
    movss   a(%rip), %xmm0
    addss   b(%rip), %xmm0
    movss   %xmm0, 16(%rsp)
    call    getpid
    cmpl    $1, %eax
    je  .L69
.L66:
#APP
# 28 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test POD end

With the fix applied, the result is as follows, so you can see the difference:

# 22 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test eigen begin
# 0 "" 2
#NO_APP
    movss   u(%rip), %xmm0
    addss   v(%rip), %xmm0
    movss   %xmm0, (%rsp)
    movss   u+4(%rip), %xmm0
    addss   v+4(%rip), %xmm0
    movss   %xmm0, 4(%rsp)
    movss   u+8(%rip), %xmm0
    addss   v+8(%rip), %xmm0
    movss   %xmm0, 8(%rsp)
    call    getpid
    cmpl    $1, %eax
    je  .L65
.L68:
#APP
# 24 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test eigen end
# 0 "" 2
# 26 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test POD begin
# 0 "" 2
#NO_APP
    movss   a+4(%rip), %xmm1
    movss   a+8(%rip), %xmm0
    addss   b+4(%rip), %xmm1
    movss   a(%rip), %xmm2
    addss   b+8(%rip), %xmm0
    addss   b(%rip), %xmm2
    movss   %xmm1, 20(%rsp)
    movss   %xmm0, 24(%rsp)
    movss   %xmm2, 16(%rsp)
    call    getpid
    cmpl    $1, %eax
    je  .L71
.L67:
#APP
# 28 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
    # test POD end

DigitalInBlue commented 10 years ago

Acknowledged. I see there is a problem here. I am checking in a fix. The fix for Visual Studio is not as nice as for gcc & clang, but I believe it addresses this issue. Thanks for the bug report!
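For readers wondering why gcc & clang allow a nicer fix: one widely used technique on those compilers is an empty inline-asm statement that receives the object's address and clobbers memory. The sketch below illustrates that technique only; it is not necessarily the fix that was checked in, and `EscapeValue` and `demo` are hypothetical names. The fallback branch for other compilers (a volatile read of every byte) is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>

struct Vec {
    float x, y, z;
};

Vec add(const Vec& a, const Vec& b) {
    return Vec{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Hypothetical helper, not Celero's API.
template <class T>
inline void EscapeValue(T&& datum) {
#if defined(__GNUC__) || defined(__clang__)
    // The asm body is empty, so no instructions are emitted. Because it takes
    // the address as an input and declares a memory clobber, the compiler has
    // to assume the whole object may be read through that pointer.
    asm volatile("" : : "g"(&datum) : "memory");
#else
    // Assumed portable fallback: read every byte through a volatile pointer.
    const volatile char* p = reinterpret_cast<const volatile char*>(&datum);
    for (std::size_t i = 0; i < sizeof(datum); ++i) {
        char sink = p[i];
        (void)sink;
    }
#endif
}

// Small usage example covering both a temporary and a named object.
void demo() {
    Vec a{1.0f, 2.0f, 3.0f};
    Vec b{4.0f, 5.0f, 6.0f};
    EscapeValue(add(a, b));  // temporary result, as in the report
    Vec c = add(a, b);
    EscapeValue(c);          // named object
    assert(c.x == 5.0f && c.y == 7.0f && c.z == 9.0f);
}
```

Compared with the getpid/putchar approach, this adds no call and no branch to the measured code, which is why it tends to be preferred where inline asm is available.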

DigitalInBlue / Celero

code with celero::DoNotOptimizeAway can be partly optimized away #13