Closed: xuyuan closed this issue 10 years ago
Can you demonstrate this bug on a POD or STL type such that it can be duplicated?
DoNotOptimizeAway takes a reference. In this case, it should take a reference to the result of the "u+v" operation, and it should not change the result or the way "u+v" is computed at all.
I think STL types are too complicated for the compiler to optimize this way, but for POD types the compiler can do a great job. I created a small example:
#include <celero/Celero.h>
#include <eigen3/Eigen/Eigen>
CELERO_MAIN;
Eigen::Vector3f u, v;
struct Vec {
    float x, y, z;
};
Vec a, b;
Vec add(const Vec& a, const Vec& b) {
    Vec c;
    c.x = a.x + b.x;
    c.y = a.y + b.y;
    c.z = a.z + b.z;
    return c;
}
BASELINE(DemoSimple, Baseline, 0, 7100000)
{
    asm("# test eigen begin");
    celero::DoNotOptimizeAway(Eigen::Vector3f(u + v));
    asm("# test eigen end");
    asm("# test POD begin");
    celero::DoNotOptimizeAway(add(a, b));
    asm("# test POD end");
}
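The assembly I got from GCC 4.7 is below. Note that in both the Eigen and the POD case only the first component is computed and stored: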
# 22 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
# test eigen begin
# 0 "" 2
#NO_APP
movss u(%rip), %xmm0
addss v(%rip), %xmm0
movss %xmm0, (%rsp)
call getpid
cmpl $1, %eax
je .L68
.L65:
#APP
# 24 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
# test eigen end
# 0 "" 2
# 26 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
# test POD begin
# 0 "" 2
#NO_APP
movss a(%rip), %xmm0
addss b(%rip), %xmm0
movss %xmm0, 16(%rsp)
call getpid
cmpl $1, %eax
je .L69
.L66:
#APP
# 28 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
# test POD end
With the bugfix, the result is as follows, so you can see the difference: all three components are now computed and stored.
# 22 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
# test eigen begin
# 0 "" 2
#NO_APP
movss u(%rip), %xmm0
addss v(%rip), %xmm0
movss %xmm0, (%rsp)
movss u+4(%rip), %xmm0
addss v+4(%rip), %xmm0
movss %xmm0, 4(%rsp)
movss u+8(%rip), %xmm0
addss v+8(%rip), %xmm0
movss %xmm0, 8(%rsp)
call getpid
cmpl $1, %eax
je .L65
.L68:
#APP
# 24 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
# test eigen end
# 0 "" 2
# 26 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
# test POD begin
# 0 "" 2
#NO_APP
movss a+4(%rip), %xmm1
movss a+8(%rip), %xmm0
addss b+4(%rip), %xmm1
movss a(%rip), %xmm2
addss b+8(%rip), %xmm0
addss b(%rip), %xmm2
movss %xmm1, 20(%rsp)
movss %xmm0, 24(%rsp)
movss %xmm2, 16(%rsp)
call getpid
cmpl $1, %eax
je .L71
.L67:
#APP
# 28 "/home/xu/projects/Celero/examples/bug_report.cpp" 1
# test POD end
Acknowledged. I see there is a problem here. I am checking in a fix. The fix for Visual Studio is not as nice as for gcc & clang, but I believe it addresses this issue. Thanks for the bug report!
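For readers wondering what a GCC/Clang-side fix can look like: one widely used technique is an empty inline-assembly statement that receives the object's address and declares a "memory" clobber, so the compiler has to assume the whole object may be read. The sketch below is an illustration under that assumption, not the code that was checked in. `keepWholeObject`, `Vec3`, `addVec3` and `demoKeepWhole` are hypothetical names, and the non-GCC fallback (a volatile pointer sink) is only a placeholder, not the Visual Studio fix mentioned above.

```cpp
#include <cassert>
#include <memory>  // std::addressof

// Hypothetical POD type standing in for the Vec struct from the report.
struct Vec3 {
    float x, y, z;
};

inline Vec3 addVec3(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Sketch only: hands the object's address to an empty asm statement.
// The "memory" clobber tells GCC/Clang that the asm may read any memory,
// so every byte of `value` has to be materialized before this point.
template <class T>
inline void keepWholeObject(const T& value) {
#if defined(__GNUC__) || defined(__clang__)
    asm volatile("" : : "g"(static_cast<const void*>(std::addressof(value))) : "memory");
#else
    // Placeholder fallback (assumption): publish the address through a
    // volatile pointer. This is weaker than the asm barrier above.
    static const void* volatile sink;
    sink = std::addressof(value);
#endif
}

// Usage: the temporary returned by addVec3 is kept alive in full.
inline Vec3 demoKeepWhole() {
    Vec3 r = addVec3(Vec3{1.0f, 2.0f, 3.0f}, Vec3{4.0f, 5.0f, 6.0f});
    keepWholeObject(r);
    assert(r.x == 5.0f);  // the barrier does not modify the value
    return r;
}
```

The barrier emits no instructions of its own; it only constrains what the optimizer may discard, which is why it tends to be less intrusive than a conditional call to an I/O function.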
First of all, thanks for the great project.
celero::DoNotOptimizeAway only tricks the compiler by calling putchar on the first char of the data. But the compiler (at least GCC 4.7) is smart enough to keep the calculation of that first char and optimize the other parts away.
Example:
Vector3 u, v;
celero::DoNotOptimizeAway(u + v);
The compiler will only calculate u[0] + v[0] and skip u[1] + v[1] and u[2] + v[2].
This can be checked in the generated asm code.
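To make the failure mode concrete, here is a minimal sketch of a "touch the first byte" guard of the kind described above. It is a reconstruction for illustration only, assuming a POSIX system and a `getpid() == 1` test (which would match the `call getpid` / `cmpl $1, %eax` sequence in the assembly); it is not copied from Celero's source. `touchFirstByteOnly`, `Pod3`, `sumPod3` and `benchBody` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <unistd.h>  // getpid (POSIX)

// Hypothetical POD type standing in for Vector3.
struct Pod3 {
    float x, y, z;
};

inline Pod3 sumPod3(const Pod3& a, const Pod3& b) {
    return Pod3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Sketch of the guard: the branch is practically never taken for an
// ordinary process, but the compiler cannot prove that, so it must keep
// whatever the branch would read. The branch reads only the FIRST byte.
template <class T>
void touchFirstByteOnly(const T& value) {
    if (getpid() == 1) {
        const void* p = &value;
        std::putchar(*static_cast<const char*>(p));
    }
}

Pod3 g_a{1.0f, 2.0f, 3.0f};
Pod3 g_b{4.0f, 5.0f, 6.0f};

// Only bytes of the temporary's `x` member are observable here, so an
// optimizer is free to drop the y and z additions entirely.
inline void benchBody() {
    touchFirstByteOnly(sumPod3(g_a, g_b));
}
```

Because nothing outside the guard observes `y` and `z`, dead-code elimination may remove their computation, which is what the single `addss` in the unfixed assembly suggests.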
I have a dirty fix: