Quuxplusone / LLVMBugzillaTest


Missed optimization with "static const" local variable initialized with function #19289

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR19290
Status NEW
Importance P normal
Reported by jonathan.sauer@gmx.de
Reported on 2014-03-31 06:49:35 -0700
Last modified on 2014-05-07 17:10:27 -0700
Version trunk
Hardware PC All
CC hfinkel@anl.gov, llvm-bugs@lists.llvm.org, nlewycky@google.com, richard-llvm@metafoo.co.uk, rnk@google.com
Fixed by commit(s)
Attachments clang.s (1515 bytes, application/octet-stream)
Blocks
Blocked by
See also
Created attachment 12311
Complete bitcode output by clang for first program fragment

Take the following program fragment:

static int foo()
{
  return 23;
}

int bar()
{
  static const int FOO = foo();

  return FOO;
}

When compiled with clang r205174 at -O3 and without thread-safe statics:

% ~/LLVM/build/Release+Asserts/bin/clang++ -S -emit-llvm -O3 -fno-threadsafe-statics clang.cpp

It is compiled to this (full output attached):

define i32 @_Z3barv() #0 {
entry:
  %.b = load i1* @_ZGVZ3barvE3FOO, align 1
  br i1 %.b, label %init.end, label %init.check

init.check:                                       ; preds = %entry
  store i32 23, i32* @_ZZ3barvE3FOO, align 4, !tbaa !1
  %0 = tail call {}* @llvm.invariant.start(i64 4, i8* bitcast (i32* @_ZZ3barvE3FOO to i8*))
  store i1 true, i1* @_ZGVZ3barvE3FOO, align 1
  br label %init.end

init.end:                                         ; preds = %entry, %init.check
  %1 = load i32* @_ZZ3barvE3FOO, align 4, !tbaa !1
  ret i32 %1
}

As can be seen, although "FOO" is initialized with the constant value 23,
the initialization happens neither at compile time nor at program startup.
Instead, a guard flag ("_ZGVZ3barvE3FOO") is tested on every call to "bar"
to determine whether "FOO" has already been initialized.

(With thread-safe statics there is additional code to make sure the flag is
checked and set in a thread-safe manner.)
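
For readers less used to IR, the guarded initialization above corresponds roughly to the following hand-written C++. This is only a sketch: "guard_FOO" and "storage_FOO" are hypothetical names standing in for "_ZGVZ3barvE3FOO" and "_ZZ3barvE3FOO", and "bar_expanded" is not something clang emits.

```cpp
static int foo()
{
  return 23;
}

// Hypothetical stand-ins for the guard variable and the storage of FOO.
static bool guard_FOO = false;
static int storage_FOO;

int bar_expanded()
{
  // The guard is tested on every call, as in the entry block of the IR.
  if (!guard_FOO) {
    storage_FOO = foo();  // executed on the first call only
    guard_FOO = true;
  }
  return storage_FOO;
}
```

The load of the guard and the branch on it are what remain in the optimized IR even after foo() has been folded to 23.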

When marking "foo" as "constexpr", the generated code is as expected:

define i32 @_Z3barv() #0 {
entry:
  ret i32 23
}
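
For reference, the "constexpr" variant presumably looks like the sketch below (C++11 or later). The static_assert is an addition for illustration, showing that foo() is usable in a constant expression, which makes "FOO" eligible for constant initialization.

```cpp
static constexpr int foo()
{
  return 23;
}

// Illustration only: foo() can be evaluated at compile time.
static_assert(foo() == 23, "foo() must be a constant expression");

int bar()
{
  // The initializer is a constant expression, so no run-time
  // initialization is required for FOO.
  static const int FOO = foo();

  return FOO;
}
```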

However, marking the function "constexpr" is not always possible, e.g. when
the function comes from an external header such as xmmintrin.h:

#include <xmmintrin.h>

float baz()
{
  static const __m128 one = _mm_set_ss(1.0f);

  return _mm_cvtss_f32(one);
}

Results in:

define float @_Z3bazv() #0 {
entry:
  %.b = load i1* @_ZGVZ3bazvE3one, align 1
  br i1 %.b, label %init.end, label %init.check

init.check:                                       ; preds = %entry
  store <4 x float> <float 1.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>, <4 x float>* @_ZZ3bazvE3one, align 16, !tbaa !1
  %0 = tail call {}* @llvm.invariant.start(i64 16, i8* bitcast (<4 x float>* @_ZZ3bazvE3one to i8*))
  store i1 true, i1* @_ZGVZ3bazvE3one, align 1
  br label %init.end

init.end:                                         ; preds = %entry, %init.check
  %1 = load <4 x float>* @_ZZ3bazvE3one, align 16, !tbaa !1
  %vecext.i = extractelement <4 x float> %1, i32 0
  ret float %vecext.i
}

However, changing the code to the equivalent (essentially inlining _mm_set_ss):

float baz()
{
  static const __m128 one = { 1.0f, 0.0f, 0.0f, 0.0f };

  return _mm_cvtss_f32(one);
}

Results in the expected:

define float @_Z3bazv() #0 {
entry:
  ret float 1.000000e+00
}
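
The claim that the two initializers are equivalent can be checked at run time with a small sketch like the one below. It assumes an x86 or x86-64 target with SSE and a compiler that accepts brace initialization of __m128 (clang and GCC do); "same_vector" is a hypothetical helper, not part of the original report.

```cpp
#include <xmmintrin.h>  // SSE intrinsics, x86/x86-64 only

// Returns true if _mm_set_ss(1.0f) and the brace initializer
// produce the same four lanes.
bool same_vector()
{
  const __m128 from_intrinsic = _mm_set_ss(1.0f);
  const __m128 from_braces = { 1.0f, 0.0f, 0.0f, 0.0f };

  float a[4];
  float b[4];
  _mm_storeu_ps(a, from_intrinsic);
  _mm_storeu_ps(b, from_braces);

  for (int i = 0; i < 4; ++i) {
    if (a[i] != b[i]) {
      return false;
    }
  }
  return true;
}
```

Both forms yield the same value, so the difference in the generated IR comes only from how the initializer is spelled in the source.
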
Quuxplusone commented 10 years ago

Attached clang.s (1515 bytes, application/octet-stream): Complete bitcode output by clang for first program fragment

Quuxplusone commented 10 years ago

globalopt is good at this sort of thing, but I think the dependence on the guard variable is too complex for it at the moment.