c42f / tinyformat

Minimal, type safe printf replacement library for C++
531 stars 75 forks source link

Segmentation Fault with Intel Compiler #58

Closed dakep closed 4 years ago

dakep commented 4 years ago

I came across a very strange problem in tinyformat v2.2.0. Even the most trivial of examples

#include "tinyformat.h"

int main(int argc, char** argv) {
  auto str = tfm::format("Got %d arguments", argc);
  return str.size();
}

causes a segfault when compiling with icpc -std=gnu++11 -O0. There is no segfault when compiling with any optimization level greater than 0. The backtrace is

#0  0x00007fffffff9808 in ?? ()
#1  0x000000000040216f in tinyformat::detail::FormatArg::format (this=0x7fffffff9768, out=..., fmtBegin=0x404738 "%d arguments", fmtEnd=0x40473a " arguments", ntrunc=-1) at tinyformat.h:513
#2  0x0000000000403994 in tinyformat::detail::formatImpl (out=..., fmt=0x404738 "%d arguments", formatters=0x7fffffff9768, numFormatters=1) at tinyformat.h:812
#3  0x0000000000403ec6 in tinyformat::vformat (out=..., fmt=0x404734 "Got %d arguments", list=...) at tinyformat.h:956
#4  0x0000000000403fb5 in tinyformat::format<int> (out=..., fmt=0x404734 "Got %d arguments", args=@0x7fffffff99a8: 1) at tinyformat.h:966
#5  0x0000000000403f49 in tinyformat::format<int> (fmt=0x404734 "Got %d arguments", args=@0x7fffffff99a8: 1) at tinyformat.h:975
#6  0x00000000004018e2 in main (argc=1, argv=0x7fffffff9ac8) at tfm_bug.cc:4

I tried with two different versions of icpc

c42f commented 4 years ago

This is weird. Can you compile with debug info (-g I presume on icc), run with gdb and show the results of info locals and info args? I'm not sure what to make of that instruction pointer at 0x00007fffffff9808 either. That sure does seem more like a data address when compared to the value of this etc...

dakep commented 4 years ago

Yes, the address looks more like a data address. Here's what I get from gdb:

Breakpoint 1, tinyformat::detail::FormatArg::format (this=0x7fffffffa658, out=..., fmtBegin=0x404ae8 "%d arguments", fmtEnd=0x404aea " arguments", ntrunc=-1) at tinyformat.h:517
517             TINYFORMAT_ASSERT(m_value);
(gdb) info locals
No locals.
(gdb) info args
this = 0x7fffffffa658
out = @0x7fffffffa6f8: <incomplete type>
fmtBegin = 0x404ae8 "%d arguments"
fmtEnd = 0x404aea " arguments"
ntrunc = -1
(gdb) p *this
$4 = {m_value = 0x404222 <tinyformat::vformat(std::ostream&, char const*, tinyformat::FormatList const&)+62>, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}
(gdb) p tinyformat::detail::FormatArg::formatImpl<int>
$3 = {void (std::ostream &, const char *, const char *, int, const void *)} 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>

The correct formatImpl function is compiled into the program, but for whatever reason, the address is not properly stored in m_formatImpl. Another weird thing is that the address stored in m_value seems to point to a function?!

When I step through the program, the addresses seem to be correct inside the constructors tinyformat::detail::FormatArg::FormatArg and tinyformat::detail::FormatListN<1>::FormatListN:

(gdb) bt
#0  tinyformat::detail::FormatArg::FormatArg<int> (this=0x7fffffffa658, value=@0x7fffffffa898: 1) at tinyformat.h:512
#1  0x00000000004022fe in tinyformat::detail::FormatListN<1>::FormatListN<int> (this=0x7fffffffa648, args=@0x7fffffffa898: 1) at tinyformat.h:966
#2  0x0000000000404337 in tinyformat::makeFormatList<int> (args=@0x7fffffffa898: 1) at tinyformat.h:1015
#3  0x00000000004042f7 in tinyformat::format<int> (out=..., fmt=0x404ae4 "Got %d arguments", args=@0x7fffffffa898: 1) at tinyformat.h:1051
#4  0x00000000004042a5 in tinyformat::format<int> (fmt=0x404ae4 "Got %d arguments", args=@0x7fffffffa898: 1) at tinyformat.h:1060
#5  0x00000000004018e2 in main (argc=1, argv=0x7fffffffa9b8) at tfmbug.cc:4
(gdb) p *this
$12 = {m_value = 0x7fffffffa898, m_formatImpl = 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>, 
  m_toIntImpl = 0x402298 <tinyformat::detail::FormatArg::toIntImpl<int>(void const*)>}
(gdb) s
tinyformat::detail::FormatListN<1>::FormatListN<int> (this=0x7fffffffa648, args=@0x7fffffffa898: 1) at tinyformat.h:967
967         { static_assert(sizeof...(args) == N, "Number of args must be N"); }
(gdb) p *this
$15 = {<tinyformat::FormatList> = {m_args = 0x7fffffffa658, m_N = 1}, m_formatterStore = {{m_value = 0x7fffffffa898, 
      m_formatImpl = 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>, 
      m_toIntImpl = 0x402298 <tinyformat::detail::FormatArg::toIntImpl<int>(void const*)>}}}

Once I step out of the tinyformat::detail::FormatListN constructor, however, something seems off. Inside tinyformat::vformat I have

(gdb) p *list.m_args
$18 = {m_value = 0x7fffffffa898, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}

and in the next step in tinyformat::detail::formatImpl I get

(gdb) p *args
$17 = {m_value = 0x404222 <tinyformat::vformat(std::ostream&, char const*, tinyformat::FormatList const&)+62>, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}

I don't know why, but something seems to be going wrong when assigning to FormatListRef. When I debug using -O1 optimizations, everything works as expected and the value of args in tinyformat::detail::formatImpl is correct:

(gdb) print *args
$2 = {m_value = 0x7fffffffa8a8, m_formatImpl = 0x403010 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>, 
  m_toIntImpl = 0x4035e4 <tinyformat::detail::FormatArg::toIntImpl<int>(void const*)>}
c42f commented 4 years ago

Wow, great sleuthing. That's exactly the information we need (I can't track this down as I don't have an icc compiler license, though I applied for one.)

To me this is looking rather like a bizarre compiler bug. The other option is that I'm committing some obscure form of UB somewhere by creating my own quasi-vtable-like thing with the m_formatImpl pointer.

At this point I think the best option is to progressively minimize tinyformat.h by deleting functionality until you've got a minimal reproduction. Then we might discover exactly what triggers this.

Yes, the address looks more like a data address. Here's what I get from gdb:

...
(gdb) p *this
$4 = {m_value = 0x404222 <tinyformat::vformat(std::ostream&, char const*, tinyformat::FormatList const&)+62>, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}
(gdb) p tinyformat::detail::FormatArg::formatImpl<int>
$3 = {void (std::ostream &, const char *, const char *, int, const void *)} 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>

It looks like the pointer m_value contains the pointer to m_formatImpl here??

Another option for debugging here might be to just printf the values of these pointers from inside Format::format(). (Are we sure that gdb is not lying to us here :-) ?)

dakep commented 4 years ago

I created a small example (I don't know if it's actually minimal but it should be close) to recreate the problem:

class FormatArg {
 public:
  FormatArg(const int& value)
    : m_value(static_cast<const void*>(&value)), m_formatImpl(&formatImpl) {}

  void format() const {
      m_formatImpl(m_value);
  }

 private:
  static void formatImpl(const void* value) {
    const int true_val = *static_cast<const int*>(value);
  }

  const void* m_value;
  void (*m_formatImpl)(const void* value);
};

class FormatList {
 public:
  FormatList(FormatArg* args) : m_args(args) {}
  friend int main(int argc, char** argv);

 private:
  const FormatArg* m_args;
  int m_N;
};

class FormatList1 : public FormatList {
 public:
  FormatList1(const int& arg) : FormatList(&m_formatterStore[0]),
      m_formatterStore{ FormatArg(arg) } {}

 private:
  FormatArg m_formatterStore[1];
};

FormatList1 makeFormatList(const int& arg) {
  return FormatList1(arg);
}

int main(int argc, char** argv) {
  // This gives a segfault when using icpc
  makeFormatList(argc).m_args[0].format();
  // This would be fine...
  // FormatList1(argc).m_args[0].format();
  return 0;
}

The segfault is introduced in the makeFormatList function. I assume that return value optimization "fixes" the problem for icpc, that's why there's no segfault for -O1 and up.

I can't find any UB in this example, but that doesn't mean anything. However, a further indication of a compiler problem is that when you omit FormatList::m_N from the class definition, there's no segfault.

c42f commented 4 years ago

I can't see any problem with this code either. I think this is minimal enough that you could report it upstream to icc and see what happens.

when you omit FormatList::m_N from the class definition, there's no segfault

If you can find a workaround you can put behind a small #ifdef __INTEL_COMPILER (maybe injecting some dummy arguments into the class or some such) I'd be happy to merge that. If not, I'm not sure what else we can do here.

dakep commented 4 years ago

I posted the issue on the Intel Developer Forum and someone spotted the problem. The function makeFormatList creates a temporary object on the stack. The returned copy still holds a pointer to the memory allocated for the temporary object, which is invalid when the temporary goes out of scope.

To fix the issue, detail::FormatListN simply needs a copy constructor. I created PR #59.

c42f commented 4 years ago

Great! A big thanks to whoever pointed that out, and thanks for communicating it back here.

I tried clang's -fsanitize=undefined, but unfortunately it can't see the error because the constructor is elided even with -O0. It's necessary to use the -fno-elide-constructors option which then reproduces this issue with both g++ and clang++. -fsanitize=undefined reports the problem as a stack overflow though which it isn't. According to the g++ docs, C++17 compilers are now actually required to elide the copy constructor so icc may also change the default in the future.

c42f commented 4 years ago

Actually there's another way we could have debugged this which I should have thought of - it's possible to get valgrind to report the read of uninitialized memory. (Though for g++ only if -fno-elide-constructors is passed explicitly, which require knowing what the error was.)

g++ -Wall -O0 -g -fno-elide-constructors -fno-omit-frame-pointer tinyformat_test.cpp -o a.out
valgrind --tool=memcheck --track-origins=yes ./a.out
dakep commented 4 years ago

Thanks for merging the fix. At least now I've learned that gcc & clang elide constructors even if setting -O0! Probably something I'll check for more carefully in the future.

c42f commented 4 years ago

Yes, that was surprising to me. It seems with C++17 that the language has moved to this being considered a feature of the language rather than an optimization.