Closed dakep closed 4 years ago
This is weird. Can you compile with debug info (-g
I presume on icc), run with gdb and show the results of info locals
and info args
? I'm not sure what to make of that instruction pointer at 0x00007fffffff9808 either. That sure does seem more like a data address when compared to the value of this
etc...
Yes, the address looks more like a data address. Here's what I get from gdb:
Breakpoint 1, tinyformat::detail::FormatArg::format (this=0x7fffffffa658, out=..., fmtBegin=0x404ae8 "%d arguments", fmtEnd=0x404aea " arguments", ntrunc=-1) at tinyformat.h:517
517 TINYFORMAT_ASSERT(m_value);
(gdb) info locals
No locals.
(gdb) info args
this = 0x7fffffffa658
out = @0x7fffffffa6f8: <incomplete type>
fmtBegin = 0x404ae8 "%d arguments"
fmtEnd = 0x404aea " arguments"
ntrunc = -1
(gdb) p *this
$4 = {m_value = 0x404222 <tinyformat::vformat(std::ostream&, char const*, tinyformat::FormatList const&)+62>, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}
(gdb) p tinyformat::detail::FormatArg::formatImpl<int>
$3 = {void (std::ostream &, const char *, const char *, int, const void *)} 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>
The correct formatImpl function is compiled into the program, but for whatever reason, the address is not properly stored in m_formatImpl. Another weird thing is that the address stored in m_value seems to point to a function?!
When I step through the program, the addresses seem to be correct inside the constructors tinyformat::detail::FormatArg::FormatArg and tinyformat::detail::FormatListN<1>::FormatListN
(gdb) bt
#0 tinyformat::detail::FormatArg::FormatArg<int> (this=0x7fffffffa658, value=@0x7fffffffa898: 1) at tinyformat.h:512
#1 0x00000000004022fe in tinyformat::detail::FormatListN<1>::FormatListN<int> (this=0x7fffffffa648, args=@0x7fffffffa898: 1) at tinyformat.h:966
#2 0x0000000000404337 in tinyformat::makeFormatList<int> (args=@0x7fffffffa898: 1) at tinyformat.h:1015
#3 0x00000000004042f7 in tinyformat::format<int> (out=..., fmt=0x404ae4 "Got %d arguments", args=@0x7fffffffa898: 1) at tinyformat.h:1051
#4 0x00000000004042a5 in tinyformat::format<int> (fmt=0x404ae4 "Got %d arguments", args=@0x7fffffffa898: 1) at tinyformat.h:1060
#5 0x00000000004018e2 in main (argc=1, argv=0x7fffffffa9b8) at tfmbug.cc:4
(gdb) p *this
$12 = {m_value = 0x7fffffffa898, m_formatImpl = 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>,
m_toIntImpl = 0x402298 <tinyformat::detail::FormatArg::toIntImpl<int>(void const*)>}
(gdb) s
tinyformat::detail::FormatListN<1>::FormatListN<int> (this=0x7fffffffa648, args=@0x7fffffffa898: 1) at tinyformat.h:967
967 { static_assert(sizeof...(args) == N, "Number of args must be N"); }
(gdb) p *this
$15 = {<tinyformat::FormatList> = {m_args = 0x7fffffffa658, m_N = 1}, m_formatterStore = {{m_value = 0x7fffffffa898,
m_formatImpl = 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>,
m_toIntImpl = 0x402298 <tinyformat::detail::FormatArg::toIntImpl<int>(void const*)>}}}
Once I step out of the tinyformat::detail::FormatListN constructor, however, something seems off. Inside tinyformat::vformat I have
(gdb) p *list.m_args
$18 = {m_value = 0x7fffffffa898, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}
and in the next step in tinyformat::detail::formatImpl I get
(gdb) p *args
$17 = {m_value = 0x404222 <tinyformat::vformat(std::ostream&, char const*, tinyformat::FormatList const&)+62>, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4}
I don't know why, but something seems to be going wrong when assigning to FormatListRef. When I debug using -O1
optimizations, everything works as expected and the value of args in tinyformat::detail::formatImpl is correct:
(gdb) print *args
$2 = {m_value = 0x7fffffffa8a8, m_formatImpl = 0x403010 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>,
m_toIntImpl = 0x4035e4 <tinyformat::detail::FormatArg::toIntImpl<int>(void const*)>}
Wow, great sleuthing. That's exactly the information we need (I can't track this down as I don't have an icc compiler license, though I applied for one.)
To me this is looking rather like a bizarre compiler bug. The other option is that I'm committing some obscure form of UB somewhere by creating my own quasi-vtable-like thing with the m_formatImpl
pointer.
At this point I think the best option is to progressively minimize tinyformat.h by deleting functionality until you've got a minimal reproduction. Then we might discover exactly what triggers this.
Yes, the address looks more like a data address. Here's what I get from gdb:
... (gdb) p *this $4 = {m_value = 0x404222 <tinyformat::vformat(std::ostream&, char const*, tinyformat::FormatList const&)+62>, m_formatImpl = 0x7fffffffa6f8, m_toIntImpl = 0x404ae4} (gdb) p tinyformat::detail::FormatArg::formatImpl<int> $3 = {void (std::ostream &, const char *, const char *, int, const void *)} 0x402244 <tinyformat::detail::FormatArg::formatImpl<int>(std::ostream&, char const*, char const*, int, void const*)>
It looks like the pointer m_value
contains the pointer to m_formatImpl
here??
Another option for debugging here might be to just printf
the values of these pointers from inside Format::format()
. (Are we sure that gdb is not lying to us here :-) ?)
I created a small example (I don't know if it's actually minimal but it should be close) to recreate the problem:
class FormatArg {
public:
FormatArg(const int& value)
: m_value(static_cast<const void*>(&value)), m_formatImpl(&formatImpl) {}
void format() const {
m_formatImpl(m_value);
}
private:
static void formatImpl(const void* value) {
const int true_val = *static_cast<const int*>(value);
}
const void* m_value;
void (*m_formatImpl)(const void* value);
};
class FormatList {
public:
FormatList(FormatArg* args) : m_args(args) {}
friend int main(int argc, char** argv);
private:
const FormatArg* m_args;
int m_N;
};
class FormatList1 : public FormatList {
public:
FormatList1(const int& arg) : FormatList(&m_formatterStore[0]),
m_formatterStore{ FormatArg(arg) } {}
private:
FormatArg m_formatterStore[1];
};
FormatList1 makeFormatList(const int& arg) {
return FormatList1(arg);
}
int main(int argc, char** argv) {
// This gives a segfault when using icpc
makeFormatList(argc).m_args[0].format();
// This would be fine...
// FormatList1(argc).m_args[0].format();
return 0;
}
The segfault is introduced in the makeFormatList
function. I assume that return value optimization "fixes" the problem for icpc, that's why there's no segfault for -O1
and up.
I can't find any UB in this example, but that doesn't mean anything. However, a further indication of a compiler problem is that when you omit FormatList::m_N from the class definition, there's no segfault.
I can't see any problem with this code either. I think this is minimal enough that you could report it upstream to icc and see what happens.
when you omit
FormatList::m_N
from the class definition, there's no segfault
If you can find a workaround you can put behind a small #ifdef __INTEL_COMPILER
(maybe injecting some dummy arguments into the class or some such) I'd be happy to merge that. If not, I'm not sure what else we can do here.
I posted the issue on the Intel Developer Forum and someone spotted the problem.
The function makeFormatList
creates a temporary object on the stack. The returned copy still holds a pointer to the memory allocated for the temporary object, which is invalid when the temporary goes out of scope.
To fix the issue, detail::FormatListN
simply needs a copy constructor. I created PR #59.
Great! A big thanks to whoever pointed that out, and thanks for communicating it back here.
I tried clang's -fsanitize=undefined
, but unfortunately it can't see the error because the constructor is elided even with -O0
. It's necessary to use the -fno-elide-constructors
option which then reproduces this issue with both g++
and clang++
. -fsanitize=undefined
reports the problem as a stack overflow though which it isn't. According to the g++ docs, C++17 compilers are now actually required to elide the copy constructor so icc may also change the default in the future.
Actually there's another way we could have debugged this which I should have thought of - it's possible to get valgrind to report the read of uninitialized memory. (Though for g++ only if -fno-elide-constructors
is passed explicitly, which require knowing what the error was.)
g++ -Wall -O0 -g -fno-elide-constructors -fno-omit-frame-pointer tinyformat_test.cpp -o a.out
valgrind --tool=memcheck --track-origins=yes ./a.out
Thanks for merging the fix. At least now I've learned that gcc & clang elide constructors even if setting -O0
! Probably something I'll check for more carefully in the future.
Yes, that was surprising to me. It seems with C++17 that the language has moved to this being considered a feature of the language rather than an optimization.
I came across a very strange problem in tinyformat v2.2.0. Even the most trivial of examples
causes a segfault when compiling with
icpc -std=gnu++11 -O0
. There is no segfault when compiling with any optimization level greater than 0. The backtrace isI tried with two different versions of icpc