Labels generated, but no code above 'O0'

Quuxplusone commented 7 years ago


Bugzilla Link	PR35508
Status	REOPENED
Importance	P enhancement
Reported by	Martin O'Riordan (martino@theheart.ie)
Reported on	2017-12-02 20:33:07 -0800
Last modified on	2017-12-18 02:11:51 -0800
Version	5.0
Hardware	PC All
CC	codeman.consulting@gmail.com, dimitry@andric.com, efriedma@quicinc.com, llvm-bugs@lists.llvm.org, matze@braunis.de, paul_robinson@playstation.sony.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

While reducing a bug in my LLVM Out-of-Tree sources (v5.0.0), I discovered that
the same problem occurs with the official v4.0.0 and v5.0.0 X86 distributions
for Windows and Linux.

When the following example is compiled with 'clang -S -O0' code is correctly
generated, but with '-O1' and above, only labels are generated with no code
following.  No error is reported and the exit code is zero.

The minimal test case I could reduce to is as follows:

  void func(int *z) {
    int *b = 0;

    for (int j = 0; j < 16; j++) {
      int c = *b;
      c -= *z++;
      *b++ = c;
    }
  }

For X86 this produces only the following code (nothing after "# BB#0:"):

      .text
      .def     @feat.00;
      .scl    3;
      .type   0;
      .endef
      .globl  @feat.00
  @feat.00 = 1
      .def     _func;
      .scl    2;
      .type   32;
      .endef
      .globl  _func
      .p2align    4, 0x90
  _func:                                  # @func
  # BB#0:

Quuxplusone commented 7 years ago

Since you are assigning a null pointer to b, the operation "int c = *b" is
undefined, and basically everything after that is undefined.  So clang decides
to optimize the whole loop away.

If you really want to dereference the null pointer anyway (which will crash at
runtime), try using "volatile int *" instead.

Quuxplusone commented 7 years ago

You're absolutely right - I should've spotted that as being a UB issue.

Thanks, MartinO

Quuxplusone commented 7 years ago

In my version of Clang I get the following:

Z:\Code>clang --version
clang version 5.0.0 (https://github.com/llvm-mirror/clang.git
adf03c776be767d5d56fdb20c56ec4f6df0b33e6) (https://github.com/llvm-
mirror/llvm.git 7bfd7c00d76359356c3572222f33b03931972c9f)

Z:\Code>clang -S -O1 nocode.cpp -o nocode2.s
Z:\Code>type nocode2.s
        .text
        .def     "?func@@YAXPEAH@Z";
        .scl    2;
        .type   32;
        .endef
        .globl  "?func@@YAXPEAH@Z"
        .p2align        4, 0x90
"?func@@YAXPEAH@Z":                     # @"\01?func@@YAXPEAH@Z"
# BB#0:
        ud2

Which makes sense and seems correct at first glance with the undefined
instruction replacing the incorrect pointer code.   However, I get a completely
different result following a run through LLC at O1:

Z:\Code>clang -O0 nocode.cpp -emit-llvm -c

Z:\Code>llc -filetype=asm nocode.bc -O1 -o nocode1.s -x86-asm-syntax=intel

Z:\Code>type nocode1.s
        .text
        .intel_syntax noprefix
        .def     "?func@@YAXPEAH@Z";
        .scl    2;
        .type   32;
        .endef
        .globl  "?func@@YAXPEAH@Z"
        .p2align        4, 0x90
"?func@@YAXPEAH@Z":                     # @"\01?func@@YAXPEAH@Z"
.Lcfi0:
.seh_proc "?func@@YAXPEAH@Z"
# BB#0:
        sub     rsp, 24
.Lcfi1:
        .seh_stackalloc 24
.Lcfi2:
        .seh_endprologue
        mov     qword ptr [rsp + 16], rcx
        mov     qword ptr [rsp + 8], 0
        mov     dword ptr [rsp + 4], 0
        cmp     dword ptr [rsp + 4], 15
        jg      .LBB0_3
        .p2align        4, 0x90
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        mov     rax, qword ptr [rsp + 8]
        mov     eax, dword ptr [rax]
        mov     dword ptr [rsp], eax
        mov     rax, qword ptr [rsp + 16]
        lea     rcx, [rax + 4]
        mov     qword ptr [rsp + 16], rcx
        mov     ecx, dword ptr [rsp]
        sub     ecx, dword ptr [rax]
        mov     dword ptr [rsp], ecx
        mov     rax, qword ptr [rsp + 8]
        lea     rdx, [rax + 4]
        mov     qword ptr [rsp + 8], rdx
        mov     dword ptr [rax], ecx
        inc     dword ptr [rsp + 4]
        cmp     dword ptr [rsp + 4], 15
        jle     .LBB0_2
.LBB0_3:
        add     rsp, 24
        ret
        .seh_handlerdata
        .text
.Lcfi3:
        .seh_endproc

Same at O2 or O3.
In this case *b is generated as a local stack variable at [rsp+8], followed by
j at [rsp+4] and finally c at [rsp].   The value of *z is unknown, but c is an
obvious null ptr deref.   For some reason the LLC build doesn't recognize this
as a null ptr derefence and generates the rest of the loop.

Now my question is, shouldn't the assembly output of llc -O1 on bitcode be the
same as clang -O1 on the .cpp file?  On windows they don't at least.  That
seems like a actual bug to me, in theory we should get nearly identical code
from both methods of doing this.  Even at -O3 LLC doesn't reduce to UDF.

My opinion on this particular report is mixed;  by reducing to UDF we lose the
ability for a null ptr dereference exception to be thrown, and hence debugging
info for the problem even though it's obvious.  I think that's at least worth
looking at;   I remember a discussion about it before but if the code is
invalid, why not just error out compiling when the null ptr deref is hit and
diagnose it there rather than create an executable where the problem can no
longer be easily solved?

In addition, clang -O1 -m32 produces the following instead:

Z:\Code>type nocode5.s
        .text
        .def     @feat.00;
        .scl    3;
        .type   0;
        .endef
        .globl  @feat.00
@feat.00 = 1
        .def     "?func@@YAXPAH@Z";
        .scl    2;
        .type   32;
        .endef
        .globl  "?func@@YAXPAH@Z"
        .p2align        4, 0x90
"?func@@YAXPAH@Z":                      # @"\01?func@@YAXPAH@Z"
# BB#0:
        nop                             # avoids zero-length function

        .def     _main;
        .scl    2;
        .type   32;
        .endef
        .globl  _main
        .p2align        4, 0x90
...

Now that situation is just completely incorrect, as the NOP is executed and the
debugger breaks in main with an invalid dereference of argv** because the
function it was calling just falls back through to it.   Even the UDF is better
than that and I think this is an actual bug if it's still happening in ToT.  My
windows build isn't up to date at the moment but I'm reopening because of that
last example.

Quuxplusone commented 7 years ago

BTW just for comparison MSVC's clang-cl and regular cl are both able to debug
this correctly and do not reduce the loop.   In debug mode the null pointer is
caught when b is dereferenced, in release without a PDB the null pointer
exception is thrown on, presumably because of lack of proper debug info, but it
still breaks in that loop.
      c -= *z++;

Either way it's clear from the debug where the general problem is;  the clang
output at -m32 gives only a hint since func is just a nop.

Quuxplusone commented 7 years ago

I know all of that is nitpicky, but at least the UDF makes it obvious that there's a problem and it's in func(). The NOP is just plain incorrect output IMO.

Quuxplusone commented 7 years ago

Clang's -O1 and llc's -O1 are not the same.  llc uses an optimization
pipeline defined by LLVM, while Clang constructs its own pipeline.
So, seeing different output is not unexpected.  It does seem odd that
llc did not optimize away the loop and therefore the function, that
might be worth investigating.

The emitted 'ud2' instruction is something that happens when you have
a Windows-target compilation, it replaces 'unreachable' with the trap
instead of optimizing it away completely.

Why optimize away the function without emitting an error?  The function
has undefined behavior... but only if it is actually called!  Clang
cannot emit an error here, because the UB might not occur, and Clang
does not have enough knowledge to say definitively one way or the other.
And LLVM cannot be sure where the 'unreachable' was introduced, as it
might have come from a previous pass that did some analysis to show
that a particular path happens to be unreachable in a particular context.
This might be due to optimization transforms and not reflect true C/C++
undefined behavior in the source program.

I agree it would be lovely if LLVM had better diagnostics in this area.
But even if it did, they would never be perfect, and so static analysis
and the sanitizers are your best friends here.

Quuxplusone commented 7 years ago

(In reply to Paul Robinson from comment #6)
> Why optimize away the function without emitting an error?  The function
> has undefined behavior... but only if it is actually called!  Clang
> cannot emit an error here, because the UB might not occur, and Clang
> does not have enough knowledge to say definitively one way or the other.
> And LLVM cannot be sure where the 'unreachable' was introduced, as it
> might have come from a previous pass that did some analysis to show
> that a particular path happens to be unreachable in a particular context.
> This might be due to optimization transforms and not reflect true C/C++
> undefined behavior in the source program.
>
> I agree it would be lovely if LLVM had better diagnostics in this area.
> But even if it did, they would never be perfect, and so static analysis
> and the sanitizers are your best friends here.

So, this is definitely not a CLang problem, and 'scan-build' does catch it.
Programmers who routinely use the static analysers would have detected and
corrected the problem long before I say it reported as a bug.

I fully accept that this is UB and that what actually happens is decidedly
undefined :-)

However, the UB I got was very, very strange.  The code that was being compiled
was also using '-ffunction-sections', so each function was in it's own section.
When the function 'func' was called, the function that was alphabetically next
is what was actually called, and it looked extremely like the type of bugs you
get when a C++ virtual function is called with a corrupt or out-of-date Vtable,
or other indirect call related problem.

In this particular example, the MBB (in LLVM) after optimisation has an empty
sequence of BBs, so the function entry-point is emitted, but there is nothing
to follow.

What I am looking into (in our target) is to detect an empty BB list, and emit
a simple function return (or suppress the label emission), give a diagnostic
(words TBD), and exit with a non-zero exit code.  This will allow the
programmer to realise that something is not right.

But the silent emission of an orphaned label while technically a valid
interpretation of UB (and I'm as guilty as any for taking advantage of UB), the
fact remains that it is mind-bogglingly complicated for the programmer to
realise what has caused it.  It is also important that the programmer should be
able to diagnose UB artefacts without excessive investigation.

The simple test case I provided was distilled from nearly 10,000 lines of code,
so even for the compiler writer it was a complicated problem to identify.  I
ended up reducing it, but the UB still caught me out.

In LLVM, I can think of no legal circumstance where the list of BBs for a
function would ever be empty, so detecting this and emitting an error message
of some kind seems like a reasonable thing to do.

So now it move from UB (Undefined Behaviour) to QoI (Quality of Implementation).

    MartinO

Quuxplusone commented 7 years ago

Also note that llc does not add any bitcode optimization passes, that's a job for the opt tool.

Quuxplusone commented 7 years ago

Emitting a ud2 or equivalent would make sense; falling off the end of a function is very confusing to debug.

We really don't want to emit warnings based on optimizations. They can be useful; the LLVM optimizer is better at finding undefined behavior than than clang static analyzer. But the LLVM optimizer is not a static analysis tool; emitting warnings leads to issues when we break someone's -Werror build, or emit a false positive, or emit a warning which technically correct but completely incomprehensible to anyone who isn't a compiler developer.

Quuxplusone commented 6 years ago

Thanks for all the comments, I learn something new every day. I had always thought that llc ran the same passes as opt or clang. I'll keep this in mind in the future while trying to repro things like this. I still don't believe LLC should be producing this incorrect code though.

I do agree on Eli Friedman's point that Clang isn't an OPT diagnostic tool. Visual Studio and other compilers don't complain about this either, that's a job for debugging and some things picked up by static analyzers compile into code that works fine.

So yes, I'd agree that the LLC production of a NOP'd function should be looked into but not much further. LLC isn't used often in actual compilations so it should be low priority.

Quuxplusone commented 6 years ago

What I eventually did for this, was to emit 'abort' when the function contains no instructions, so if called, instead of random code being executed, it immediately terminates and they can use normal debugging tools to figure out why.

I took your advice, and have not produced a compile-time diagnostic.

Thanks, MartinO

Quuxplusone / LLVMBugzillaTest

Labels generated, but no code above 'O0' #34481