Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

__attribute((noinline)) not respected #45433

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR46463
Status CONFIRMED
Importance P enhancement
Reported by Jeff Muizelaar (jmuizelaar@mozilla.com)
Reported on 2020-06-25 19:04:54 -0700
Last modified on 2021-11-09 02:55:00 -0800
Version trunk
Hardware PC All
CC dimitry@andric.com, glider@google.com, hans@chromium.org, i@maskray.me, jdoerfert@anl.gov, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, ndesaulniers@google.com, nikita.ppv@gmail.com, sfertile@ca.ibm.com, smithp352@googlemail.com, srhines@google.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also PR26545
__attribute((noinline))
char *demo(char *s) {
    return s;
}

char *foo()  {
    return demo("input string");
}

compiles to:

demo(char*):                              # @demo(char*)
        movq    %rdi, %rax
        retq
foo():                                # @foo()
        movl    $.L.str, %eax
        retq
.L.str:
        .asciz  "input string"
Quuxplusone commented 4 years ago

noinline doesn't mean "block all optimizations", it only says "don't move code out of this function"

Quuxplusone commented 4 years ago
Which version of clang is this? With the latest 10.0.1 off the branch:

$ clang -fno-asynchronous-unwind-tables -fomit-frame-pointer -O2 -S pr46463.c -
o -
        .text
        .file   "pr46463.c"
        .globl  demo                    # -- Begin function demo
        .p2align        4, 0x90
        .type   demo,@function
demo:                                   # @demo
# %bb.0:                                # %entry
        movq    %rdi, %rax
        retq
.Lfunc_end0:
        .size   demo, .Lfunc_end0-demo
                                        # -- End function
        .globl  foo                     # -- Begin function foo
        .p2align        4, 0x90
        .type   foo,@function
foo:                                    # @foo
# %bb.0:                                # %entry
        movl    $.L.str, %edi
        jmp     demo                    # TAILCALL
.Lfunc_end1:
        .size   foo, .Lfunc_end1-foo
                                        # -- End function
        .type   .L.str,@object          # @.str
        .section        .rodata.str1.1,"aMS",@progbits,1
.L.str:
        .asciz  "input string"
        .size   .L.str, 13

        .ident  "FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.0-97-g6f71678ecd2)"
        .section        ".note.GNU-stack","",@progbits
        .addrsig

So this tail-calls the demo function.
Quuxplusone commented 4 years ago
clang trunk: https://gcc.godbolt.org/z/tpJsOm
It seems like this was a recent behaviour change as clang 10.0 doesn't show the
problem.
Quuxplusone commented 4 years ago
(In reply to Jeff Muizelaar from comment #3)
> It seems like this was a recent behaviour change as clang 10.0 doesn't show
> the problem.

It appears to have regressed with https://reviews.llvm.org/rG8903e61b6611
("[AsmPrinter][ELF] Define local aliases (.Lfoo$local) for GlobalObjects"), by
Fangrui Song.

I guess it is an unintended side effect? Whatever the cause, it would be nice
to have it fixed before 11 branches.
Quuxplusone commented 4 years ago
(In reply to Dimitry Andric from comment #4)
> It appears to have regressed with https://reviews.llvm.org/rG8903e61b6611
> ("[AsmPrinter][ELF] Define local aliases (.Lfoo$local) for GlobalObjects"),
> by Fangrui Song.

Eh sorry, this was a bisection mistake on my part. This commit introduces a
.Lfoo$local label, but does not yet eliminate the tail call. Looking further.
Quuxplusone commented 4 years ago
Looked again, and it is definitely regressed with
https://reviews.llvm.org/rG03727687766a ("[InstCombine] Simplify calls with
"returned" attribute") by Nikita Popov.

With clang llvmorg-11-init-06295-g9cf920e64d1:
========================================================================
    .text
    .file   "pr46463.c"
    .globl  demo                    # -- Begin function demo
    .p2align    4, 0x90
    .type   demo,@function
demo:                                   # @demo
.Ldemo$local:
# %bb.0:                                # %entry
    movq    %rdi, %rax
    retq
.Lfunc_end0:
    .size   demo, .Lfunc_end0-demo
                                        # -- End function
    .globl  foo                     # -- Begin function foo
    .p2align    4, 0x90
    .type   foo,@function
foo:                                    # @foo
.Lfoo$local:
# %bb.0:                                # %entry
    movl    $.L.str, %edi
    jmp .Ldemo$local            # TAILCALL
.Lfunc_end1:
    .size   foo, .Lfunc_end1-foo
                                        # -- End function
    .type   .L.str,@object          # @.str
    .section    .rodata.str1.1,"aMS",@progbits,1
.L.str:
    .asciz  "input string"
    .size   .L.str, 13

    .ident  "clang version 11.0.0 (https://github.com/llvm/llvm-project.git
9cf920e64d18e9c64706c8c8baf71a4919dcbb42)"
    .section    ".note.GNU-stack","",@progbits
    .addrsig
========================================================================

With clang llvmorg-11-init-06296-g03727687766:
========================================================================
    .text
    .file   "pr46463.c"
    .globl  demo                    # -- Begin function demo
    .p2align    4, 0x90
    .type   demo,@function
demo:                                   # @demo
.Ldemo$local:
# %bb.0:                                # %entry
    movq    %rdi, %rax
    retq
.Lfunc_end0:
    .size   demo, .Lfunc_end0-demo
                                        # -- End function
    .globl  foo                     # -- Begin function foo
    .p2align    4, 0x90
    .type   foo,@function
foo:                                    # @foo
.Lfoo$local:
# %bb.0:                                # %entry
    movl    $.L.str, %eax
    retq
.Lfunc_end1:
    .size   foo, .Lfunc_end1-foo
                                        # -- End function
    .type   .L.str,@object          # @.str
    .section    .rodata.str1.1,"aMS",@progbits,1
.L.str:
    .asciz  "input string"
    .size   .L.str, 13

    .ident  "clang version 11.0.0 (https://github.com/llvm/llvm-project.git
03727687766a72504712861bf038f0be962527d0)"
    .section    ".note.GNU-stack","",@progbits
    .addrsig
========================================================================
Quuxplusone commented 4 years ago

Sounds like issues related to noinline were raised there: https://reviews.llvm.org/D75815#1941004

Quuxplusone commented 4 years ago

I'd argue there is n obug. This function is not inlined, noinline is respected.

(In reply to Roman Lebedev from comment #1)

noinline doesn't mean "block all optimizations", it only says "don't move code out of this function"

+1

See https://reviews.llvm.org/D75815#1976892


Some examples that might be interesting:

__attribute((noinline))
char *demo(char *s) {
    return s;
}

void foo()  {
    demo("input string");
}

would you argue there must be a call to demo in foo? I would say no.

__attribute((noinline))
char *demo(char *s) {
    return NULL;
}

void foo()  {
    return demo("input string");
}

I would again not expect a call.

(Without testing it, I assume both of the above should have been optimized before the returned argument patch.)

Quuxplusone commented 4 years ago
From an end-user perspective, appearing to inline the function while it is
marked as 'noinline' would be rather suprising. Is there any usefulness to the
noinline attribute in that case at all?

That said, looking at gcc's documentation here:
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-noinline-function-attribute

they are saying:
> This function attribute prevents a function from being considered for
> inlining. If the function does not have side effects, there are
> optimizations other than inlining that cause function calls to be
> optimized away, although the function call is live.

Unfortunately they don't really say what those other optimizations are, but
they also specify a workaround to get the desired effect, which is (at least
from my point of view) to force the function to always be called, and never
inlined:

> To keep such calls from being optimized away, put
>
> asm ("");
>
> (see Extended Asm) in the called function, to serve as a special side effect.
Quuxplusone commented 4 years ago
(In reply to Dimitry Andric from comment #9)
> they also specify a workaround to get the desired effect, which is (at least
> from my point of view) to force the function to always be called, and never
> inlined

What I meant by this was that sometimes end-users can want this, for various
reasons. And instead of using the rather horrid method of inserting barriers,
an official attribute specifying this (reallydontinline?) would be nicer. Then
again, if you need to build software that compiles with both clang and gcc, you
will likely be forced to use the barriers anyway.
Quuxplusone commented 4 years ago
This kind of thing is going to get Clang banned from use in the Linux kernel:
https://godbolt.org/z/M2g4ZJ

We ***need*** the semantics of noinline to mean "explicit call" instruction.
Messing with that is going to cause all kinds of unexpected bugs.

For example, noinline is used extensively in the kernel to limit stack usage,
as the kernel only has 2 pages worth of stack to work with.

The linux kernel has 1696 uses of noinline today.  I'm sure a vast majority of
them have very good reasons to do so.
Quuxplusone commented 4 years ago

Hm reading jdoerfert's comments on D75815 and the surrounding text, it seems that indeed the 'noipa' attribute is more likely covering this use case. But TBH I had not heard of this attribute before, maybe it is not that well-known?

At least gcc also supports it, but in their documentation they mention "This attribute is supported mainly for the purpose of testing the compiler". So no idea how production-ready it is.

Quuxplusone commented 4 years ago

We need the semantics of noinline to mean "explicit call" instruction. > Messing with that is going to cause all kinds of unexpected bugs.

The thing is, that was never the case: https://godbolt.org/z/rix62q If you want that, we need a different attribute (in IR).

I understand people want to be compatible with GCC or have certain expectations. However, we should not solve this by inconsistent descriptions/interpretations of the IR. As shown above, we always removed calls to noinline functions even if the result was used. Not to mention if the result is unused and the function is const. That we now propagate arguments through the call is the natural evolution from this year long precedence.

In https://reviews.llvm.org/D75815#1976892 we talk about noipa for the IR, similarly, you could do optnone on the call instruction. Either way, if you want to prevent optimization across call edges, we need to provide a different/new attribute to represent that (or use optnone for the function).

From an end-user perspective, appearing to inline the function while it is marked as 'noinline' would be rather suprising. Is there any usefulness to the noinline attribute in that case at all?

noinline will, as described by GCC, prevent inlining. If we now say it also does "imply" X and Y, people that want only to prevent inlining might complain. It is also not defined that way so we play whack-a-mole with optimizations and analysis that interpret the meaning based on the definition we use (https://llvm.org/docs/LangRef.html#function-attributes). That said, if we need an attribute to describe something similar to noinline we should create one and use it.

Then again, if you need to build software that compiles with both clang and gcc, you will likely be forced to use the barriers anyway.

The (particular) asm barrier might or might not prevent us from doing the IPO. For now it would probably work.

Quuxplusone commented 4 years ago
(In reply to Dimitry Andric from comment #12)
> Hm reading jdoerfert's comments on D75815 and the surrounding text, it seems
> that indeed the 'noipa' attribute is more likely covering this use case. But
> TBH I had not heard of this attribute before, maybe it is not that
> well-known?
>
> At least gcc also supports it, but in their documentation they mention "This
> attribute is supported mainly for the purpose of testing the compiler". So
> no idea how production-ready it is.

As far as I know, LLVM doesn't have it (yet). We could/should introduce it
though.
Quuxplusone commented 4 years ago
(In reply to Nick Desaulniers from comment #11)
> This kind of thing is going to get Clang banned from use in the Linux
> kernel

Sorry, that was hyperbolic.  I had lunch and a cup of coffee and now I less
hangry and fully caffeinated.

That said, none of our alarm bells have gone off about anything being broken
from this change.  I would just hate for something to be found broken due to
this further down the road.

Similarly that there are cases where __attribute__((always_inline)) doesn't
mean "always inline," I guess we'll have cases to point people to where
__attribute__((noinline)) doesn't mean "no inline."

I much prefer people having function attributes that provide some actual
semantics rather than empty asm statements though.
Quuxplusone commented 4 years ago

(In reply to Nick Desaulniers from comment #15)

(In reply to Nick Desaulniers from comment #11)

This kind of thing is going to get Clang banned from use in the Linux kernel

Sorry, that was hyperbolic. I had lunch and a cup of coffee and now I less hangry and fully caffeinated.

No worries. I also do not want this to happen. The idea would have been that clang adds "something extra" if it compiles noinline in "kernel mode". That is still an option FWIW.

That said, none of our alarm bells have gone off about anything being broken from this change. I would just hate for something to be found broken due to this further down the road.

Short of having a second attribute in the kernel source I strongly advocate for a "kernel mode/flag" that adds the second attribute in IR.

Similarly that there are cases where attribute((always_inline)) doesn't mean "always inline," I guess we'll have cases to point people to where attribute((noinline)) doesn't mean "no inline."

The first is true, the second is not. noinline did and does mean "no inline". I remember we discussed a new attribute to address the former, as they were again two camps, those who wanted it to mean always and those who wanted the current behavior. noinline is the same. It cannot mean two things. It always did mean "no inline" but if people also want "no inter-procedural stuff" we need a new IR attribute and probably a source one too.

I much prefer people having function attributes that provide some actual semantics rather than empty asm statements though.

+2 yes, and that is why I emphasize we add new function attributes (on IR level) if there is a need for them. noipa seems like a uncontroversial candidate to be added.

Quuxplusone commented 4 years ago

Nick flagged this as a potential release blocker a while ago. Are there still concerns here?

Quuxplusone commented 4 years ago
(In reply to Hans Wennborg from comment #17)
> Nick flagged this as a potential release blocker a while ago. Are there
> still concerns here?

Fixing this would require adding new noipa attribute.
I'm not sure this can be viewed as a blocker.
Quuxplusone commented 4 years ago
(In reply to Roman Lebedev from comment #18)
> (In reply to Hans Wennborg from comment #17)
> > Nick flagged this as a potential release blocker a while ago. Are there
> > still concerns here?
>
> Fixing this would require adding new noipa attribute.
> I'm not sure this can be viewed as a blocker.

Agreed. And the behavior did not conceptually change since 3.0
(https://godbolt.org/z/rix62q)
Quuxplusone commented 4 years ago

Okay, let's not block on it then.