Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

glibc _FORTIFY_SOURCE idiom (__asm__ alias and extern inline shadowing definition) is ignored by clang #23280

Open Quuxplusone opened 9 years ago

Quuxplusone commented 9 years ago
Bugzilla Link PR23280
Status NEW
Importance P normal
Reported by Daniel Micay (danielmicay@gmail.com)
Reported on 2015-04-18 12:44:57 -0700
Last modified on 2015-08-25 21:10:49 -0700
Version trunk
Hardware All All
CC ahmed@bougacha.org, chh@google.com, danielmicay@gmail.com, eugeni.stepanov@gmail.com, george.burgess.iv@gmail.com, jsweval@arxan.com, llvm-bugs@lists.llvm.org, rafael@espindo.la, rnk@google.com
Fixed by commit(s)
Attachments
Blocks PR21420
Blocked by
See also PR23301

This will output 2 with gcc -O2 but (size_t)-1 with Clang:

include

include

static inline attribute((always_inline)) size_t object_size(char *p) { return __builtin_object_size(p, 0); }

int main(int argc, char **argv) { char buf[2]; printf("%zu\n", object_size(buf)); return 0; }

This means that _FORTIFY_SOURCE buffer size checks do not currently work with Clang because it lowers the __builtin_object_size calls to (size_t)-1 before the wrappers are inlined into the call sites. This will also break other compelling use cases for this feature. It makes a lot of sense to lower it to a known value pre-inlining to get more work done in the early function passes, but replacing it with -1 before then isn't good.

Quuxplusone commented 9 years ago

The usage of __builtin_object_size in the Linux kernel's copy_from_user/copy_to_user functions is another security feature that's broken by this.

Quuxplusone commented 9 years ago
In your example, __builtin_object_size is lowered to -1 early because buf isn't
used, and gets optimized out, in which case __builtin_object_size becomes
meaningless.

In which case the problem is really that __builtin_object_size is lowered too
*late*, not too early:  if it was lowered before buf was optimized out, you'd
get the expected '2'.  Earlier lowering to non-unknown is fine, so we can
indeed improve this.

However, consider:

#include <stdio.h>
#include <stdlib.h>

static inline __attribute__((always_inline)) size_t object_size(char *p) {
  return __builtin_object_size(p, 0);
}

void foo(char *p);

int main(int argc, char **argv) {
  char buf[2];
  foo(buf);
  printf("%zu\n", object_size(buf));
  return 0;
}

This correctly lowers object_size into '2'.  Inlining shouldn't interfere, and
happens way before __builtin_object_size is lowered.

In fact, the lowering to -1/unknown only ever happens either on optimized out
objects (the example), or way later after all the mid-level optimizations
(including inlining).

Am I missing something?
Quuxplusone commented 9 years ago

I looked into why we don't lower earlier in this case: when InstCombine tries to erase an alloca, the isAllocSiteRemovable logic is smarter than the RAUW-undef logic: when there's a (objectsize (GEP/bitcast (alloca))), we should look through the alloca users to properly lower the objectsize calls. We currently just do something like alloca->RAUW(undef).

Quuxplusone commented 9 years ago
I pointed at the wrong culprit but there is a real problem here. The buffer
overflow protection in the Linux kernel, glibc and bionic doesn't work with any
case I've tried.

It even lowers it to -1 here:

  #include <unistd.h>
  #include <stdlib.h>
  #include <fcntl.h>

  int main(int argc, char **argv) {
    char buf[2];
    int x = atoi("20");
    int fd = open("/dev/zero", O_RDONLY);
    read(fd, buf, x);
    return 0;
  }

I just get crashes with Clang where GCC hits the error reporting path.
Quuxplusone commented 9 years ago

Ah, so there's actually a separate frontend issue which I confused with the optimization issues I saw relative to GCC. The inline function definitions in glibc are just being ignored because Clang doesn't like the 'extern' in their definitions.

Quuxplusone commented 9 years ago
Ah, that sounds like the real problem, yes.

There's a few PRs already saying - if I read them correctly - that this won't
be fixed, and linking to:
  http://clang.llvm.org/compatibility.html#inline

That page proposes using -std=gnu89 when all else is impossible, have you tried
that?
Quuxplusone commented 9 years ago

Can you elaborate on the 'extern inline' issues? I'm not familiar with any current immediate conflicts with glibc's headers.

Quuxplusone commented 9 years ago

It doesn't pick up the fortified function implementations unless the extern is removed from their fortify function macro. It explicitly uses attribute((gnu_inline)) so the chosen -std switch isn't the issue. Anyway, it's a separate bug from the optimization issue(s) which also impact Bionic, etc.

Quuxplusone commented 9 years ago
The __asm__ bit is also important.  I reduced it to:

extern int foo(int a);
extern int foo_alias(int a) __asm__ ("foo");
extern inline __attribute__((gnu_inline)) int foo(int a) {
  return foo_alias(a) + 123;
}

int bar(int a) {
  return foo(a);
}

This ignores the +123 bit.  Removing anything generates the expected code
instead.
Quuxplusone commented 9 years ago
The gcc I tried does something very strange with that code.  At -O0 only, it
also ignores +123, I suspect because it chose not to inline, and just went with
the extern declaration?

Adding __attribute__((always_inline)) to the foo definition (like in the
original code) forces it to always generate the +123:

$ cat test.c
extern int foo(int a);
extern int foo_alias(int a) __asm__ ("foo");
extern inline __attribute__((always_inline)) __attribute__((gnu_inline)) int
foo(int a) {
  return foo_alias(a) + 123;
}

int bar(int a) {
  return foo(a);
}

$ gcc test.c -o - -S -O0 | grep 123 -B1
    call    foo
    addl    $123, %eax

With clang it doesn't help though (no matter the -O level, since the frontend
doesn't even generate the foo body), and that's the difference:

$ clang test.c -o - -S -mllvm -disable-llvm-optzns

define i32 @bar(i32 %a) #0 {
  %1 = alloca i32, align 4
  store i32 %a, i32* %1, align 4
  %2 = load i32* %1, align 4
  %3 = call i32 @foo(i32 %2)
  ret i32 %3
}

declare i32 @foo(i32) #1

$ clang test.c -o - -S -O1

define i32 @bar(i32 %a) #0 {
  %1 = tail call i32 @foo(i32 %a) #2
  ret i32 %1
}

declare i32 @foo(i32) #1
Quuxplusone commented 9 years ago

Maybe your example is too reduced? If I remove the extern from your example, LLVM will emit a definition of foo that calls itself, which is probably not what you wanted.

Quuxplusone commented 9 years ago
(In reply to comment #10)
> The gcc I tried does something very strange with that code.  At -O0 only, it
> also ignores +123, I suspect because it chose not to inline, and just went
> with the extern declaration?

Yes, if the inline version isn't used it will use the external one instead. You
can do that with ISO C inline semantics too, but the syntax is different. It's
usually the desired semantics because it avoids any duplication when inlining
fails. In the case of _FORTIFY_SOURCE the code in the inline function doesn't
accomplish anything if inlining can't happen so using always_inline to force it
whenever possible and falling back to the unfortified version for function
pointers makes sense.
Quuxplusone commented 9 years ago

This asm(...) issue might be why Bionic had to disable all of the fast paths calling through to the unchecked versions. I don't know the history on that workaround though.

Quuxplusone commented 9 years ago
No, I think that's correct.  I'm not familiar with those parts at all, but it
seems to me the real problem is how we express always_inline functions.  If we
handled that in the frontend, we could generate something like:

declare i32 @foo(i32 %a)

define i32 @bar(i32 %a) #1 {
  %1 = alloca i32, align 4
  store i32 %a, i32* %1, align 4
  %2 = load i32* %1, align 4
  ; the inlined definition of foo, calling the external declaration of foo
  %3 = call i32 @foo(i32 %2)
  %4 = add nsw i32 %3, 123
  ret i32 %4
}

Which is what gcc does.

My understanding is, we keep always_inline as attribute on the function and
only do the inlining on IR, much later, which forces us to keep a definition
around (in this case, it would call itself).

Perhaps clang should generate something like:

declare i32 @foo(i32 %a)

define i32 @foo.alwaysinline(i32 %a) #0 {
  %1 = alloca i32, align 4
  store i32 %a, i32* %1, align 4
  %2 = load i32* %1, align 4
  %3 = call i32 @foo(i32 %2)
  %4 = add nsw i32 %3, 123
  ret i32 %4
}

define i32 @bar(i32 %a) #1 {
  %1 = alloca i32, align 4
  store i32 %a, i32* %1, align 4
  %2 = load i32* %1, align 4
  %3 = call i32 @foo.alwaysinline(i32 %2)
  ret i32 %3
}
Quuxplusone commented 9 years ago
(In reply to comment #14)
> No, I think that's correct.  I'm not familiar with those parts at all, but
> it seems to me the real problem is how we express always_inline functions.
> If we handled that in the frontend, we could generate something like:

Bugzilla ate that part, but this was in reply to:

(In reply to comment #11)
> Maybe your example is too reduced? If I remove the extern from your example,
> LLVM will emit a definition of foo that calls itself, which is probably not
> what you wanted.
Quuxplusone commented 9 years ago
(In reply to comment #14)
> Perhaps clang should generate something like:
>
> declare i32 @foo(i32 %a)
>
> define i32 @foo.alwaysinline(i32 %a) #0 {
>   %1 = alloca i32, align 4
>   store i32 %a, i32* %1, align 4
>   %2 = load i32* %1, align 4
>   %3 = call i32 @foo(i32 %2)
>   %4 = add nsw i32 %3, 123
>   ret i32 %4
> }
>
> define i32 @bar(i32 %a) #1 {
>   %1 = alloca i32, align 4
>   store i32 %a, i32* %1, align 4
>   %2 = load i32* %1, align 4
>   %3 = call i32 @foo.alwaysinline(i32 %2)
>   ret i32 %3
> }

I think Clang is already generating the right thing here for the given input.
The user aliased foo_alias to foo, and then defined foo as calling foo_alias,
so that's clearly infinite recursion. Fixing this would probably require
something like:

extern int foo_external(int);
static inline __attribute__((always_inline)) size_t foo(int a) {
  return foo_external(a) + 123;
}

At least, that's how __memset_chk & co work.
Quuxplusone commented 9 years ago
(In reply to comment #16)
> I think Clang is already generating the right thing here for the given
> input. The user aliased foo_alias to foo, and then defined foo as calling
> foo_alias, so that's clearly infinite recursion.

The point is, gcc disagrees on the right thing, and people rely on the specific
idiom of __asm__ alias + extern inline for this kind of "shadowing definition".

If you change anything in the code (I think), both gcc and clang give you the
infinite recursion.

I have no stake and can't say anything about the frontend, really, but would it
make sense to match gcc's behavior?

As Daniel says, that would help  support _FORTIFY_SOURCE (and maybe some other
code using this idiom) in glibc and other libraries.

Also, moving to the clang component.
Quuxplusone commented 9 years ago

Doing some poking around, the issue seems to be that extern causes the always_inline read() to be marked as AvailableExternally, and __read_alias causes clang to mark read() as trivially recursive. CodeGen will not emit the function body if the linkage is AvailableExternally and the function is trivially recursive as defined by CodeGenModule::isTriviallyRecursive.

This explains why removing either the alias (no more recursion) or the extern (no more AvailableExternally) "fixes" the problem in clang.

Without the recursion check, clang/LLVM trunk generates code to call __read_chk with size=2 on O1-O3[1]. If the buffer size is not statically known, we have infinite recursion, as Reid noted.

I'll see if I can find a good solution to both PR9614 (the bug that prompted the recursion check) + this. If not, calling foo_external may be our only option in situations like these.

[1] - With a bit of coercion. :) We may be able to get O0 to work with a bit of effort, but that's a separate issue.

Quuxplusone commented 9 years ago
Evgeniy was working was working on an always_inline fix that would address this
whole problem and eliminate the silliness around isTriviallyRecursive.

It goes kind of like this:
- For every always_inline function, emit two LLVM functions for it: one for
direct calls, and one for taking its address.
- The direct call version always uses some internal name that doesn't matter,
like 'foo.inline' for the function 'foo'.
- The "address taken" version uses the normal name that 'foo' would have,
accounting for any __asm__ labels.
- If the function is gnu_inline, the "address taken" version is a declaration.
- If the function is not gnu_inline, the "address taken" version is an internal
definition with a body containing a musttail call to the direct call version.

This should break the infinite recursion pattern, and allow the horrible glibc
"shadowing definition" idiom. If the user calls 'read' directly, they actually
end up calling an internal always_inline wrapper that gets inlined. If the user
takes the address of 'read', they skip that stuff and directly reference the
'read' symbol in the object file.
Quuxplusone commented 9 years ago

While that's definitely a solid solution, what are your thoughts on simply disabling TCO on available_externally alwaysinline functions in LLVM? Given that clang outright disallows this combination and no one's complained (outside of this bug), I'd be surprised if there were any functions that absolutely rely on TCO while having always_inline + extern attributes.

Alternatively, we could add a noTCO function attribute and teach clang to put that on gnu_inline always_inline [...] functions. LLVM already lets you disable TCO program-wide with a flag AFAICT, so this would also allow you to be more granular with that.

Quuxplusone commented 9 years ago

After thinking about it, I liked Reid's approach better, so I implemented that. It works; read() pretty-prints stack traces on O1-O3 for the given program. However, LLVM doesn't currently support the linkage that we need in order to make it work properly.

Specifically, I had to take @read.always_inline and swap its linkage to private, which means its body may be sent to a .o file. Additionally, this swap caused other available_externally calls (read: open) to be scrutinized more than normal, so I got linker failures complaining about lack of builtin_va_arg_pack_len, builtin_va_arg_pack, and friends. This heightened level of checking (I'm assuming, like said, because it's now private instead of available_externally) may make compat between clang<->GCC libs more difficult.

Will look into ways to solve the linkage problem tomorrow.

Quuxplusone commented 9 years ago

Sorry Reid, I misinterpreted what you said to mean that Evgeniy's work was something done a while ago. My mistake. :)

Looking at his patch (http://reviews.llvm.org/D12087), it doesn't solve this issue out-of-the-box, but it does get us most of the way there. Once his patch lands, I'll work on getting a fix for this out.

Quuxplusone commented 9 years ago
(In reply to comment #22)
> Sorry Reid, I misinterpreted what you said to mean that Evgeniy's work was
> something done a while ago. My mistake. :)
>
> Looking at his patch (http://reviews.llvm.org/D12087), it doesn't solve this
> issue out-of-the-box, but it does get us most of the way there. Once his
> patch lands, I'll work on getting a fix for this out.

Sounds good. :)