GaloisInc / llvm-pretty-bc-parser

Parser for the llvm bitcode format
Other
60 stars 6 forks source link

`call`/`callbr`'s function type should not be a pointer type #189

Closed RyanGlScott closed 1 year ago

RyanGlScott commented 2 years ago

If you parse an invoke instruction, such as these one found in crux-llvm's invoke-test.ll test case, the resulting IR will be:

...
  (Invoke
     (FunTy
        (PrimType (Integer 32))
        [ PtrTo (PtrTo (PrimType (Integer 8))) ]
        False)
     (ValSymbol
        (Symbol
           "_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h6e1c450892b1f754E"))
     [ Typed
         { typedType = PtrTo (PtrTo (PrimType (Integer 8)))
         , typedValue = ValIdent (Ident "arg0")
         }
     ]
     (Named (Ident "bb1"))
     (Named (Ident "cleanup")))
...

The thing that is peculiar about this IR is that the argument type is a raw FunTy. Reading the LLVM Language Reference Manual entry for invoke, however, suggests that this should instead be a PtrTo:

Syntax:

<result> = invoke [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
              [operand bundles] to label <normal label> unwind label <exception label>

...

Arguments:

This instruction requires several arguments:

...

  1. fnptrval: An LLVM value containing a pointer to a function to be invoked. In most cases, this is a direct function invocation, but indirect invoke’s are just as possible, calling an arbitrary pointer to function value.

...

Making this a PtrTo would also match the treatment for call and callbr, two closely related instructions. In fact, in the source of of LLVM, the classes for call, callbr, and invoke are all subclasses of the same CallBase class, so they all store their function type in the same way.

RyanGlScott commented 1 year ago

After some further investigation (prompted by the opaque pointer switchover—see #177), I've actually come to the opposite conclusion from this issue's title. Namely, invoke should not have a pointer type, and the fact that we give call and callbr pointer types is wrong. The thing that threw me off was the LLVM Language Reference Manual's description for the call instruction:

<result> = [tail | musttail | notail ] call [fast-math flags] [cconv] [ret attrs] [addrspace(<num>)]
           <ty>|<fnty> <fnptrval>(<function args>) [fn attrs] [ operand bundles ]

I had mistakenly assumed that fnptrval must have the same type as fnty, but this is not the case! As an experiment, consider this program, which has been carefully written to ensure that the compiled bitcode will contain a call instruction containing a fnty:

#include <stdio.h>

void f() {
  int (*p)(const char*, ...) = &printf;
  p("%d\n", 0);
}

What should the fnty be when we invoke call on p? I had assumed that it would be a pointer type, but in fact, it is a raw function type:

define dso_local void @f() #0 !dbg !17 {
entry:
  %p = alloca ptr, align 8
  call void @llvm.dbg.declare(metadata ptr %p, metadata !21, metadata !DIExpression()), !dbg !28
  store ptr @printf, ptr %p, align 8, !dbg !28
  %0 = load ptr, ptr %p, align 8, !dbg !29
  %call = call i32 (ptr, ...) %0(ptr noundef @.str, i32 noundef 0), !dbg !29
  ret void, !dbg !30
}

In particular, i32 (ptr, ...) is a function type representing p's function signature: it takes a pointer as its first argument, zero or more variadic arguments afterwards, and returns an i32. (If we disable opaque pointers, then the type would be i32 (i8*, ...).) The type of %0 is a pointer to a function of type i32 (ptr, ...), but the fnty itself is not a pointer type.

llvm-pretty-bc-parser (as well as downstream libraries like crucible-llvm) have propped up the mistaken belief that the fnty in a call/callbr should be a pointer to a function type, rather than a direct function type. We have managed to get away with this up until now, but this will cause issues when transitioning over to opaque pointers. This is because there are some operations that extract the pointee type of a pointer, which is not possible to do with an opaque pointer. See, for instance, this code in crucible-llvm.

Long story short: in order to support opaque pointers in the future, we will need to change the fnty convention used within llvm-pretty-bc-parser and crucible-llvm. Unless I am missing something, this change can be done indepedently of implementing opaque pointers themselves.