GaloisInc / llvm-pretty-bc-parser

Parser for the llvm bitcode format

There's something funny about the store instruction... #118

Closed langston-barrett closed 5 years ago

langston-barrett commented 5 years ago

While experimenting with #117 and running the bitcode parser on a large C++ codebase, I noticed that there's no consistent way to typecheck the plain store instruction. In particular, if I uncomment this line:

    -- Typecheck the instruction
    -- typecheckLoadStoreInst val ptr

then I see a lot of these messages:

Load/store type does not patch type of pointer.
Pointer type: PtrTo (PtrTo (Alias (Ident "class.testing::TestWithParam")))
Value type:   PtrTo (Alias (Ident "class.testing::TestWithParam"))
Pointer value: ValIdent (Ident "2")
Value value:   ValIdent (Ident "0")

from:
        FUNC_CODE_INST_STORE
        @_ZN7testing13TestWithParamIN4fizz4test10HashParamsEED2Ev
        FUNCTION_BLOCK
        FUNCTION_BLOCK_ID
        value symbol table
        MODULE_BLOCK
        Bitstream

However, if I change that line to the following (which wraps the value type in a PtrTo before comparing, fixing the above message(s)):

    when (PtrTo (typedType val) /= typedType ptr) $ fail $ unlines
      [ "Store type does not patch type of pointer."
      , "Pointer type: "  ++ show (typedType ptr)
      , "Value type:   "  ++ show (typedType val)
      , "Pointer value: " ++ show (typedValue ptr)
      , "Value value:   " ++ show (typedValue val)
      ]

then I get:

Store type does not patch type of pointer.
Pointer type: PtrTo (PrimType (Integer 64))
Value type:   PtrTo (PrimType (Integer 64))
Pointer value: ValIdent (Ident "4")
Value value:   ValIdent (Ident "8")

from:
        FUNC_CODE_INST_STORE
        @_ZNSt3__134__libcpp_atomic_refcount_incrementIlEET_RS1_
        FUNCTION_BLOCK
        FUNCTION_BLOCK_ID
        value symbol table
        MODULE_BLOCK
        Bitstream

It appears that the store instruction receives arguments of inconsistent types. Is this really right? Or does it reveal a weird inconsistency in our parsing, e.g. in the value table?
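
To make the inconsistency concrete, here is a rough sketch that replays the two reported type pairs against both comparisons. The `Type` constructors are simplified stand-ins for the ones printed in the messages, and `directCheck` is a hypothetical alternative added only for contrast; neither is the parser's own definition, and `directCheck` is not a claim about what `typecheckLoadStoreInst` does.

```haskell
-- Toy stand-ins for the type constructors shown in the messages
-- (assumptions for illustration, not the parser's definitions).
data Type
  = IntTy Int      -- plays the role of PrimType (Integer n)
  | Alias String   -- plays the role of Alias (Ident name)
  | PtrTo Type
  deriving (Eq, Show)

-- | The comparison from the modified line: the pointer operand must
-- point at the type of the stored value.
wrappedCheck :: Type -> Type -> Bool
wrappedCheck valTy ptrTy = PtrTo valTy == ptrTy

-- | Hypothetical alternative for contrast: compare the types directly.
directCheck :: Type -> Type -> Bool
directCheck valTy ptrTy = valTy == ptrTy

-- (value type, pointer type) pairs copied from the two messages.
testWithParamStore, refcountStore :: (Type, Type)
testWithParamStore =
  ( PtrTo (Alias "class.testing::TestWithParam")
  , PtrTo (PtrTo (Alias "class.testing::TestWithParam")) )
refcountStore = (PtrTo (IntTy 64), PtrTo (IntTy 64))

-- | Results of (wrappedCheck, directCheck) for one reported pair.
verdicts :: (Type, Type) -> (Bool, Bool)
verdicts (valTy, ptrTy) = (wrappedCheck valTy ptrTy, directCheck valTy ptrTy)
```

Loading this in GHCi, `verdicts testWithParamStore` accepts only the wrapped comparison and `verdicts refcountStore` accepts only the direct one, so no single comparison covers both reports.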

langston-barrett commented 5 years ago

All the libc++ files fail with standard (typecheckLoadStoreInst) typechecking.

Load/store type does not patch type of pointer.
Pointer type: PtrTo (PtrTo (Alias (Ident "class.std::__1::collate")))
Value type:   PtrTo (Alias (Ident "class.std::__1::collate"))
Pointer value: ValIdent (Ident "3")
Value value:   ValIdent (Ident "0")

from:
        FUNC_CODE_INST_STORE
        @_ZNSt3__17collateIcEC2Em
        FUNCTION_BLOCK
        FUNCTION_BLOCK_ID
        value symbol table
        MODULE_BLOCK
        Bitstream

The relevant instruction:

store %"class.std::__1::collate"* %0, %"class.std::__1::collate"** %3, align 8, !tbaa !6431

With this check:

when (PtrTo (typedType val) /= typedType ptr) $ fail $ unlines
  [ "Store value type does not patch type of pointer."
  , "Pointer type: "  ++ show (typedType ptr)
  , "Value type:   "  ++ show (typedType val)
  , "Pointer value: " ++ show (typedValue ptr)
  , "Value value:   " ++ show (typedValue val)
  ]

Several (but fewer!) still fail:

Store value type does not patch type of pointer.
Pointer type: PtrTo (PrimType (Integer 32))
Value type:   PtrTo (PrimType (Integer 32))
Pointer value: ValIdent (Ident "6")
Value value:   ValIdent (Ident "5")

from:
        FUNC_CODE_INST_STORE
        @_ZNSt3__16locale2id6__initEv
        FUNCTION_BLOCK
        FUNCTION_BLOCK_ID
        value symbol table
        MODULE_BLOCK
        Bitstream

(Note that the printed types match in the above messages, but we add a PtrTo in the when clause, making them mismatched.)

The relevant instruction:

store i32 %5, i32* %6, align 8, !dbg !32060, !tbaa !31907

It remains to be seen whether there is a difference at the level of the bitcode (using llvm-bcanalyzer).

Here are the relevant files: locale.zip

langston-barrett commented 5 years ago

And now, more detail than is healthy: the LLVM source and llvm-bcanalyzer dump for the above functions:

; Function Attrs: alwaysinline sspstrong uwtable
define weak_odr hidden void @_ZNSt3__17collateIcEC2Em(%"class.std::__1::collate"*, i64) unnamed_addr #0 comdat($_ZNSt3__17collateIcEC5Em) align 2 !dbg !6427 {
  %3 = alloca %"class.std::__1::collate"*, align 8
  %4 = alloca i64, align 8
  store %"class.std::__1::collate"* %0, %"class.std::__1::collate"** %3, align 8, !tbaa !6431
  call void @llvm.dbg.declare(metadata %"class.std::__1::collate"** %3, metadata !6429, metadata !DIExpression()), !dbg !6435
  store i64 %1, i64* %4, align 8, !tbaa !6436
  call void @llvm.dbg.declare(metadata i64* %4, metadata !6430, metadata !DIExpression()), !dbg !6438
  %5 = load %"class.std::__1::collate"*, %"class.std::__1::collate"** %3, align 8
  %6 = bitcast %"class.std::__1::collate"* %5 to %"class.std::__1::locale::facet"*, !dbg !6439
  %7 = load i64, i64* %4, align 8, !dbg !6440, !tbaa !6436
  call void @_ZNSt3__16locale5facetC2Em(%"class.std::__1::locale::facet"* %6, i64 %7), !dbg !6441
  %8 = bitcast %"class.std::__1::collate"* %5 to i32 (...)***, !dbg !6439
  store i32 (...)** bitcast (i8** getelementptr inbounds ({ [8 x i8*] }, { [8 x i8*] }* @_ZTVNSt3__17collateIcEE, i32 0, inrange i32 0, i32 2) to i32 (...)**), i32 (...)*** %8, align 8, !dbg !6439, !tbaa !6442
  ret void, !dbg !6444
}

<FUNCTION_BLOCK NumWords=93 BlockCodeSize=4>
  <DECLAREBLOCKS op0=1/>
  <CONSTANTS_BLOCK NumWords=6 BlockCodeSize=4>
    <SETTYPE abbrevid=4 op0=75/>
    <UnknownCode24 op0=3 op1=3 op2=4 op3=0 op4=7 op5=2892 op6=7 op7=2892 op8=7 op9=2896/>
    <SETTYPE abbrevid=4 op0=80/>
    <CE_CAST abbrevid=6 op0=11 op1=75 op2=3723/>
  </CONSTANTS_BLOCK>
  <METADATA_BLOCK NumWords=29 BlockCodeSize=3>
    <STRINGS abbrevid=4 op0=1 op1=4/> num-strings = 1 {
      '_ZNSt3__17collateIcEC2Em'
    }
    <SUBPROGRAM op0=3 op1=3479 op2=82 op3=11980 op4=3829 op5=226 op6=8024 op7=0 op8=1 op9=227 op10=0 op11=0 op12=0 op13=256 op14=1 op15=3688 op16=0 op17=8025 op18=11984 op19=0 op20=0/>
    <LOCAL_VAR op0=2 op1=11981 op2=2070 op3=0 op4=0 op5=4803 op6=1 op7=1088 op8=0/>
    <LOCAL_VAR op0=2 op1=11981 op2=2511 op3=3829 op4=226 op5=3991 op6=2 op7=0 op8=0/>
    <NODE op0=11982 op1=11983/>
    <VALUE op0=2523 op1=3725/>
    <VALUE op0=71 op1=3726/>
  </METADATA_BLOCK>
  <INST_ALLOCA op0=235 op1=7 op2=2894 op3=68/>
  <INST_ALLOCA op0=5 op1=7 op2=2894 op3=68/>
  <!-- The bad instruction: -->
  <INST_STORE op0=2 op1=6 op2=4 op3=0/>
  <INST_CALL op0=0 op1=32768 op2=239 op3=3182 op4=4294959039 op5=4294959042 op6=4294962580/>
  <DEBUG_LOC op0=0 op1=0 op2=11981 op3=0/>
  <INST_STORE op0=1 op1=5 op2=4 op3=0/>
  <INST_CALL op0=0 op1=32768 op2=239 op3=3182 op4=4294959038 op5=4294959041 op6=4294962580/>
  <DEBUG_LOC op0=226 op1=29 op2=11981 op3=0/>
  <INST_LOAD abbrevid=4 op0=2 op1=235 op2=4 op3=0/>
  <INST_CAST abbrevid=7 op0=1 op1=83 op2=11/>
  <DEBUG_LOC op0=227 op1=33 op2=11981 op3=0/>
  <INST_LOAD abbrevid=4 op0=3 op1=5 op2=4 op3=0/>
  <DEBUG_LOC op0=227 op1=25 op2=11981 op3=0/>
  <INST_CALL op0=0 op1=32768 op2=241 op3=3184 op4=2 op5=1/>
  <DEBUG_LOC op0=227 op1=11 op2=11981 op3=0/>
  <INST_CAST abbrevid=7 op0=3 op1=2524 op2=11/>
  <DEBUG_LOC op0=227 op1=33 op2=11981 op3=0/>
  <INST_STORE op0=1 op1=7 op2=4 op3=0/>
  <DEBUG_LOC_AGAIN/>
  <INST_RET abbrevid=8/>
  <DEBUG_LOC op0=227 op1=34 op2=11981 op3=0/>
  <METADATA_ATTACHMENT_BLOCK NumWords=7 BlockCodeSize=3>
    <ATTACHMENT op0=0 op1=11980/>
    <ATTACHMENT op0=2 op1=1 op2=9851/>
    <ATTACHMENT op0=4 op1=1 op2=9853/>
    <ATTACHMENT op0=8 op1=1 op2=9853/>
    <ATTACHMENT op0=11 op1=1 op2=9855/>
  </METADATA_ATTACHMENT_BLOCK>
</FUNCTION_BLOCK>

; Function Attrs: nounwind sspstrong uwtable
define void @_ZNSt3__16locale2id6__initEv(%"class.std::__1::locale::id"*) #3 align 2 !dbg !32054 {
  %2 = alloca %"class.std::__1::locale::id"*, align 8
  store %"class.std::__1::locale::id"* %0, %"class.std::__1::locale::id"** %2, align 8, !tbaa !6431
  call void @llvm.dbg.declare(metadata %"class.std::__1::locale::id"** %2, metadata !32056, metadata !DIExpression()), !dbg !32057
  %3 = load %"class.std::__1::locale::id"*, %"class.std::__1::locale::id"** %2, align 8
  %4 = atomicrmw add i32* @_ZNSt3__16locale2id9__next_idE, i32 1 seq_cst, !dbg !32058
  %5 = add i32 %4, 1, !dbg !32058
  %6 = getelementptr inbounds %"class.std::__1::locale::id", %"class.std::__1::locale::id"* %3, i32 0, i32 1, !dbg !32059
  store i32 %5, i32* %6, align 8, !dbg !32060, !tbaa !31907
  ret void, !dbg !32061
}

<FUNCTION_BLOCK NumWords=53 BlockCodeSize=4>
  <DECLAREBLOCKS op0=1/>
  <METADATA_BLOCK NumWords=14 BlockCodeSize=3>
    <SUBPROGRAM op0=3 op1=3333 op2=124 op3=125 op4=3825 op5=668 op6=3956 op7=0 op8=1 op9=669 op10=0 op11=0 op12=0 op13=256 op14=1 op15=3688 op16=0 op17=3958 op18=11982 op19=0 op20=0/>
    <LOCAL_VAR op0=2 op1=11980 op2=2070 op3=0 op4=0 op5=6636 op6=1 op7=1088 op8=0/>
    <NODE op0=11981/>
    <VALUE op0=2826 op1=3722/>
  </METADATA_BLOCK>
  <INST_ALLOCA op0=1416 op1=7 op2=2894 op3=68/>
  <INST_STORE op0=1 op1=2 op2=4 op3=0/>
  <INST_CALL op0=0 op1=32768 op2=239 op3=3178 op4=4294959037 op5=4294959039 op6=4294962576/>
  <DEBUG_LOC op0=0 op1=0 op2=11980 op3=0/>
  <INST_LOAD abbrevid=4 op0=1 op1=1416 op2=4 op3=0/>
  <UnknownCode38 op0=3599 op1=830 op2=1 op3=0 op4=6 op5=1/>
  <DEBUG_LOC op0=670 op1=13 op2=11980 op3=0/>
  <INST_BINOP abbrevid=5 op0=1 op1=831 op2=0/>
  <DEBUG_LOC_AGAIN/>
  <INST_GEP abbrevid=11 op0=1 op1=1415 op2=3 op3=834 op4=832/>
  <DEBUG_LOC op0=670 op1=5 op2=11980 op3=0/>
  <!-- The bad instruction: -->
  <INST_STORE op0=1 op1=2 op2=4 op3=0/>
  <DEBUG_LOC op0=670 op1=11 op2=11980 op3=0/>
  <INST_RET abbrevid=8/>
  <DEBUG_LOC op0=671 op1=1 op2=11980 op3=0/>
  <METADATA_ATTACHMENT_BLOCK NumWords=5 BlockCodeSize=3>
    <ATTACHMENT op0=0 op1=11979/>
    <ATTACHMENT op0=1 op1=1 op2=9851/>
    <ATTACHMENT op0=7 op1=1 op2=10861/>
  </METADATA_ATTACHMENT_BLOCK>
</FUNCTION_BLOCK>

langston-barrett commented 5 years ago

I get identical behavior with LLVM/Clang 5 and 7 (the above is with LLVM/Clang 6), and with optimizations disabled (-O0).
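
For reading the flagged INST_STORE records in the dumps above, here is a rough sketch of how their operands can be decoded by hand. It assumes the relative operand encoding used by recent bitcode (an operand `k` means "the value defined `k` slots before the next value number"), a record layout of pointer, value, alignment, volatile when no forward reference is involved, and an alignment field storing log2(bytes) + 1. The value lists are transcribed by hand from the IR (arguments, then function-local constants, then value-producing instructions); the names `const0` and `const1` are placeholders for the two entries in the CONSTANTS_BLOCK, and global values are left out because none of these operands reach that far back.

```haskell
-- | Resolve a relative operand id against the function-local values
-- defined so far (oldest first). Relative id 1 is the most recent value.
operand :: [String] -> Int -> Maybe String
operand defined rel
  | rel >= 1 && rel <= n = Just (defined !! (n - rel))
  | otherwise            = Nothing
  where
    n = length defined

-- | Alignment is assumed to be stored as log2(bytes) + 1,
-- with 0 meaning "not specified".
decodeAlign :: Int -> Maybe Int
decodeAlign k
  | k <= 0    = Nothing
  | otherwise = Just (2 ^ (k - 1))

-- Values defined before each flagged store (hand-transcribed from the IR;
-- stores and void calls define no value, and block labels are not counted).
collateValues, initValues :: [String]
collateValues = ["%0", "%1", "const0", "const1", "%3", "%4"]
initValues    = ["%0", "%2", "%3", "%4", "%5", "%6"]

-- | Decode <INST_STORE op0=ptr op1=val op2=align .../> into
-- (pointer operand, value operand, alignment in bytes).
decodeStore :: [String] -> Int -> Int -> Int -> Maybe (String, String, Maybe Int)
decodeStore defined ptrRel valRel alignField = do
  p <- operand defined ptrRel
  v <- operand defined valRel
  pure (p, v, decodeAlign alignField)
```

Under those assumptions, `decodeStore collateValues 2 6 4` picks `%3` as the pointer and `%0` as the value, and `decodeStore initValues 1 2 4` picks `%6` and `%5`, both with an alignment of 8. These are the same operands that appear in the textual IR and in the "Pointer value" and "Value value" lines of the error messages.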