Open Ptival opened 1 year ago
Yes, this is rather unfortunate. Before I discuss a possible fix for this, let me first provide some context as to how things became this way (feel free to skip to the bottom of this comment if you already know the story).
The root of the issue is that, broadly speaking, there two situations where llvm-pretty-bc-parser
must fill in Type
s in the AST:
store
instruction has a type argument that represents the type of memory to store, but the store
instruction doesn't directly encode the type of pointer being written to.(2) is the tricky case, since it's up to the LLVM parsing library to determine what type should go in the AST. Continuing the store
example, let's suppose that we have:
store i8 %0, ptr %1
Or, to use the pre-opaque pointer syntax:
store i8 %0, i8* %1
The bitcode for this store
instruction will directly encode the i8
type, which is easy to parse (this is a type-(1) situation). It will not encode the ptr
/i8*
type, however (this is a type-(2) situation). llvm-pretty-bc-parser
now has a choice to make: should it fill in ptr
in the AST, or should it fill in i8*
?
If we aimed to only support one particular version of LLVM, then this would be an easy choice, as we would simply pick ptr
for recent LLVM versions and i8*
for older LLVM versions. But llvm-pretty-bc-parser
has made the somewhat unusual choice of supporting multiple versions of the LLVM language simultaneously. There are good reasons for doing so (there are some projects where we still need to use non-opaque pointers), so when I upgraded llvm-pretty-bc-parser
to support opaque pointers, I decided to have the library pick i8*
in this example. This isn't an entirely arbitrary choice, since i8*
has more type information than ptr
, and i8*
is supported by more LLVM versions than ptr
.
For tools like Crucible, the consequences of this choice aren't that important, since it can handle both i8*
and ptr
without issue, and it can even handle programs that contain both. For tools like llvm-as
, however, mixing the two pointer types is problematic, as it requires that a given program either use exclusively opaque pointers or exclusively non-opaque pointers. llvm-pretty-bc-parser
's own test suite uses llvm-as
, so I cooked up the fixupOpaquePtrs
function to sanitize files with mixed pointer types and make them all use opaque ones. This is ultimately a hack, however, and it is one that we should strive to remove.
At the end of the day, producing ASTs with mixed opaque and non-opaque pointer types is asking for trouble, so I think we should exclusively use one or the other. Unfortunately, different LLVM versions only support certain pointer types, so we must be careful about which pointer type we pick.
One possible solution would be to tweak llvm-pretty-bc-parser
's behavior based on what version of LLVM it is parsing. I'm not terribly fond of this idea, since the LLVM bitcode format is intended to be (largely) backwards-compatible, and we really ought to be able to parse bitcode without having to know the version number.
Another possible way forward would be to add a configuration option that allows switching between one pointer type or the other. I'm slightly more fond of this idea, although we'd have to figure out what the right default setting should be.
I think I have another case of needing more info:
@global_var = external constant [1 x i8]
define i64 @h() {
br i1 icmp ne (i32 ptrtoint (ptr @global_var to i32), i32 1), label %pc_1, label %pc_1
pc_1:
ret i64 0
}
llvm-as -> .bc -> llvm-disasm:
; reduced2.ll.bc
source_filename = "reduced2.ll"
target triple = "unknown-unknown-unknown-unknown-"
@global_var = external constant [1 x i8]
define i64 @h() {
; <label>: 0
br i1 icmp ne ([1 x i8]* @global_var, i32 1), label %pc_1, label %pc_1
pc_1:
ret i64 0
}
This is rejected by llvm-as:
llvm-as: reduced2.ll:6:9: error: compare operands must have the same type
br i1 icmp ne ([1 x i8]* @global_var, i32 1), label %pc_1, label %pc_1
^
llvm-as -> .bc -> llvm-dis:
; ModuleID = 'reduced2.ll.bc'
source_filename = "reduced2.ll"
@global_var = external constant [1 x i8]
define i64 @h() {
br i1 icmp ne (i32 ptrtoint (ptr @global_var to i32), i32 1), label %pc_1, label %pc_1
pc_1: ; preds = %0, %0
ret i64 0
}
There is no ptrtoint in the AST dump. I haven't analyzed the bitcode directly yet.
I think that this is actually a separate issue, as it's possible to trigger this bug without the use of opaque pointers. See #266.
Encountered some incorrect roundtripping. Here is a fairly minimal reproducer:
Assembling to bitcode, parsing with llvm-pretty-bc-parser and then pretty-printing with ppModule yields:
which contains invalid LLVM IR
ptr*
.Thanks to @kquick I now know I can avoid this problem by manually calling
fixupOpaquePtrs
, however, I wonder whether we should avoid producingPtrTo PtrOpaque
in the first place, on the fly?