bscarlet / llvm-general

Rich LLVM bindings for Haskell (with transfer of LLVM IR to and from C++, detailed compilation pass control, etc.)
http://hackage.haskell.org/package/llvm-general
132 stars 38 forks source link

moduleAST -> withModuleFromAST fails #100

Closed FreakyPenguin closed 10 years ago

FreakyPenguin commented 10 years ago

Hi, I ran into the following problem with version 3.4.2.2:

Basically I'm trying to get the pure AST from the impure representation, do some modifications and get it back into the internal representation. When trying to do this I noticed that there seems to be something wrong in either moduleAST or withModuleFromAST, even when I made no modifications to the AST the latter failed. But the bitcode itself (it's from clang), seems to be correct since executing it (using MCJIT) works just fine if I use the originial module that comes from withModuleFromBitcode.

I tried to reduce my example as much as possible, and added the Haskell source and the LLVM-source below. Any idea why this is happening or how to fix it? The error produced is test-llvm: user error (reference to undefined local: Name "cache.07")

Haskell Source:

import qualified LLVM.General.PrettyPrint as LLVMPP
import qualified LLVM.General.Context as LLVMCtx
import qualified LLVM.General.Module as LLVMMod

import Control.Monad
import Control.Monad.Error

main =
    LLVMCtx.withContext $ \ctx ->
        liftError $ LLVMMod.withModuleFromBitcode ctx file $ \mod -> do
            ast <- LLVMMod.moduleAST mod
            putStrLn $ LLVMPP.showPretty ast
            liftError $ LLVMMod.withModuleFromAST ctx ast$ \mod -> do
                putStrLn "Success!"

    where
        file = LLVMMod.File "Rx-linked-cg.bc"

liftError :: ErrorT String IO a -> IO a
liftError = runErrorT >=> either fail return

LLVM Source: (for Rx-linked-cg.bc)

; ModuleID = 'Rx'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.pipeline_handle = type opaque
%struct.queue_handle = type opaque
%struct.input = type { i8*, %struct.input_attributes*, i64, i16, i16, i16, %struct.input*, i8*, i8* }
%struct.input_attributes = type { i16, i16, i16, i16, i64, i64, i16, i16, i32, i32, i64, i64, i32, i32, i8, i32, i16, i16, i32 }
%struct.state = type { i32, i64, %struct.arp_pending*, %struct.arp_cache*, i64, %struct.driver*, %struct.tap_handler* }
%struct.arp_pending = type { i32, %struct.input*, %struct.arp_pending* }
%struct.arp_cache = type { i32, i64, %struct.arp_cache* }
%struct.driver = type { i8*, i8* (i8*)*, i16 (i8*, i8*, i16)*, i32 (i8*, i8*, i16)*, i64 (i8*)*, i32 (i8*)* }
%struct.tap_handler = type { i32, i32, [16 x i8] }

; Function Attrs: nounwind readonly uwtable
define %struct.arp_cache* @arp_cache_lookup(%struct.state* nocapture readonly %st, i32 %ip) #0 {
entry:
  %arp_cache = getelementptr inbounds %struct.state* %st, i64 0, i32 3
  %cache.05 = load %struct.arp_cache** %arp_cache, align 8
  %cmp6 = icmp eq %struct.arp_cache* %cache.05, null
  br i1 %cmp6, label %return, label %while.body

while.cond:                                       ; preds = %while.body
  %next = getelementptr inbounds %struct.arp_cache* %cache.07, i64 0, i32 2
  %cache.0 = load %struct.arp_cache** %next, align 8
  %cmp = icmp eq %struct.arp_cache* %cache.0, null
  br i1 %cmp, label %return, label %while.body

while.body:                                       ; preds = %while.cond, %entry
  %cache.07 = phi %struct.arp_cache* [ %cache.0, %while.cond ], [ %cache.05, %entry ]
  %ip1 = getelementptr inbounds %struct.arp_cache* %cache.07, i64 0, i32 0
  %0 = load i32* %ip1, align 4
  %cmp2 = icmp eq i32 %0, %ip
  br i1 %cmp2, label %return, label %while.cond

return:                                           ; preds = %while.body, %while.cond, %entry
  %retval.0 = phi %struct.arp_cache* [ null, %entry ], [ null, %while.cond ], [ %cache.07, %while.body ]
  ret %struct.arp_cache* %retval.0
}

Thanks, Antoine

bscarlet commented 10 years ago

This is a bug in my handling of forward references when lowering the pure AST to the C++ objects. The resolution may be a little tricky, because my choice to leave explicit (and usually redundant) type information out of symbol references leaves me in a bit of a bind. I've got to think for a bit to see if I can dig myself out of this problem without a radical change to the AST.

bscarlet commented 10 years ago

Nope. Bummer.

define void @foo() {
entry:
  ret void

dead0:                                            ; preds = %dead1
  %x0 = add i32 %x1, %x1
  br label %dead1

dead1:                                            ; preds = %dead0
  %x1 = add i32 %x0, %x0
  br label %dead0
}

is legal, but in my AST too many types would be elided and the type of %x0 and %x1 wouldn't be determined - so I've got to add types to a lot of stuff. It'll fix the bug, but the consequences for downstream users won't be pretty. I don't seem to have much of a choice, though.

cartazio commented 10 years ago

would one alternative fix be to have a "type unification" pass in the pure ast -> c++ direction? (note that i don't have any large code that'll be impacted by this, merely wondering outloud)

dfoxfranke commented 10 years ago

Can you explain what changes to the AST you have in mind? It sounds like you're saying that Names all need to have type annotations attached to them, but that doesn't make sense to me. I can't see why it would ever be necessary for the abstract syntax to contain a type annotation anywhere that the concrete syntax doesn't. If the AST needs to be elaborated with further type information prior to converting it to the c++ representation, that elaboration could go in a separate, user-hidden pass like cartazio suggested.

dfoxfranke commented 10 years ago

Nevermind, I see now. Those i32s annotating the add instructons in the concrete syntax don't show up anywhere in the abstract syntax, and without those the AST is genuinely ambiguous. Yes, that is awfully unfortunate.

bscarlet commented 10 years ago

After some examination, the annotations on all the instructions in the concrete syntax all seem to be small local inferences away from simply having a type on each reference to a variable (e.g. the add instruction only writes the operand type once, rather than once for each operand, because the two operands must have the same type).

The change/fix that is the best compromise I've found between consistency with the concrete syntax and general cleanliness and internal consistency is to add a type field to both the LocalReference constructor for Operand and the GlobalReference constructor for Constant.

bscarlet commented 10 years ago

This is now fixed, with associated necessary AST changes adding type annotations to symbol references, is in 3.3.12.0 and 3.4.3.0.