Closed alpmestan closed 2 months ago
Thanks for the report. As noted in our README, LLVM instructions and syntax can change with different versions, and indeed this seems to have been a change that started in LLVM 14 and was finalized in LLVM17.
I embedded your sample in the following test harness (targeting my local machine's triple, although that's just for lli
needs and should not be relevant to this issue):
{-# LANGUAGE OverloadedStrings #-}
import qualified Text.LLVM as L
import qualified Text.LLVM.Triple.AST as L3
import qualified Text.LLVM.PP as LP
import Data.Int
import Text.Read ( readMaybe )
import System.Environment ( getArgs )
mod1 :: L.LLVM (L.Typed L.Value)
mod1 = do
printf <- L.declare (L.iT 32) "printf" [L.ptrT (L.iT 8)] True
fmtStr <- L.string "d_fmt" "%lf\n\0"
L.define L.emptyFunAttrs (L.iT 32) "main" () $ do
_ <- L.call printf [fmtStr, doubleT L.-: L.toValue (pi :: Double)]
L.ret (L.iT 32 L.-: (0 :: Int32))
doubleT :: L.Type
doubleT = L.PrimType $ L.FloatType L.Double
main = do args <- getArgs
let llvmver = if length args < 1
then LP.llvmVlatest
else case (readMaybe $ args !! 0) :: Maybe Int of
Just n -> n
Nothing -> LP.llvmVlatest
let tgtTriple = L3.TargetTriple
{
L3.ttArch = L3.X86_64
, L3.ttSubArch = mempty
, L3.ttVendor = L3.PC
, L3.ttOS = L3.Linux
, L3.ttEnv = L3.GNU
, L3.ttObjFmt = L3.ELF
}
let llvmMod = (snd $ L.runLLVM mod1) { L.modTriple = tgtTriple }
-- putStrLn $ show llvmMod
putStrLn $ "---- for LLVM " <> show llvmver
let ll = show $ LP.ppLLVM llvmver $ LP.llvmPP llvmMod
-- putStrLn ll
writeFile "calltest.ll" ll
putStrLn "wrote calltest.ll"
This matches your first form (no trailing *
for the call fnty argument) and I get the following results:
$ for X in $(seq 12 18) ; do echo LLVM$X:: ; runhaskell calltest.hs ; nix shell nixpkgs#llvm_$X -c lli alp.ll ; done
LLVM12::
---- for LLVM 12
wrote calltest.ll
lli: calltest.ll:5:29: error: '@printf' defined with type 'i32 (i8*, ...)*' but expected 'i32 (i8*, ..
.)* ([5 x i8]*, double)*'
%r0 = call i32(i8*, ...)* @printf([5 x i8]* @d_fmt,
^
LLVM13::
---- for LLVM 13
wrote calltest.ll
lli: lli: calltest.ll:5:29: error: '@printf' defined with type 'i32 (i8*, ...)*' but expected 'i32 (i8
*, ...)* ([5 x i8]*, double)*'
%r0 = call i32(i8*, ...)* @printf([5 x i8]* @d_fmt,
^
LLVM14::
---- for LLVM 14
wrote calltest.ll
lli: lli: calltest.ll:5:29: error: '@printf' defined with type 'i32 (i8*, ...)*' but expected 'i32 (i8
*, ...)* ([5 x i8]*, double)*'
%r0 = call i32(i8*, ...)* @printf([5 x i8]* @d_fmt,
^
LLVM15::
---- for LLVM 15
wrote calltest.ll
lli: lli: calltest.ll:5:29: error: '@printf' defined with type 'i32 (i8*, ...)*' but expected 'i32 (i8
*, ...)* ([5 x i8]*, double)*'
%r0 = call i32(i8*, ...)* @printf([5 x i8]* @d_fmt,
^
LLVM16::
---- for LLVM 16
wrote calltest.ll
3.141593
LLVM17::
---- for LLVM 17
wrote calltest.ll
0.000000
LLVM18::
---- for LLVM 18
wrote calltest.ll
0.000000
$
If I then modify the code to remove the trailing *
as you did, I get the following:
$ for X in $(seq 16 18) ; do echo LLVM$X:: ; runhaskell calltest.hs ; nix shell nixpkgs#llvm_$X -c lli calltest.ll ; done
LLVM16::
---- for LLVM 16
wrote calltest.ll
3.141593
LLVM17::
---- for LLVM 17
wrote calltest.ll
3.141593
LLVM18::
---- for LLVM 18
wrote calltest.ll
3.141593
$
And indeed, in https://releases.llvm.org/17.0.1/docs/LangRef.html#call-instruction the examples dropped the *
, but there's no corresponding description changes for the call
instruction. The *
is also not present in LLVM 16 docs (https://releases.llvm.org/16.0.0/docs/LangRef.html#call-instruction), where both forms seem to work, or the LLVM 15 docs, but it is present in LLVM 14 (https://releases.llvm.org/14.0.0/docs/LangRef.html#call-instruction).
I suspect that the removal of the pointer indirection for function types corresponds to the shift to Opaque Pointers in LLVM, where pointers no longer carry type information (https://llvm.org/devmtg/2022-04-03/slides/keynote.Opaque.Pointers.Are.Coming.pdf).
At this point, I think the right path forward is to update the type returned by decFunType
(in Text.LLVM.AST) to remove the PtrTo
indirection... but under what circumstances? The caller would need to indicate the desired LLVM version at AST construction time rather than pretty-printing time, or else we need a more flexible internal representation of these types that allows for version-specific handling at the right points.
For the moment, if you are targeting LLVM 17 (or potentially LLVM 15+), you can use your kall
override or modify the way you declare
the "printf" function, although that might have other effects.
Yeah for my use case I'm most definitely happy to just run with kall
, no worries there, especially given it's (for now) a personal project.
I'm sorry this sent you down some very deep rabbit hole, but am very thankful for the detailed exploration.
At this point, I think the right path forward is to update the type returned by decFunType (in Text.LLVM.AST) to remove the PtrTo indirection... but under what circumstances? The caller would need to indicate the desired LLVM version at AST construction time rather than pretty-printing time, or else we need a more flexible internal representation of these types that allows for version-specific handling at the right points.
Yeah the fact that "the right (LLVM) type" is dependent on the LLVM version is quite messed up. But it also seems tricky to just hand the problem over to all users either (including future me who may have forgotten about this in 6 months), as many will likely not know about those changes.
For what it's worth, my personal preference probably goes to allowing the library be a bit closer to "what the user thinks and means" when it comes to codegen. Most of the time we're just declaring, defining and calling things, so I'd probably be happy if the library could be made a bit more flexible so as to easily encode what I want while retaining all the info it needs to pretty-print the right incantation later, once the LLVM version is known.
(The first solution sounds to me like it may be a bit too confusing for users. "What do you mean I need to give the LLVM version to this function, in the middle of the codegen logic?")
I disagree that we need to do something differently based on the LLVM version here. Rather, I think using call i32(i8*, ...)* @printf(...)
was always an error. As evidence, note that if you compile this program with any version of Clang:
#include <stdio.h>
int main(void) {
printf("%d\n", 42);
return 0;
}
Then you'll get something that looks like:
$ ~/Software/clang+llvm-10.0.0/bin/clang -emit-llvm -S test.c -o test.ll
ryanscott at pop-os in ~/Documents/Hacking/C
$ cat test.ll
<snip>
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
%1 = alloca i32, align 4
store i32 0, i32* %1, align 4
%2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 42)
ret i32 0
}
Note that call i32 (i8*, ...) @printf(...)
might be call i32 (ptr, ...) @printf(...)
if you are using an LLVM version that supports opaque pointers, but that's not particularly important here. What is important is that regardless of what LLVM version you use, the argument type will not be (i8*, ...)*
(or ptr
). If I try to change call i32 (i8*, ...) @printf(...)
to call i32 (i8*, ...)* @printf(...)
as re-assemble the program, I get the following error:
$ ~/Software/clang+llvm-10.0.0/bin/llvm-as test.ll -o test.bc
/home/ryanscott/Software/clang+llvm-10.0.0/bin/llvm-as: test.ll:12:29: error: invalid forward reference to function 'printf' with wrong type: expected 'i32 (i8*, ...)*' but was 'i32 (i8*, ...)* (i8*, i32)*'
%2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 42)
^
Therefore, I think the inclusion of the extra pointer type in the list of examples in https://releases.llvm.org/14.0.0/docs/LangRef.html#call-instruction was simply an oversight (that has since been fixed). As such, I think we should just remove the trailing *
from the list of argument types.
The documentation shows the *
back through at least v2.6, so it's been there for quite some time, but it's also just the documentation. I was concerned that the pointer version may have been the only acceptable form for an older LLVM, but I also verified that back through at least LLVM 5 it supports the non-pointer form, so I'd agree we don't need to make this version specific and can just update call
per the original report.
Patch in #145 - review/feedback welcome :)
Fixed in #145.
Hello,
First off, thanks a lot for this package.
I stumbled upon some odd behaviour (possibly related to https://github.com/GaloisInc/llvm-pretty/issues/102#issuecomment-1478073111 ?). When I try to call a function that I either
declare
d ordefine
d in the sameLLVM
monad computation, using theText.LLVM.call
function, the function type is printed with an excessive pointer, see below.This results in:
Notice the
*
at the end ofi32(i8*, ...)*
, which shouldn't be there. This IR, when executed withlli
:If I edit the IR manually to remove that extra pointer in the type, and try to run again:
I tracked down the issue to the definition of
call
:If I instead define and use the following function in place of
Text.LLVM.call
:then I get the correct IR (the one in
test2.ll
) and some digits ofpi
get printed, like withtest2.ll
.Am I somehow misusing
define
/declare
/call
or shouldcall
be patched?Cheers