**Closed** — tmcdonell closed this issue 10 years ago
It could also be that the target triple field is being ignored. Thus, it is not specifying the TTI information for the target processor (which has vector registers) and is just using a basic model. I guess this corresponds to the `-basictti -x86tti` passes that I see when dumping the passes with `opt`.
This email thread seems relevant: http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066841.html
Off the top of my head, my best suggestion is somewhat generic: That's a long list of optimizations you've got there. Can you boil down the test case to a single pass - the first one that does something different than in opt, with corresponding input (from the prior pass)?
The long list of passes is really just replicating `opt -O3`. Somehow the curated set does not include the `LoopVectorize` pass, but I guess that is a separate issue... Anyway, you can just run that pass by itself afterwards:
```haskell
main :: IO ()
main = do
  parseCommandLineOptions
    [ "-debug"
    -- , "-force-vector-width=2"
    -- , "-mtriple=x86_64-apple-macosx10.9.0"
    -- , "-mcpu=corei7-avx"
    , "-debug-pass=Arguments"
    , "-debug-only=loop-vectorize"
    ]
    Nothing
  mt <- readFile "dotp.ll"
  r  <- withContext $ \cx ->
          runErrorT $ withModuleFromLLVMAssembly cx mt $ \mdl -> do
            runErrorT $ withDefaultTargetMachine $ \machine -> do
              triple <- getProcessTargetTriple
              withTargetLibraryInfo triple $ \libinfo -> do
                datalayout <- getTargetMachineDataLayout machine
                let -- p1 = PassSetSpec prepass (Just datalayout) (Just libinfo) (Just machine)
                    -- p2 = PassSetSpec optpass (Just datalayout) (Just libinfo) (Just machine)
                    p3 = defaultCuratedPassSetSpec { optLevel = Just 3 }
                    p4 = PassSetSpec [LoopVectorize] (Just datalayout) (Just libinfo) (Just machine)
                -- _b1 <- withPassManager p1 $ \pm -> runPassManager pm mdl
                -- _b2 <- withPassManager p2 $ \pm -> runPassManager pm mdl
                _b3 <- withPassManager p3 $ \pm -> runPassManager pm mdl
                _b4 <- withPassManager p4 $ \pm -> runPassManager pm mdl
                moduleLLVMAssembly mdl >>= putStrLn
  either (either error (putStrLn . diagnosticDisplay))
         (either error return)
         r
```
Really, however, the first difference is that llvm-general isn't adding any data layout or target type information, which happens as one of the first analysis passes, before any of the transformation passes are invoked. You can see this in the `-debug-pass=Arguments` output:

```
opt:          -targetlibinfo -datalayout -notti -basictti -x86tti -no-aa -tbaa -basicaa -globalopt -ipsccp ...
llvm-general: -no-aa -tbaa -targetlibinfo -basicaa -notti -globalopt -ipsccp ...
```

So we're missing the data layout and target information from `-datalayout -basictti -x86tti`, and I think that lack of information is what is hampering the later transformations.
You've hit one constraint that I'll need to fix in your second example, which is that the `CuratedPassSetSpec` convenience interface doesn't give you a way to specify a `DataLayout` or a `TargetMachine`.
In your original case, though, you're not using that interface. In that code, you're using default values in your Haskell code. Do you know if the default machine, triple and datalayout you're using correspond to what opt winds up using when you get the vectorization to work?
(Please note that the Haskell PassManager interface is not trying to be as smart as the whole opt binary - it's just exposing the llvm PassManager interface. Any of the logic that's in opt for picking good values for those parameters is something you'd need to do yourself in Haskell. Any obstacle in the API for plumbing them in so they get used I'd be happy to fix, of course.)
When I run `opt` on the command line I don't specify any `-march` or `-mcpu`; it only uses the info it has available from the data layout and target triple fields of the .ll file.
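For reference, those fields live right at the top of the .ll file itself, and `opt` reads them directly. An illustrative header (not the verbatim contents of `dotp.ll` — the triple and layout strings here are the ones quoted from ghci later in this thread) would look like:

```llvm
; target information embedded in the .ll module itself; opt picks these up
; without any -march / -mcpu flags.
target datalayout = "e-S128-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-v64:64:64-v128:128:128-f16:16:16-f32:32:32-f64:64:64-f80:128:128-f128:128:128-a0:0:64-s0:64:64-n8:16:32:64"
target triple = "x86_64-apple-darwin13.0.0"
```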
Running:

```sh
$ opt-3.4 -O3 -S dotp.ll -debug-only=loop-vectorize
```
If I comment out the data layout field from `dotp.ll` (leaving the target triple in place) I get:

```
LV: Not vectorizing because of missing data layout
```

Whereas if I comment out only the target triple, we get the same as before:

```
LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Widest type: 64 bits.
LV: The Widest register is: 32 bits.
LV: The target has no vector registers.
```
Actually, these values I get from Haskell are very slightly different from what Clang uses (good catch!) but `opt` still has enough information (x86-64 target and vector register width) and vectorizes the loop exactly the same if I use these values in `dotp.ll` instead.
```
ghci> getDefaultTargetTriple
"x86_64-apple-darwin13.0.0"

ghci> either error (putStrLn . dataLayoutToString) =<< (runErrorT $ withDefaultTargetMachine getTargetMachineDataLayout)
e-S128-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-v64:64:64-v128:128:128-f16:16:16-f32:32:32-f64:64:64-f80:128:128-f128:128:128-a0:0:64-s0:64:64-n8:16:32:64
```
It'd be great if llvm-general also just took the data layout and target from the .ll file, when it is available. That way, the `CuratedPassSetSpec` interface doesn't need to be changed (indeed, maybe the lower level interface should grab the data from the .ll file as well, if `Nothing` is given in the `PassSetSpec`?). My initial use of the lower level `PassSetSpec` was an attempt to explicitly provide this information and mimic the logic of `opt` as closely as possible, which ultimately didn't work either.
llvm-general aims at generality - hence the name. In that pursuit, it follows a fairly strict heuristic design principle of avoiding adding its own behaviors above and beyond what the underlying LLVM libraries themselves expose, lest my own sense of convenience or beauty lead me - in the blindness of my individual perspective - to erect an obstacle between a user and the capabilities of LLVM. For that reason I will not add any magic defaulting behavior to the lower level interface.
The higher level interface is another story. It's an attempt to expose the PassManagerBuilder in the C++ API, itself an encapsulation of the list of passes which used to be hardcoded in the code for the "opt" binary. Your example clearly shows a deficiency in llvm-general there, as there's no way to specify a DataLayout in that case at all.
I'm still not clear on your first case though. When you use the low level interface and specify an appropriate DataLayout, TargetLibraryInfo and TargetMachine, does the vectorization kick in for you, or not?
Vectorisation does not kick in when using the lower level interface. The data layout and target triple provided explicitly to `PassSetSpec` are as above. As seen with `-debug-pass=Arguments`, no target info is ever added.
Okay. Looks like that problem is probably a missing call to have the `TargetMachine` add `TargetTransformInfo` passes in `LLVM.General.Internal.PassManager.createPassManager`, right after the call to `FFI.addDataLayoutPass`.
This issue'll be high in my queue, but may still take a bit for me to get to. Pull requests welcome, of course. A regression test for `test/LLVM/General/Test/Optimization.hs` would be a good idea, too, if you care to boil your test case down further.
Awesome! Keep me updated (:
I also had trouble with vectorization. I think I fixed part of the problem, but it still won't work if no `TargetMachine` is provided.
@tvh I haven't managed to get llvm-3.5 installed on my machine. Does your patch also work for you if you apply it to v3.4.1.0? Also, does it work with the test program I had above, or did you have to do something extra? Thanks!
Update: if I cherry-pick onto 3.4.1.0, it installs, but fails when I try to use it later on (running the test program from above):

```
dyld: lazy symbol binding failed: Symbol not found: __ZN4llvm14TargetRegistry12lookupTargetERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS_6TripleERS7_
  Referenced from: /Users/tmcdonell/Projects/accelerate/accelerate-llvm/.cabal-sandbox/lib/x86_64-osx-ghc-7.8.0.20140228/llvm-general-3.4.1.0/libHSllvm-general-3.4.1.0-ghc7.8.0.20140228.dylib
  Expected in: flat namespace
```
Sorry, never mind the last, I had hosed my cabal sandbox somehow.
This should be fixed and available in 3.3.11.0 and 3.4.2.0.
Awesome, thanks (:
I'm having trouble getting code to vectorize. I think I've narrowed it down to the data layout not being passed correctly (or at the right point) to the optimization passes.

Here's a test program: https://gist.github.com/tmcdonell/9199137 https://gist.github.com/tmcdonell/9198757

(It's a bit verbose, as I tried to replicate the passes used by the command line `opt` as closely as possible.)

The crux is that even though I have the data layout specified in `dotp.ll`, the debug output still shows:

While running through `opt` on the command line gives the expected:

It is possible to force it to work by specifying `-force-vector-width=X` on the command line (Test.hs line 41), but that's not going to work in general, or for mixed data types, etc... Any suggestions?