Target data layout / triple being ignored?

tmcdonell commented 10 years ago

I'm having trouble getting code to vectorize. I think I've narrowed it down to the data layout not being passed correctly (or at the right point) to the optimization passes.

Here's a test program: https://gist.github.com/tmcdonell/9199137 https://gist.github.com/tmcdonell/9198757

(It's a bit verbose, as I tried to replicate the passes used by the command line opt as closely as possible.)

The crux is that even though I have the data layout specified in dotp.ll, the debug output still shows:

LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Widest type: 64 bits.
LV: The Widest register is: 32 bits.
LV: The target has no vector registers.

While running through opt on the command line gives the expected

LV: The Widest register is: 128 bits.
LV: The target has 16 vector registers

It is possible to force it to work by specifying -force-vector-width=X on the command line (Test.hs line 41), but that's not going to work in general, or for mixed data types, etc...

Any suggestions?

tmcdonell commented 10 years ago

It could also be that the target triple field is being ignored. In that case the TTI information for the target processor (which has vector registers) is never set up, and the passes fall back to a basic cost model. I guess this corresponds to the passes -basictti -x86tti that I see when dumping the passes with opt.

This email thread seems relevant: http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066841.html

bscarlet commented 10 years ago

Off the top of my head, my best suggestion is somewhat generic: That's a long list of optimizations you've got there. Can you boil down the test case to a single pass - the first one that does something different than in opt, with corresponding input (from the prior pass)?

tmcdonell commented 10 years ago

The long list of passes is really just replicating opt -O3. Somehow the curated set does not include the LoopVectorize pass, but I guess that is a separate issue... Anyway, you can just run that pass by itself afterwards:

main :: IO ()
main = do
  parseCommandLineOptions
    [ "-debug"
--    , "-force-vector-width=2"
--    , "-mtriple=x86_64-apple-macosx10.9.0"
--    , "-mcpu=corei7-avx"
    , "-debug-pass=Arguments"
    , "-debug-only=loop-vectorize"
    ]
    Nothing

  mt <- readFile "dotp.ll"
  r  <- withContext $ \cx ->
    runErrorT $ withModuleFromLLVMAssembly cx mt $ \mdl -> do
    runErrorT $ withDefaultTargetMachine $ \machine -> do
      triple <- getProcessTargetTriple
      withTargetLibraryInfo triple $ \libinfo -> do

        datalayout  <- getTargetMachineDataLayout machine

        let
--            p1 = PassSetSpec prepass (Just datalayout) (Just libinfo) (Just machine)
--            p2 = PassSetSpec optpass (Just datalayout) (Just libinfo) (Just machine)
            p3 = defaultCuratedPassSetSpec { optLevel = Just 3 }
            p4 = PassSetSpec [LoopVectorize] (Just datalayout) (Just libinfo) (Just machine)

--        _b1 <- withPassManager p1 $ \pm -> runPassManager pm mdl
--        _b2 <- withPassManager p2 $ \pm -> runPassManager pm mdl
        _b3 <- withPassManager p3 $ \pm -> runPassManager pm mdl
        _b4 <- withPassManager p4 $ \pm -> runPassManager pm mdl

        moduleLLVMAssembly mdl >>= putStrLn

  either (either error (putStrLn . diagnosticDisplay))
         (either error return)
         r

Really, however, the first difference is that llvm-general isn't adding any data layout or target type information, which happens as one of the first analysis passes before any of the transformation passes are invoked. You can see this in the -debug-pass=Arguments output:

opt: -targetlibinfo -datalayout -notti -basictti -x86tti -no-aa -tbaa -basicaa -globalopt -ipsccp ...
llvm-general: -no-aa -tbaa -targetlibinfo -basicaa -notti -globalopt -ipsccp ...

So we're missing data layout and target information from -datalayout -basictti -x86tti, and I think that lack of information is what is hampering the later transformations.
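
For anyone who wants to repeat this comparison mechanically, here is a minimal sketch in plain Haskell (base only) that diffs the two -debug-pass=Arguments lists quoted above. The lists are cut off where the output above is elided, and the names optPasses, generalPasses and missingFrom are made up for this illustration; none of this is part of llvm-general.

```haskell
-- Pass arguments as printed by -debug-pass=Arguments, truncated at the
-- point where the thread elides the rest of each list.
optPasses :: [String]
optPasses = words
  "-targetlibinfo -datalayout -notti -basictti -x86tti -no-aa -tbaa -basicaa -globalopt -ipsccp"

generalPasses :: [String]
generalPasses = words
  "-no-aa -tbaa -targetlibinfo -basicaa -notti -globalopt -ipsccp"

-- Passes that appear in the reference list but not in the actual one,
-- in the order the reference list schedules them. (Hypothetical helper.)
missingFrom :: [String] -> [String] -> [String]
missingFrom reference actual = filter (`notElem` actual) reference

main :: IO ()
main = putStrLn (unwords (missingFrom optPasses generalPasses))
-- prints: -datalayout -basictti -x86tti
```

The diff ignores ordering on purpose: the two pipelines schedule the shared analyses in a different order, and only the passes that are absent altogether matter here.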

bscarlet commented 10 years ago

You've hit one constraint that I'll need to fix in your second example, which is that the CuratedPassSetSpec convenience interface doesn't give you a way to specify a DataLayout or a TargetMachine.

In your original case, though, you're not using that interface. In that code, you're using default values in your Haskell code. Do you know if the default machine, triple and datalayout you're using correspond to what opt winds up using when you get the vectorization to work?

(Please note that the Haskell PassManager interface is not trying to be as smart as the whole opt binary - it's just exposing the llvm PassManager interface. Any of the logic that's in opt for picking good values for those parameters is something you'd need to do yourself in Haskell. Any obstacle in the API for plumbing them in so they get used I'd be happy to fix, of course.)

tmcdonell commented 10 years ago

When I run opt on the command line I don't specify any -march or -mcpu; it only uses the info it has available from the data layout and target triple fields of the .ll file.

Running:

$ opt-3.4 -O3 -S dotp.ll -debug-only=loop-vectorize

If I comment out the data layout field from dotp.ll (leaving the target triple in place) I get:

LV: Not vectorizing because of missing data layout

Whereas if I comment out only the target triple, we get the same as before:

LV: We can vectorize this loop!
LV: Found trip count: 0
LV: The Widest type: 64 bits.
LV: The Widest register is: 32 bits.
LV: The target has no vector registers.

Actually, these values I get from Haskell are very slightly different from what Clang uses (good catch!) but opt still has enough information (x86-64 target and vector register width) and vectorizes the loop exactly the same if I use these values in dotp.ll instead.

ghci> getDefaultTargetTriple
"x86_64-apple-darwin13.0.0"

ghci> either error (putStrLn . dataLayoutToString) =<< (runErrorT $ withDefaultTargetMachine getTargetMachineDataLayout)
e-S128-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-v64:64:64-v128:128:128-f16:16:16-f32:32:32-f64:64:64-f80:128:128-f128:128:128-a0:0:64-s0:64:64-n8:16:32:64

It'd be great if llvm-general also just took the data layout and target from the .ll, when it is available. That way, the CuratedPassSetSpec interface doesn't need to be changed (indeed, maybe the lower level interface should grab the data from the .ll file as well, if Nothing is given in the PassSetSpec?). My initial use of the lower level PassSetSpec was an attempt to explicitly provide this information and mimic the logic of opt as closely as possible, which ultimately didn't work either.
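
As a rough idea of what "taking the values from the .ll" could look like on the user side, here is a minimal sketch in plain Haskell (base only) that pulls the target datalayout and target triple strings out of the textual IR. It assumes each directive sits on its own line with exactly the spacing shown and contains no escaped quotes. The helper targetField and the abbreviated sample header are made up for this illustration. Turning the strings into the values PassSetSpec expects is not shown.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)

-- Return the quoted value of the first line of the form
--   target <key> = "<value>"
-- Assumption: one directive per line, exactly this spacing, no escaped
-- quotes inside the value. (Hypothetical helper, not an llvm-general API.)
targetField :: String -> String -> Maybe String
targetField key src = listToMaybe (mapMaybe field (lines src))
  where
    field ln = do
      rest <- stripPrefix ("target " ++ key ++ " = \"") ln
      Just (takeWhile (/= '"') rest)

-- Hypothetical, abbreviated module header standing in for dotp.ll.
sample :: String
sample = unlines
  [ "; ModuleID = 'dotp'"
  , "target datalayout = \"e-p:64:64:64-n8:16:32:64\""
  , "target triple = \"x86_64-apple-macosx10.9.0\""
  ]

main :: IO ()
main = do
  print (targetField "datalayout" sample)  -- Just "e-p:64:64:64-n8:16:32:64"
  print (targetField "triple" sample)      -- Just "x86_64-apple-macosx10.9.0"
```

A module without the directive simply yields Nothing, which leaves the caller free to fall back to the default target machine.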

bscarlet commented 10 years ago

llvm-general aims at generality - hence the name. In that pursuit, it follows a fairly strict heuristic design principle of avoiding adding its own behaviors above and beyond what the underlying LLVM libraries themselves expose, lest my own sense of convenience or beauty lead me - in the blindness of my individual perspective - to erect an obstacle between a user and the capabilities of LLVM. For that reason I will not add any magic defaulting behavior to the lower level interface.

The higher level interface is another story. It's an attempt to expose the PassManagerBuilder in the C++ API, itself an encapsulation of the list of passes which used to be hardcoded in the code for the "opt" binary. Your example clearly shows a deficiency in llvm-general there, as there's no way to specify a DataLayout in that case at all.

I'm still not clear on your first case though. When you use the low level interface and specify an appropriate DataLayout, TargetLibraryInfo and TargetMachine, does the vectorization kick in for you, or not?

tmcdonell commented 10 years ago

Vectorisation does not kick in when using the lower level interface. The data layout and target triple provided explicitly to PassSetSpec are as above.

As seen with -debug-pass=Arguments, no target info is ever added.

bscarlet commented 10 years ago

Okay. Looks like that problem is probably a missing call to have the TargetMachine add TargetTransformInfo passes in LLVM.General.Internal.PassManager.createPassManager, right after the call to FFI.addDataLayoutPass.

This issue'll be high in my queue, but may still take a bit for me to get to. Pull requests welcome, of course. A regression test for test/LLVM/General/Test/Optimization.hs would be a good idea, too, if you care to boil your test case down further.

tmcdonell commented 10 years ago

Awesome! Keep me updated (:

tvh commented 10 years ago

I also had trouble with vectorization. I think I fixed part of the problem, but it still won't work if no TargetMachine is provided.

tmcdonell commented 10 years ago

@tvh I haven't managed to get llvm-3.5 installed on my machine. Does your patch also work for you if you apply it to v3.4.1.0? Also, does it work with the test program I had above, or did you have to do something extra? Thanks!

tmcdonell commented 10 years ago

Update: If I cherry pick to 3.4.1.0, it installs but fails when I try to use it later on:

dyld: lazy symbol binding failed: Symbol not found: __ZN4llvm14TargetRegistry12lookupTargetERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERNS_6TripleERS7_
  Referenced from: /Users/tmcdonell/Projects/accelerate/accelerate-llvm/.cabal-sandbox/lib/x86_64-osx-ghc-7.8.0.20140228/llvm-general-3.4.1.0/libHSllvm-general-3.4.1.0-ghc7.8.0.20140228.dylib
  Expected in: flat namespace

(running the test program from above)

tmcdonell commented 10 years ago

Sorry, never mind the last, I had hosed my cabal sandbox somehow.

bscarlet commented 10 years ago

This should be fixed and available in 3.3.11.0 and 3.4.2.0.

tmcdonell commented 10 years ago

Awesome, thanks (:

bscarlet / llvm-general

Target data layout / triple being ignored? #91