Open jberryman opened 5 years ago
I don't have any experience with GHC, and I don't have access to version 8.4.3. However, I tried 7.6.3 with the "Hello World" program, and although BOLT processes the binary, the resulting executable segfaults. It appears there are multiple custom control transfer tables embedded in the code, and they are unmarked in the symbol table. stg_ap_p_fast is one instance of a function that uses such tables. Again, I'm unfamiliar with GHC and don't know what the purpose of this function is.
In general, since GHC uses non-standard code sequences involving indirect jumps, it will require special support in BOLT. How important is this for you? Do you have a large performance-critical application written in Haskell?
Cool, I appreciate you trying it out!
I'm out of my depth here but certainly not surprised that BOLT doesn't work on GHC code as I know it's not a platform you test against.
> How important is this for you? Do you have a large performance-critical application written in Haskell?
Difficult question! I'm trying out BOLT because I'm curious: I wanted to see what would happen and whether I could write a blog post about it (though I do write a lot of Haskell for a living, some of which needs to be tuned or carefully tested for performance).
But I have real reasons to be interested in BOLT (or similar tools... I honestly don't have a great sense of what BOLT does besides reorder blocks for better locality) vis-à-vis GHC/Haskell:
If you are asking whether I'd be interested in contributing time or money, the answer is that I probably don't have the knowledge to do that at this point. I might be able to help coordinate or shepherd a GHC ticket if something could be done on that end.
I see. From what I can tell, there's a lot of performance to be gained from improving the existing backend/runtime. BOLT can still help if the resulting code is large and causes a significant number of instruction cache and TLB misses. On Linux you can measure those by running the application under perf stat -e instructions,L1-icache-misses -- <your app with options>. If you end up with fewer than 5 misses per thousand instructions, then it's better to look at other options to improve the code. Compute-heavy applications, for example, spend most of their time in loops and are not great candidates for the code layout optimizations provided by BOLT.
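For anyone following along, the "misses per thousand instructions" figure can be worked out from the two counters in the perf summary. The following is only a rough sketch: `mpki` and `below_threshold` are hypothetical helper names (not part of perf or BOLT), and the counts are assumed to be copied by hand from the perf output.

```shell
# Hypothetical helpers, not part of perf or BOLT.
# Arguments are raw counts copied from the perf summary:
#   $1 = instructions, $2 = L1 icache misses (must have $1 > 0).

# Print misses per thousand instructions with three decimals.
mpki() {
  awk -v instr="$1" -v miss="$2" 'BEGIN { printf "%.3f\n", miss * 1000 / instr }'
}

# Exit status 0 when the rate is under the 5-per-thousand rule of thumb
# mentioned above, non-zero otherwise.
below_threshold() {
  awk -v instr="$1" -v miss="$2" 'BEGIN { exit !((miss * 1000 / instr) < 5) }'
}
```

For example, `mpki 2000000 8000` works out to 4 misses per thousand instructions, which by the rule of thumb above would suggest looking at other optimizations before code layout.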
I've just started experimenting with BOLT and was curious whether it would work on a Haskell binary. I'm building a hello world program with ghc 8.4.3, using ghc --make hello.hs -o hello.
I'm using this one-liner to do the perf/BOLT stuff, straight from the README:
$ function bolt_stuff(){ perf record -e cycles:u -j any,u -o "$1.data" -- "./$1" ; perf2bolt -p "$1.data" -o perf.fdata "$1" ; llvm-bolt "$1" -o "$1.bolt" -data=perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats ; }
Getting this error from llvm-bolt:
I get basically identical errors when compiled with the LLVM backend (with ghc --make -fllvm...).
Let me know if I can help debug.