Yamato-Security / hayabusa

Hayabusa (隼) is a sigma-based threat hunting and fast forensics timeline generator for Windows event logs.
GNU Affero General Public License v3.0

Create PGO optimized binaries #1469

Open YamatoSecurity opened 2 weeks ago

YamatoSecurity commented 2 weeks ago

I was able to get an 11.5% speed increase with PGO. (Memory usage did not change.)

cargo install cargo-pgo
rustup component add llvm-tools-preview
cargo pgo build
cargo pgo instrument --keep-profiles run csv-timeline -d ../hayabusa-sample-evtx -w -D -n -x -X -s -o delete.csv -C -p super-verbose
cargo pgo instrument --keep-profiles run json-timeline -d ../hayabusa-sample-evtx -w -D -n -x -X -s -o delete.csv -C -p super-verbose
cargo pgo instrument --keep-profiles run logon-summary -d ../hayabusa-sample-evtx -o delete.csv -C
cargo pgo instrument --keep-profiles run eid-metrics -d ../hayabusa-sample-evtx -o delete.csv -C
cargo pgo optimize

Notes

  1. The optimized binary is written to a target-specific directory such as target/x86_64-apple-darwin/release/ instead of ./target/release.
  2. The profiles differ depending on the workload, so we should probably run the hayabusa commands against both the hayabusa-sample-evtx files and the evtx-baseline files.
  3. I still haven't tried optimizing with BOLT, although apparently rustc already does some optimizations, so it might not be necessary.
  4. The profiles are not getting created on Windows for some reason...
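
Note 2 could be scripted roughly as follows. This is only a sketch: the cargo pgo invocations are copied from the commands at the top of the issue, the ../evtx-baseline path is an assumed checkout location, and the DRY_RUN switch and the run / collect_profiles helpers are made up for illustration. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
set -eu

# Assumed checkout locations of the sample logs (space-separated, so paths
# containing spaces are not supported in this sketch).
LOG_DIRS="${LOG_DIRS:-../hayabusa-sample-evtx ../evtx-baseline}"
# Hypothetical switch: 1 = print the commands only, 0 = execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run each instrumented hayabusa command once per log directory so the
# collected profiles cover more than one workload.
collect_profiles() {
  for dir in $LOG_DIRS; do
    run cargo pgo instrument --keep-profiles run csv-timeline -d "$dir" -w -D -n -x -X -s -o delete.csv -C -p super-verbose
    run cargo pgo instrument --keep-profiles run json-timeline -d "$dir" -w -D -n -x -X -s -o delete.csv -C -p super-verbose
    run cargo pgo instrument --keep-profiles run logon-summary -d "$dir" -o delete.csv -C
    run cargo pgo instrument --keep-profiles run eid-metrics -d "$dir" -o delete.csv -C
  done
}

collect_profiles
run cargo pgo optimize
```

Keeping the directory list in one variable means adding another log set is a one-word change, and the dry-run default makes it easy to review the exact commands before spending hours on a large data set.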

Info:

@fukusuket Whenever you have time, could you test to see if you get faster benchmarks as well? If so, I'd like to clone the release binary automation action and add PGO to it.

fukusuket commented 2 weeks ago

@YamatoSecurity I ran csv-timeline on a binary that was optimized for hayabusa-sample-evtx using the following procedure! (Because I wanted to check with --release build)

But I could not see any speed improvement when I performed the above steps 🤔 Could you confirm the speed improvement against a --release build? (Is it possible to do a --release build with cargo-pgo?)

YamatoSecurity commented 2 weeks ago

I don't think it is possible to combine a --release build with cargo-pgo. I compared a binary built with --release against one built with cargo-pgo. I don't think the hayabusa-sample-evtx logs are good for benchmarking: the data set is so small that run times will randomly differ by a few seconds anyway. You probably need to test with evtx-baseline and other logs as well if possible. I tested on 14GB.

fukusuket commented 2 weeks ago

@YamatoSecurity I see... Indeed, hayabusa-sample-evtx is not suitable. I'll try it with evtx-baseline (and cargo-pgo)!

YamatoSecurity commented 2 weeks ago

Hmm... on Windows, it does not create any profiles. I tried to use the profiles I created on my Mac, but I get the error "profile uses zlib compression but the profile reader was built without zlib support".

I did find out that you can stop the PGO profiles from being cleared with the --keep-profiles flag, so that will make things simpler: there is no need to copy the profiles out and move them back.

Reference: https://kobzol.github.io/rust/cargo/2023/07/28/rust-cargo-pgo.html

Other references: https://doc.rust-lang.org/rustc/profile-guided-optimization.html

YamatoSecurity commented 2 weeks ago

I updated the commands at the top of the issue. The same commands create the PGO profiles on Linux but not on Windows... I can't find any information by googling, and all the examples seem to be for *nix systems, so maybe it is not working on Windows right now.

YamatoSecurity commented 2 weeks ago

I was able to use PGO on Windows by copying over the .profdata file I created on my Mac, setting the environment variable $env:RUSTFLAGS="-C profile-use=d:\yamatosecurity\hayabusa\pgo-profiles.profdata", and then running cargo build --release as normal. So when doing this as an action, we can build the instrumented binary on *nix, run various hayabusa commands to create the .profdata file, and then use that .profdata file each time we build the binary for the various OSes. We could save the .profdata file to the repository if need be.
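
For the *nix build steps in such an action, the same idea might look like the sketch below. It assumes the merged profile is saved as pgo-profiles.profdata in the repository root (a hypothetical location), and pgo_rustflags is a made-up helper name; -Cprofile-use is the rustc flag described in the rustc PGO documentation linked above.

```shell
#!/bin/sh
set -eu

# Hypothetical location of the merged profile, e.g. a copy committed to
# the repository. Override by exporting PROFDATA before running.
PROFDATA="${PROFDATA:-$PWD/pgo-profiles.profdata}"

# Build the rustc flag that points at a merged .profdata file.
# Note: no spaces around "=", since RUSTFLAGS is split on whitespace
# (which also means the path itself must not contain spaces).
pgo_rustflags() {
  printf '%s\n' "-Cprofile-use=$1"
}

if [ -f "$PROFDATA" ]; then
  # Ordinary release build, with the existing profile applied.
  RUSTFLAGS="$(pgo_rustflags "$PROFDATA")" cargo build --release
else
  echo "profile not found: $PROFDATA (skipping PGO build)" >&2
fi
```

Because the profile is passed through RUSTFLAGS rather than through cargo-pgo, the binary ends up in the usual ./target/release directory.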

YamatoSecurity commented 2 weeks ago

When testing on really big data (130GB), I only got a 3% speedup on Windows: from 5 hours 31 minutes down to 5 hours 20 minutes. Still an improvement. 😄
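
For anyone comparing their own runs, the percentage can be computed from two wall-clock times in seconds. This is just a small illustrative helper (speedup_pct is a made-up name), applied to the 130GB numbers above:

```shell
#!/bin/sh

# speedup_pct BASELINE_SECONDS OPTIMIZED_SECONDS
# Prints the reduction in wall-clock time as a percentage of the
# baseline, rounded to one decimal place.
speedup_pct() {
  LC_ALL=C awk -v base="$1" -v opt="$2" \
    'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
}

# 5 h 31 min = 19860 s, 5 h 20 min = 19200 s
speedup_pct 19860 19200   # prints 3.3
```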