awslabs / llrt

LLRT (Low Latency Runtime) is an experimental, lightweight JavaScript runtime designed to address the growing demand for fast and efficient Serverless applications.
Apache License 2.0

Profile-Guided Optimization (PGO) benchmarks #117

Open zamazan4ik opened 8 months ago

zamazan4ik commented 8 months ago

Hi!

I tried applying Profile-Guided Optimization (PGO) to improve llrt performance further, as I have already done for many other projects (see all current results here). I performed some basic benchmarks and want to share the results here.

Test environment

Benchmark

As a benchmark, I use the same command I found in the Makefile: llrt fixtures/hello.js. The same scenario is used for the PGO training phase. All PGO optimization steps are done with the cargo-pgo tool: the PGO-instrumented version is built with cargo pgo build, and the PGO-optimized version with cargo pgo optimize build. To reduce the influence of CPU scheduling on the results, every run is pinned to a single core with taskset -c 0.
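The three-step cargo-pgo workflow described above (instrumented build, training run, optimized build) can be sketched as a small Python driver. This is a hedged sketch, not the script used for these results. The target triple x86_64-unknown-linux-gnu and the instrumented binary path target/<triple>/release/llrt are assumptions (cargo-pgo builds with an explicit target, so artifacts land in a triple-specific directory), the helper name pgo_plan is hypothetical, and any preparatory steps the repository's Makefile runs before compiling are ignored. By default it only prints the commands; pass --run to execute them.

```python
import subprocess
import sys

# Assumption: cargo-pgo builds with an explicit --target, so the instrumented
# binary ends up under target/<triple>/release/. Adjust for your host.
TRIPLE = "x86_64-unknown-linux-gnu"
BINARY = f"target/{TRIPLE}/release/llrt"
WORKLOAD = "fixtures/hello.js"  # same scenario as the benchmark


def pgo_plan(training_runs: int = 1) -> list[list[str]]:
    """Return the command sequence for a PGO build of llrt (hypothetical helper)."""
    plan = [["cargo", "pgo", "build"]]  # 1. build an instrumented binary
    # 2. run the instrumented binary on the training workload to write profiles
    plan += [[BINARY, WORKLOAD] for _ in range(training_runs)]
    plan.append(["cargo", "pgo", "optimize", "build"])  # 3. rebuild using the profiles
    return plan


def main(dry_run: bool = True) -> None:
    for cmd in pgo_plan():
        print(" ".join(cmd))
        if not dry_run:
            # Stop at the first failing step instead of building from stale profiles.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(dry_run="--run" not in sys.argv)
```

Running the training workload several times (training_runs > 1) only adds more samples of the same scenario; it does not change which code paths the profile covers.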

Results

I got the following results:

hyperfine -u microsecond -N --warmup=2000 --min-runs 10000 "taskset -c 0 ./llrt_optimized ../fixtures/hello.js" "taskset -c 0 ./llrt_release ../fixtures/hello.js"
Benchmark 1: taskset -c 0 ./llrt_optimized ../fixtures/hello.js
  Time (mean ± σ):     2664.8 µs ±  78.8 µs    [User: 590.1 µs, System: 1943.3 µs]
  Range (min … max):   2478.1 µs … 4486.1 µs    10000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: taskset -c 0 ./llrt_release ../fixtures/hello.js
  Time (mean ± σ):     2796.1 µs ±  63.6 µs    [User: 601.4 µs, System: 2068.9 µs]
  Range (min … max):   2647.5 µs … 4495.0 µs    10000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  taskset -c 0 ./llrt_optimized ../fixtures/hello.js ran
    1.05 ± 0.04 times faster than taskset -c 0 ./llrt_release ../fixtures/hello.js

where llrt_release is the usual Release build and llrt_optimized is the PGO-optimized build.
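The Summary line can be reproduced from the two means and standard deviations. A minimal sketch, assuming the uncertainty of the ratio is obtained by adding the relative standard deviations in quadrature (standard error propagation for a ratio of independent quantities), which matches the 1.05 ± 0.04 printed above; speedup_with_error is a hypothetical helper name.

```python
import math


def speedup_with_error(mean_fast, sd_fast, mean_slow, sd_slow):
    """Return (slow/fast, its standard deviation) via relative errors in quadrature."""
    ratio = mean_slow / mean_fast
    rel = math.hypot(sd_fast / mean_fast, sd_slow / mean_slow)
    return ratio, ratio * rel


# Means and standard deviations (µs) copied from the hyperfine output above.
ratio, err = speedup_with_error(2664.8, 78.8, 2796.1, 63.6)
print(f"{ratio:.2f} ± {err:.2f} times faster")  # 1.05 ± 0.04 times faster

# Per-component differences of the mean (µs), Release minus PGO-optimized.
print(f"User: {601.4 - 590.1:.1f} µs, System: {2068.9 - 1943.3:.1f} µs")
```

The second line prints 11.3 µs for User and 125.6 µs for System, so most of the measured difference shows up as System time.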

I ran the benchmark multiple times, with different command orders and so on; in all cases, the PGO-optimized version was faster than the usual Release version. However, it would be awesome to perform some more precise benchmarks.

Further steps

I can suggest the following:

I hope these benchmark results will be of interest to someone.

richarddavison commented 8 months ago

This is very interesting! I will rerun the benchmark with PGO (with profile data from test runs) and see the results! PLO is also super interesting but is a different beast! Right now, we use zig as a cross-compiler. Since LLRT is a fully static build using musl libc, we can probably use the musl sources and clang-15 directly (since it may come with BOLT) and apply PGO, PLO, and LTO 🥇