Improve configuration of native image compilation

apple / pkl

A configuration as code language with rich validation and tooling.

https://pkl-lang.org

Apache License 2.0

9.85k stars 258 forks source link

Improve configuration of native image compilation #455

Closed translatenix closed 1 month ago

translatenix commented 2 months ago

Solve msgpack issue with --initialize-at-run-time.
Use quick build mode for 40% faster compilation and 20% smaller executable (tested on Linux amd64). If necessary, this can be disabled for release builds. Note that with the move from GraalVM CE 22 to Oracle GraalVM 23, native image compilation time has doubled and binary size has increased to 120 MiB on Linux amd64.
Use Intel Skylake (2015) CPU features for amd64 executables.
Remove options that are commented out.

translatenix commented 2 months ago

How much perf do we gain from targeting skylake? Can you enable the native CLI builds to CI so we can compare it to the default architecture? Or, if you're on an x86 machine, feel free to just run some tests locally and report the results.

Does Pkl have a comprehensive benchmark suite? If you want proof, that’s probably the only way.

These CLIs are libraries, and have a broad set of use-cases. For that reason, I'm thinking that we should probably be conservative here. We probably shouldn't just bump the version, but if there's a significant perf gain, we can consider adding this as an additional architecture.

Your call. I feel that “skylake” is fairly conservative yet enables most CPU features. “x64-v3” is an arbitrary GraalVM default—we should figure out what’s right for Pkl. Also, this is easy to undo depending on feedback.

bioball commented 2 months ago

Does Pkl have a comprehensive benchmark suite? If you want proof, that’s probably the only way.

Not yet. We have some very lightweight benchmark tests, but we don't have an environment (yet) that can run workloads with bare metal isolation. Until then, the best we can do is run benchmarks locally then compare.

translatenix commented 2 months ago

Until then, the best we can do is run benchmarks locally then compare

But what benchmarks do you run? ./gradlew pkl-core:testLinuxExecutableAmd64 doesn’t strike me as a meaningful benchmark.

bioball commented 2 months ago

There is an in-language stdlib module that we use to write one-off benchmarks when we want to test specific things when iterating locally.

There's also the bench:jmh task, although we haven't been using that nearly as much.

The suggestion of running testLinuxExecutableAmd64 is just a coarse-grained way to get some signal, but definitely not a real benchmark.

bioball commented 2 months ago

Here's a quick benchmark that might be a good starting point for comparison. Save this to a file somewhere, then pkl eval <file>.pkl to execute the benchmark.

amends "pkl:Benchmark"

import "package://pkg.pkl-lang.org/pkl-pantry/org.json_schema.contrib@1.0.5#/internal/ModulesGenerator.pkl"
import "package://pkg.pkl-lang.org/pkl-pantry/org.json_schema@1.0.2#/Parser.pkl"
import "package://pkg.pkl-lang.org/pkl-pantry/pkl.experimental.uri@1.0.1#/URI.pkl"

local githubActionJson = read("https://json.schemastore.org/github-action.json")

microbenchmarks {
  ["json schema generator"] {
    expression = 
      let (parsed = Parser.parse(githubActionJson))
        new ModulesGenerator {
          rootSchema = parsed
          baseUri = URI.parse("file:///foo")!!
        }
          .modules
          .map((it) -> it.moduleNode.output.text)
  }
}

translatenix commented 2 months ago

Here's a quick benchmark that might be a good starting point for comparison.

On my laptop, the variance of this benchmark is too high to draw any conclusions about-march settings. However, I can tell that quick build mode significantly degrades performance, which isn't surprising.

-march=x86-64-v3 requires Intel Haswell (2013) or AMD Excavator (2015), which isn't far off from -march=skylake. To move this PR forward, I'll remove -march=skylake.

translatenix commented 2 months ago

The one remaining request is to undo the change to --add-opens here.

I’m still working on that. I’m not convinced that --add-opens is required. After all, it isn’t required when running on the JVM.

translatenix commented 2 months ago

Ready to review from my side. See commit messages for details.

The "Improve BinaryEvaluatorSnippetTests(Engine)" refactoring isn't essential because I ultimately decided to add NativeServerTest instead of NativeBinaryEvaluatorSnippetTests. But I think it's a worthwhile improvement regardless.

bioball commented 2 months ago

Look like you're right; --add-opens is not needed :D