We should ensure our tools (analyzer, analysis server, dart2js, cfe) have very good benchmark coverage for all metrics we care about and continuously track those metrics on golem.
[ ] Ensure we have both JIT & AOT benchmarks: JIT & AOT have different performance characteristics. We may gradually move tools to run in AOT mode for end users due to better startup/memory/no-app-jit-training/... properties.
=> Currently we mostly measure our tools in JIT mode.
=> e.g. cider/ runs the analyzer in AOT-compressed-pointers mode, which we don't track.
[ ] Ensure we measure both memory & performance on the same run: We want to know how CLs affect both - e.g. to notice if we trade increased memory for better performance, ...
=> Currently e.g. the CFE golem benchmarks don't seem to measure memory.
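To illustrate the point, here is a minimal sketch of collecting wall time and peak memory from a single run of a tool on a Unix host. This is not the golem harness; `run_and_measure` is a hypothetical helper:

```python
import resource
import subprocess
import time

def run_and_measure(cmd):
    """Run `cmd` once, returning (wall time in seconds, peak RSS).

    ru_maxrss is the peak resident set size across terminated children;
    it is reported in kilobytes on Linux (bytes on macOS).
    """
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss

# Hypothetical invocation of a tool under benchmark:
elapsed, peak_rss = run_and_measure(["true"])
print(f"wall: {elapsed:.3f}s, peak child RSS: {peak_rss}")
```

Reporting both numbers from the same run makes a CL's memory/performance trade-off visible side by side, instead of requiring separate runs per metric.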
[ ] Run on a representative set of inputs
=> Currently e.g. the golem analyzer benchmarks don't run on flutter, even though most end users are Flutter developers.
[ ] Prefer non-moving benchmarks: We want to know how performance/memory develop over time, which works best if the input to the tool is fixed (i.e. not HEAD of a git repo). For inputs like flutter we may want to update the revision after a while, which should result in a single jump in the graph (i.e. the update should happen in a git commit - e.g. by updating DEPS or another file).
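As a sketch, a pinned input could look like the following hypothetical DEPS-style entry (the path, URL, and revision are placeholders, not actual golem configuration):

```python
# Hypothetical DEPS-style entry: the benchmark input is pinned to a fixed
# revision rather than tracking HEAD. Bumping the revision is itself a
# reviewed commit, so any resulting change shows up as one jump in the graph.
deps = {
    "benchmarks/data/flutter": (
        "https://github.com/flutter/flutter.git"
        "@<pinned-revision>"  # placeholder; updated deliberately, not per-run
    ),
}
```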
[ ] Ensure the benchmarks produce reasonably stable numbers, so that we notice (& get notified) when regressions happen. We may want to track e.g. p50/p90/p99 latencies instead of a single high-variance number.
=> Currently e.g. the analyzer completion/edit benchmarks have very high variance, so we may not notice / get notified of regressions.
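For instance, a stable summary could report percentiles over many samples per run instead of one number; a minimal sketch, with made-up sample data:

```python
import statistics

def latency_percentiles(samples_ms):
    """Summarize latency samples as p50/p90/p99 rather than a single
    high-variance number (e.g. one mean per run)."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    qs = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": qs[49], "p90": qs[89], "p99": qs[98]}

# Hypothetical completion latencies in milliseconds:
samples = [5, 5, 6, 7, 7, 8, 9, 12, 30, 250]
print(latency_percentiles(samples))
```

Tracking the tail (p90/p99) separately from the median also makes regressions that only affect worst-case latency visible, which a mean tends to wash out.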
We should arrive at a place where one can patch-upload a CL to the benchmarking system and have the full picture before landing it.
/cc @sigmundch for dart2js
/cc @bwilkerson @scheglov for analyzer
/cc @jensjoha for cfe & analyzer