Altinn / app-frontend-react

Altinn application React frontend
BSD 3-Clause "New" or "Revised" License

Tiny benchmarking harness #2699

Open martinothamar opened 2 weeks ago

martinothamar commented 2 weeks ago

Description

A benchmarking harness using tinybench, as a suggestion to start tracking perf numbers. Benchmarking gives us a paper trail: when we discover slow code that affects real use cases, we can provide benchmarks showing the improvement in isolation, and when future changes land we can verify how the perf characteristics change (for a variety of inputs).

Memory allocation tracking is missing; I couldn't find a way to correctly correlate heap changes with benchmark runs, since GCs might occur in the middle of a run.

The included benchmark is a microbenchmark comparing some implementations of splitDashedKey, which popped up in some profiling (although this is no longer an issue, so it serves mostly as an example).

[screenshot: benchmark output]
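For context, a minimal self-contained sketch of a tinybench microbenchmark of this shape is shown below. The two splitDashedKey variants are simplified placeholders written for illustration, not the actual implementations in this PR.

```ts
import { Bench } from 'tinybench';

// Hypothetical, simplified variants of splitDashedKey, for illustration only.
// They split a key like 'myComponent-2-13' into a base id and row indices.
function splitDashedKeyRegex(key: string): { baseId: string; indices: number[] } {
  const match = key.match(/^(.+?)((?:-\d+)*)$/);
  const indices = (match?.[2] ?? '').split('-').filter(Boolean).map(Number);
  return { baseId: match?.[1] ?? key, indices };
}

function splitDashedKeyPop(key: string): { baseId: string; indices: number[] } {
  const parts = key.split('-');
  const indices: number[] = [];
  while (parts.length > 1 && /^\d+$/.test(parts[parts.length - 1])) {
    indices.unshift(Number(parts.pop()));
  }
  return { baseId: parts.join('-'), indices };
}

// Run each task repeatedly for ~200ms and print a summary table.
const bench = new Bench({ time: 200 });

bench
  .add('splitDashedKey (regex)', () => {
    splitDashedKeyRegex('myComponent-2-13');
  })
  .add('splitDashedKey (split/pop)', () => {
    splitDashedKeyPop('myComponent-2-13');
  });

await bench.run();
console.table(bench.table());
```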

Related Issue(s)

Verification/QA

sonarcloud[bot] commented 2 weeks ago

Quality Gate failed

Failed conditions
1 Security Hotspot
7.9% Coverage on New Code (required ≥ 45%)
0.0% Condition Coverage on New Code (required ≥ 45%)

See analysis details on SonarCloud

adamhaeger commented 1 week ago

Hmm, I don't really see how tinybench is useful for real-world performance testing of a web app. Seems more focused on low-level testing of functions.

I'd say some of the metrics we should focus on are:

  1. Component Render Times (see the Profiler sketch after this list)
    • Initial Render Time
    • Re-render Times
    • Wasted Renders
  2. React Reconciliation (Diffing Algorithm) Efficiency
  3. State and Prop Update Times
  4. Time to First Paint (TTFP) and First Contentful Paint (FCP)
  5. Memory Usage
  6. Time to Interactive (TTI)
  7. Largest Contentful Paint (LCP)
  8. Layout Shift (Cumulative Layout Shift - CLS)
  9. Network Request Time for Data Fetching
  10. JavaScript Bundle Size and Code Splitting
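For the render-time metrics in point 1, one option is React's built-in Profiler API; a rough sketch is below (the component and id names are made up for illustration):

```tsx
import { Profiler, type ProfilerOnRenderCallback, type ReactNode } from 'react';

// Logs render timings for a subtree; in CI these numbers could be collected
// and asserted on instead of just logged. Names here are illustrative.
const onRender: ProfilerOnRenderCallback = (id, phase, actualDuration) => {
  // phase is 'mount' for the initial render and 'update' for re-renders
  console.log(`[render] ${id} (${phase}): ${actualDuration.toFixed(1)}ms`);
};

export function ProfiledSection({ children }: { children: ReactNode }) {
  return (
    <Profiler id="FormSection" onRender={onRender}>
      {children}
    </Profiler>
  );
}
```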

Tools to Measure These Metrics

We could, for example, set up Lighthouse CI, which provides many of these metrics: https://github.com/GoogleChrome/lighthouse-ci

It could fail the build if it's too slow.
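As a sketch of what that could look like, here is a minimal lighthouserc.js with assertion budgets; the URL and thresholds are made-up examples, not tuned for this app:

```js
// lighthouserc.js -- hypothetical config; URL and budgets are placeholders
module.exports = {
  ci: {
    collect: {
      url: ['http://localhost:8080/'],
      numberOfRuns: 3, // take the median over multiple runs to reduce noise
    },
    assert: {
      assertions: {
        'first-contentful-paint': ['error', { maxNumericValue: 2000 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
        'interactive': ['warn', { maxNumericValue: 3500 }],
      },
    },
    upload: { target: 'temporary-public-storage' },
  },
};
```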

martinothamar commented 1 week ago

> Seems more focused on low-level testing of functions.

Agreed, and I think we need both. We need high-level testing with metrics focused on UX, but while optimizing our code we also need benchmarking that verifies we are actually making things faster.

The tools you mention are good for the explorative/analysis phase, where you run one-offs and high-level before/after comparisons. Benchmarks are good for iterating on optimizations once you've chosen what to optimize. Properly measuring an optimization requires a lot of runs/executions so that the latency distribution can be inspected. Higher-level tests that measure UX-quality metrics like the ones you mention can't be run a ton of times while still keeping a fast feedback loop.
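As a concrete example of what inspecting the distribution could look like with the proposed harness: after the run, each task exposes percentile fields, so regressions in tail latency are visible rather than just shifts in the mean. This assumes tinybench's result shape (mean/p75/p99, in milliseconds):

```ts
import type { Bench } from 'tinybench';

// Print percentiles for each task after `await bench.run()`.
// Field names assume tinybench's TaskResult shape.
function reportDistribution(bench: Bench): void {
  for (const task of bench.tasks) {
    const r = task.result;
    if (!r) continue;
    console.log(
      `${task.name}: mean=${r.mean.toFixed(4)}ms p75=${r.p75.toFixed(4)}ms p99=${r.p99.toFixed(4)}ms`,
    );
  }
}
```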