Instrumentation produces false measurements for NodeJS app using NAPI

overlookmotel commented 4 months ago

I'm trying to benchmark a NodeJS app which calls into a NodeJS native module (built from Rust via napi-rs).

CodSpeed's instrumentation appears to slow down the app very considerably, making it hard to accurately measure performance.

To get a fair comparison, I'm running the same benchmark on Github Actions with and without CodSpeed.

Without CodSpeed:

Running benchmarks with CodSpeed DISABLED
┌─────────┬────────────────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                                  │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼────────────────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'parser(napi)[checker.ts]'                 │ '4'     │ 238333877.50000006 │ '±4.73%' │ 10      │
│ 1       │ 'parser(napi)[cal.com.tsx]'                │ '8'     │ 118107410.40000008 │ '±1.21%' │ 10      │
│ 2       │ 'parser(napi)[RadixUIAdoptionSection.jsx]' │ '4,068' │ 245803.32628994508 │ '±0.68%' │ 2035    │
│ 3       │ 'parser(napi)[pdf.mjs]'                    │ '13'    │ 75663396.4999999   │ '±4.10%' │ 10      │
│ 4       │ 'parser(napi)[antd.js]'                    │ '1'     │ 545470710.6000001  │ '±3.60%' │ 10      │
└─────────┴────────────────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┘

With CodSpeed:

Running benchmarks with CodSpeed ENABLED
┌─────────┬────────────────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                                  │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼────────────────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'parser(napi)[checker.ts]'                 │ '0'     │ 1676626826.099999  │ '±0.53%' │ 10      │
│ 1       │ 'parser(napi)[cal.com.tsx]'                │ '0'     │ 1028924682.7999946 │ '±0.45%' │ 10      │
│ 2       │ 'parser(napi)[RadixUIAdoptionSection.jsx]' │ '618'   │ 1615957.9129023065 │ '±0.81%' │ 310     │
│ 3       │ 'parser(napi)[pdf.mjs]'                    │ '1'     │ 568098107.800007   │ '±5.44%' │ 10      │
│ 4       │ 'parser(napi)[antd.js]'                    │ '0'     │ 3961387613.2999954 │ '±0.92%' │ 10      │
└─────────┴────────────────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┘

Github Action results: https://github.com/oxc-project/oxc/actions/runs/8308255059 Benchmark task: .github/workflows/benchmark.yml Benchmark code: napi/parser/parse.bench.mjs

NB: The 2nd set of results are not from within the withCodspeed()-wrapped Bench object, because CodSpeeds's wrapper disables output of the results table. It appears that CodSpeed's instrumentation slows down everything in the process, not just the code which runs within the withCodspeed() wrapper.

These benchmarks use tinybench. I have also tried Vitest and benchmark.js, but they produce similar results.

Worse, CodSpeed's instrumentation appears to be preventing accurate assessment of the relative speed-up/down of changes. When run locally (Macbook Pro M1) or in CI without CodSpeed, this benchmark shows "napi raw" to be about 3x faster than "napi". But under CodSpeed, the results show only a marginal improvement, or in some cases show what's in reality a 3x speed-up as a slight slow-down.

I suspect the overhead CodSpeed introduces (and variance within in) is masking the actual performance gain.

Running benchmarks with CodSpeed DISABLED
┌─────────┬────────────────────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                                      │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼────────────────────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'parser(napi)[checker.ts]'                     │ '4'     │ 236687380.7000001  │ '±4.51%' │ 10      │
│ 1       │ 'parser(napi raw)[checker.ts]'                 │ '9'     │ 105366553.09999992 │ '±8.60%' │ 10      │
│ 2       │ 'parser(napi)[cal.com.tsx]'                    │ '8'     │ 117207075.80000016 │ '±3.82%' │ 10      │
│ 3       │ 'parser(napi raw)[cal.com.tsx]'                │ '19'    │ 52539795.70000011  │ '±6.14%' │ 10      │
│ 4       │ 'parser(napi)[RadixUIAdoptionSection.jsx]'     │ '4,062' │ 246165.42568895963 │ '±2.25%' │ 2032    │
│ 5       │ 'parser(napi raw)[RadixUIAdoptionSection.jsx]' │ '8,890' │ 112482.59626630507 │ '±7.12%' │ 4446    │
│ 6       │ 'parser(napi)[pdf.mjs]'                        │ '13'    │ 71447427.5999998   │ '±5.49%' │ 10      │
│ 7       │ 'parser(napi raw)[pdf.mjs]'                    │ '30'    │ 32338048.750000097 │ '±5.52%' │ 16      │
│ 8       │ 'parser(napi)[antd.js]'                        │ '1'     │ 522781748.29999983 │ '±2.70%' │ 10      │
│ 9       │ 'parser(napi raw)[antd.js]'                    │ '4'     │ 203665995.90000004 │ '±3.86%' │ 10      │
└─────────┴────────────────────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┘

Running benchmarks with CodSpeed ENABLED
┌─────────┬────────────────────────────────────────────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                                      │ ops/sec │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼────────────────────────────────────────────────┼─────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'parser(napi)[checker.ts]'                     │ '0'     │ 1712106154.2999873 │ '±0.56%' │ 10      │
│ 1       │ 'parser(napi raw)[checker.ts]'                 │ '0'     │ 2053894367.2000022 │ '±2.32%' │ 10      │
│ 2       │ 'parser(napi)[cal.com.tsx]'                    │ '0'     │ 1035820193.2000054 │ '±0.40%' │ 10      │
│ 3       │ 'parser(napi raw)[cal.com.tsx]'                │ '1'     │ 980151598.3000107  │ '±3.11%' │ 10      │
│ 4       │ 'parser(napi)[RadixUIAdoptionSection.jsx]'     │ '473'   │ 2112853.6624468486 │ '±9.21%' │ 237     │
│ 5       │ 'parser(napi raw)[RadixUIAdoptionSection.jsx]' │ '583'   │ 1713314.8047957036 │ '±2.26%' │ 292     │
│ 6       │ 'parser(napi)[pdf.mjs]'                        │ '1'     │ 570087846.4000125  │ '±5.78%' │ 10      │
│ 7       │ 'parser(napi raw)[pdf.mjs]'                    │ '1'     │ 709747156.7000029  │ '±3.52%' │ 10      │
│ 8       │ 'parser(napi)[antd.js]'                        │ '0'     │ 3989519725.7999973 │ '±0.86%' │ 10      │
│ 9       │ 'parser(napi raw)[antd.js]'                    │ '0'     │ 4119213271.400001  │ '±0.18%' │ 10      │
└─────────┴────────────────────────────────────────────────┴─────────┴────────────────────┴──────────┴─────────┘

https://github.com/oxc-project/oxc/actions/runs/8308560738

I don't know if this problem is related to the use of native addons within the NodeJS code under test, or a more general problem.

I would be more than happy to help in any way I can to diagnose/fix this. CodSpeed is absolutely brilliant for Rust code, and would very much like to expand our usage to NodeJS code too. Please just let me know what (if anything) I can do.

adriencaccia commented 4 months ago

Hey @overlookmotel,

The slowdown of the whole code execution in the action is to be expected since the whole command is run under valgrind.

Regarding the problem of the relative performance gains/regressions, this might be linked with file system accesses and their handling in our measurement. I will take a look at the execution profiles to try and sort that out.

I will keep you posted here.

overlookmotel commented 4 months ago

Thanks loads for coming back. I did expect there'd be some overhead, but x8 slowdown was more than I expected. I guess it doesn't matter much, as long as it affects everything equally, so can make comparisons.

The relative performance measure is more problematic.

The benchmark where I saw weird results does not involve any file system access.

It's basically comparing 2 methods of transferring data from Rust to JS.

In parseSync ("napi"), Rust outputs a JSON string and JS just calls JSON.parse() on it.
In parseSyncRaw ("napi raw"), Rust outputs a buffer and there's a deserializer written in JS which converts it to JS objects.

So the main differences between the two are:

Passing string vs buffer via napi-rs.
Just a single JSON.parse call vs a large amount of JS execution.

hyf0 commented 3 months ago

Not sure how codspeed is implemented internally. I'm using setTimeout to fake the real costed time produced by benchmark without codspeed plugin. However, codspeed still could not make them right.

codspeed

real benchmark data(ms)


async function sleep(ms: number) {
  await new Promise((resolve) => globalThis.setTimeout(resolve, ms))
}

function main() {
  const realBenchData = JSON.parse(
    nodeFs.readFileSync(
      nodePath.join(PROJECT_ROOT, 'dist/ci-bench-data.json'),
      'utf8',
    ),
  )

  console.log('realBenchData:')
  console.table(realBenchData)

  for (const suite of suitesForCI) {
    const realData = realBenchData[suite.title]
    const realDataSourceMap = realBenchData[`${suite.title}-sourcemap`]
    nodeAssert(realData != null)
    nodeAssert(realDataSourceMap != null)
    bench(suite.title, async () => {
      await sleep(realData.mean)
    })
    bench(`${suite.title}-sourcemap`, async () => {
      await sleep(realDataSourceMap.mean)
    })
  }
}

main()

Not sure if it helps. This is a reproduction branch.

overlookmotel commented 3 months ago

hyf0 I've replied on https://github.com/rolldown/rolldown/pull/706 (which I think is what prompted your comment above).

CodSpeedHQ / action

Instrumentation produces false measurements for NodeJS app using NAPI #96