DataDog / datadog-static-analyzer

Datadog Static Analyzer
https://docs.datadoghq.com/static_analysis/
Apache License 2.0

[STAL-1960] Optimize memory usage of JS timeout functionality #412

Closed · jasonforal closed this 3 months ago

jasonforal commented 3 months ago

What problem are you trying to solve?

A "watchdog" thread is currently spawned for every JavaScript execution in order to terminate v8 execution if a script runs for too long.

From a CPU standpoint, while theoretically inefficient, this is mostly a non-issue. V8 is single-threaded, so we only ever spawn watchdog threads sequentially, meaning there is no contention when spawning these threads and minimal context-switching overhead.

However, from a memory standpoint, this is highly inefficient, as we spawn O(N x M) threads, where N is the number of files and M is the number of rules. The OS needs to allocate stack space for each thread, so this adds up very quickly (even if we tune the allocated stack size down to the OS minimum).
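For context, here is a minimal sketch of the thread-per-execution pattern being replaced. The names (`run_with_per_execution_watchdog`, the `terminate` closure) are hypothetical, and the sketch uses only the standard library; in the real runtime the timeout path would go through the isolate's thread-safe handle (`terminate_execution`) rather than an injected closure.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs one script execution with its own watchdog thread (the pattern being replaced).
/// `terminate` stands in for calling `terminate_execution` on the isolate's handle.
fn run_with_per_execution_watchdog<T>(
    timeout: Duration,
    terminate: impl FnOnce() + Send + 'static,
    execute: impl FnOnce() -> T,
) -> T {
    let (done_tx, done_rx) = mpsc::channel::<()>();

    // One OS thread (and its stack) is allocated for every single execution.
    let watchdog = thread::spawn(move || {
        if done_rx.recv_timeout(timeout).is_err() {
            // The script didn't finish in time: ask v8 to stop it.
            terminate();
        }
    });

    let result = execute();

    // Wake the watchdog and tear the thread down before the next execution.
    let _ = done_tx.send(());
    let _ = watchdog.join();
    result
}

fn main() {
    let value = run_with_per_execution_watchdog(
        Duration::from_secs(5),
        || eprintln!("timeout: would terminate v8 execution"),
        || 40 + 2, // stand-in for a single rule execution in v8
    );
    println!("result: {value}");
}
```

Every execution pays for one thread spawn, one stack allocation, and one join, which is the O(N x M) churn that shows up as transient "All Anonymous VM" allocations in the profile below.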

What is your solution?

Spawn a single watchdog thread per JsRuntime, reducing memory allocations from O(N x M) to O(1). A watchdog thread is "bound" to the runtime, and only monitors executions for that specific runtime.

Before

Here's an allocation profile when spawning a watchdog thread per execution:

(screenshot: multiple-watchdog)

| Category | # Persistent | # Transient | Total Bytes |
| --- | --- | --- | --- |
| All Anonymous VM | 16 | 382,810 | 346.26 GiB |

Note the O(N x M) behavior in the "All Anonymous VM" allocations; these are from spawning watchdog threads:

(screenshot: spawn-watchdog-thread)

After

And here's an allocation profile when spawning a single watchdog thread:

(screenshot: single-watchdog)

| Category | # Persistent | # Transient | Total Bytes |
| --- | --- | --- | --- |
| All Anonymous VM | 15 | 89 | 96.73 MiB |

Impact: 238s -> 199s (~15% wall clock speedup), 99%+ allocation reduction

I was not expecting the CPU speedup to be so significant -- it's a pleasant surprise.

The test case was as follows:

Analyzing 2075 JavaScript files using 7 rules
Analyzing 66178 TypeScript files using 15 rules

Meaning we did (2075 x 7) + (66178 x 15) = 14,525 + 992,670 = 1,007,195 executions.

I suspect what is happening here is that v8 execution is so fast (either from genuine ddsa runtime speed improvements, or just from small files and simple rules) that the overhead of spawning a thread and synchronizing with it accounts for a meaningful share of each execution's wall time. The absolute time delta is 39s. With roughly 1 million executions, that puts the per-execution overhead of spawning a thread at about 39 microseconds, which seems entirely plausible to me.
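A quick back-of-envelope check of that estimate, using only the numbers quoted above:

```rust
fn main() {
    // Executions from the test case: 2075 JS files x 7 rules + 66178 TS files x 15 rules.
    let executions = 2075 * 7 + 66178 * 15; // = 1,007,195
    // Wall-clock delta between the two runs: 238s -> 199s.
    let delta_secs = 238.0_f64 - 199.0;
    let per_execution_us = delta_secs * 1e6 / executions as f64;
    println!("{executions} executions, ~{per_execution_us:.0} µs of overhead per execution");
}
```

This reproduces the ~39 µs figure.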

Technical Notes

The implementation of this solution is quite simple because we don't have to worry about concurrent executions: a JsRuntime exists on a single thread, and v8 execution is single-threaded. Because the watchdog only needs to track sequential executions, we can get away with just a condvar and its corresponding mutex to handle the timeout bookkeeping.
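As a rough illustration of what such bookkeeping could look like (hypothetical names, standard library only, not the PR's actual code): one long-lived thread sleeps on the condvar while idle and waits with a deadline while an execution is in flight. The `terminate` closure stands in for calling `terminate_execution` on the bound runtime's isolate handle.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// What the watchdog is currently doing for its bound runtime.
#[derive(Clone, Copy)]
enum State {
    Idle,
    Armed { deadline: Instant },
    Shutdown,
}

/// One watchdog per JsRuntime: a single long-lived thread plus shared state.
struct Watchdog {
    shared: Arc<(Mutex<State>, Condvar)>,
    handle: Option<thread::JoinHandle<()>>,
}

impl Watchdog {
    /// Spawns the single watchdog thread. `terminate` stands in for calling
    /// `terminate_execution` on the bound runtime's isolate handle.
    fn new(terminate: impl Fn() + Send + 'static) -> Self {
        let shared = Arc::new((Mutex::new(State::Idle), Condvar::new()));
        let thread_shared = Arc::clone(&shared);
        let handle = thread::spawn(move || {
            let (lock, cvar) = &*thread_shared;
            let mut guard = lock.lock().unwrap();
            loop {
                // Copy the state out so the guard can be handed back to the condvar.
                let current = *guard;
                match current {
                    State::Shutdown => break,
                    // Nothing in flight: sleep until the runtime arms us.
                    State::Idle => guard = cvar.wait(guard).unwrap(),
                    State::Armed { deadline } => {
                        let now = Instant::now();
                        if now >= deadline {
                            // The execution overran its budget: terminate it once.
                            terminate();
                            *guard = State::Idle;
                        } else {
                            // Sleep until the deadline, or until end()/Drop notifies us.
                            guard = cvar.wait_timeout(guard, deadline - now).unwrap().0;
                        }
                    }
                }
            }
        });
        Watchdog { shared, handle: Some(handle) }
    }

    /// Called on the runtime's thread right before each v8 execution.
    fn begin(&self, timeout: Duration) {
        let (lock, cvar) = &*self.shared;
        *lock.lock().unwrap() = State::Armed { deadline: Instant::now() + timeout };
        cvar.notify_one();
    }

    /// Called right after the execution completes (or is terminated).
    fn end(&self) {
        let (lock, cvar) = &*self.shared;
        *lock.lock().unwrap() = State::Idle;
        cvar.notify_one();
    }
}

impl Drop for Watchdog {
    fn drop(&mut self) {
        let (lock, cvar) = &*self.shared;
        *lock.lock().unwrap() = State::Shutdown;
        cvar.notify_one();
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

fn main() {
    let watchdog = Watchdog::new(|| eprintln!("timeout: would terminate v8 execution"));
    for _ in 0..3 {
        watchdog.begin(Duration::from_millis(500));
        // ... execute the rule's JavaScript on this thread ...
        watchdog.end();
    }
}
```

Because the runtime and its watchdog are the only two threads touching this state, the mutex is effectively uncontended and the condvar is only signaled at the start and end of each execution.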

Alternatives considered

What the reviewer should know

jasonforal commented 3 months ago

(EDIT) Updated based on an offline conversation: we will be implementing both runtimes, with the new one temporarily behind a feature flag.