A "watchdog" thread is currently spawned for every JavaScript execution in order to terminate v8 execution if a script runs for too long.
From a CPU standpoint, while theoretically inefficient, this is mostly a non-issue. V8 is single-threaded, and so we only ever spawn watchdog threads sequentially, meaning no contention spawning these threads and minimal context-switching overhad.
However, from a memory standpoint, this is highly inefficient, as we spawn O(N x M) threads, where N is the number of files and M the number of rules. The OS needs to allocate stack space for each thread, so this adds up very quickly (even if we tune the stack space allocated to the OS minimum).
What is your solution?
Spawn a single watchdog thread per JsRuntime, reducing memory allocations from O(N x M) to O(1). A watchdog thread is "bound" to the runtime, and only monitors executions for that specific runtime.
Before
Here's an allocation profile spawning a watchdog thread-per-execution:
All Anonymous VM
# Persistent
# Transient
Total Bytes
16
382,810
346.26 GiB
Note the O(N x M) behavior in the "All Anonymous VM" allocations - these are from spawning watchdog threads
After
And an allocation profile with spawning a single watchdog thread:
I was not expecting the CPU speedup to be so significant -- it's a pleasant surprise.
The test case was as follows:
Analyzing 2075 JavaScript files using 7 rules
Analyzing 66178 TypeScript files using 15 rules
Meaning we did 14525 + 992670 = 1,007,195 executions.
I suspect what is happening here is that v8 execution is so fast (either from genuine ddsa runtime speed improvements, or just from small files and simple rules) that the overhead of spawning a thread and synchronizing with it slows down the execution. The absolute time delta is 39s. With 1 million executions, that suggests the overhead of spawning a thread per-execution is 39 microseconds, which seems entirely plausible to me.
Technical Notes
The implementation of this solution is quite simple because we don't have to worry about concurrency: a JsRuntime exists on a single thread and v8 execution is single-threaded. Because the watchdog only needs to track sequential executions, we can get away with only a condvar and corresponding mutex to handle the bookkeeping for tracking timeouts.
Alternatives considered
What the reviewer should know
The graphs above are an apples-to-apples comparison -- both use the ddsa runtime, except one spawns thread-per-execution instead of per-runtime.
What problem are you trying to solve?
A "watchdog" thread is currently spawned for every JavaScript execution in order to terminate v8 execution if a script runs for too long.
From a CPU standpoint, while theoretically inefficient, this is mostly a non-issue. V8 is single-threaded, and so we only ever spawn watchdog threads sequentially, meaning no contention spawning these threads and minimal context-switching overhad.
However, from a memory standpoint, this is highly inefficient, as we spawn
O(N x M)
threads, whereN
is the number of files andM
the number of rules. The OS needs to allocate stack space for each thread, so this adds up very quickly (even if we tune the stack space allocated to the OS minimum).What is your solution?
Spawn a single watchdog thread per
JsRuntime
, reducing memory allocations fromO(N x M)
toO(1)
. A watchdog thread is "bound" to the runtime, and only monitors executions for that specific runtime.Before
Here's an allocation profile spawning a watchdog thread-per-execution:
Note the O(N x M) behavior in the "All Anonymous VM" allocations - these are from spawning watchdog threads
After
And an allocation profile with spawning a single watchdog thread:
Impact: 238s -> 199s (~15% wall clock speedup), 99%+ allocation reduction
I was not expecting the CPU speedup to be so significant -- it's a pleasant surprise.
The test case was as follows:
Meaning we did 14525 + 992670 = 1,007,195 executions.
I suspect what is happening here is that v8 execution is so fast (either from genuine ddsa runtime speed improvements, or just from small files and simple rules) that the overhead of spawning a thread and synchronizing with it slows down the execution. The absolute time delta is 39s. With 1 million executions, that suggests the overhead of spawning a thread per-execution is 39 microseconds, which seems entirely plausible to me.
Technical Notes
The implementation of this solution is quite simple because we don't have to worry about concurrency: a
JsRuntime
exists on a single thread and v8 execution is single-threaded. Because the watchdog only needs to track sequential executions, we can get away with only a condvar and corresponding mutex to handle the bookkeeping for tracking timeouts.Alternatives considered
What the reviewer should know