airbnb / hypernova

A service for server-side rendering your JavaScript views
MIT License

Execute jobs in a batch in serial #107

Closed schleyfox closed 6 years ago

schleyfox commented 6 years ago

Previously, we dispatched all jobs in a batch concurrently. Rendering jobs are compute bound, so we would not actually get parallel execution. One side effect is that a job's timers in plugins record a start time at the point where all jobs are dispatched, but an end time only after all or most of the jobs have completely finished. This prevents us from reliably telling how much latency a single job contributes to the total. For instance, if job A takes 500ms and jobs B and C each take 25ms, we may get results indicating that jobs A, B, and C each spent 550ms executing (which is true in a way, but unhelpful).
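The skew can be simulated with a fake logical clock standing in for wall time (a sketch, not Hypernova's actual code; names are illustrative):

```javascript
// Sketch of the timer skew with compute-bound jobs. A fake logical
// clock stands in for wall time; renderSync models a render that
// blocks the event loop for job.cost "ms".
let clock = 0;
function renderSync(job) {
  clock += job.cost; // compute-bound work advances the clock
  return `rendered ${job.name}`;
}

// Concurrent dispatch: every start time is recorded up front, and end
// times are only observed once the whole batch has finished, so each
// job appears to take roughly the batch's total duration.
function timeConcurrently(jobs) {
  const starts = jobs.map(() => clock);
  jobs.forEach(renderSync); // compute runs back to back anyway
  const end = clock;
  return jobs.map((job, i) => ({ name: job.name, elapsed: end - starts[i] }));
}

// Serial dispatch: each start time is recorded just before the job
// runs, so elapsed reflects only that job's own compute.
function timeSerially(jobs) {
  return jobs.map((job) => {
    const start = clock;
    renderSync(job);
    return { name: job.name, elapsed: clock - start };
  });
}
```

With jobs costing 500ms, 25ms, and 25ms, the concurrent timers each report 550ms, while the serial timers report the per-job costs.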

This change executes the jobs serially as a chain.

This should have little or no impact either way on actual response time.

This is intended as an experiment, to see whether the above claims are true.
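The serial chaining can be sketched roughly like this (illustrative names, not the PR's actual code):

```javascript
// Illustrative sketch of executing a batch serially as a promise
// chain: each job is dispatched only after the previous one resolves.
function runSerially(jobs, renderJob) {
  return jobs.reduce(
    (chain, job) =>
      chain.then((results) =>
        renderJob(job).then((result) => results.concat(result))),
    Promise.resolve([])
  );
}
```

The previous behavior was effectively `Promise.all(jobs.map(renderJob))`, which records every job's start time at dispatch.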

/cc @goatslacker @ljharb

ljharb commented 6 years ago

Would it be simpler to record individual job start and end times, rather than a potentially tricky refactor?

schleyfox commented 6 years ago

@ljharb not really, since that only works if we add that recording right around a section that is absolutely guaranteed not to yield back to the event loop. This would then require plugins to depend on the implementation and exposure of the timers, rather than just working as one would expect.

I think this mode of execution is generally a better idea, since the jobs block each other on compute. Executing serially potentially allows garbage to be reclaimed that would otherwise still be referenced in interleaved execution, for instance. I think making it configurable to ensure absolute compatibility is a fair approach (and would allow for cases where plugin hooks end up actually doing IO that could be better handled by concurrent execution).

goatslacker commented 6 years ago

I'm thinking about cases where rendering is done in an asynchronous manner. For React (or templating languages like Jade) this approach works 👍 but Hypernova works with any renderer -- potentially async ones or renderers that stream. Wouldn't it be better to run those in parallel?

I'm fine with switching this to be serial by default since it fits the majority of the use cases.

schleyfox commented 6 years ago

@goatslacker that makes sense, though async renderers will have the same issue if rendering is compute bound. We're not going to get parallelism on a single CPU core.

Ultimately, if we want to keep pushing on this, I think it makes a lot of sense to offer an execution mode where each job is executed via a worker pool of separate processes (turning render, from the dispatcher's perspective, into plain async I/O). This would be a more involved change, of course.

schleyfox commented 6 years ago

@goatslacker I've made it configurable, updated the tests to cover both modes, and documented it. I think this makes it semver-minor again, as the default behavior should be identical to before.
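A sketch of what such a toggle might look like (the option name and plumbing are illustrative; the PR's actual code may differ):

```javascript
// Hypothetical serial/concurrent toggle. By default the batch is
// dispatched all at once, preserving the previous behavior; opting in
// runs jobs one after another.
async function processBatch(jobs, renderJob, { serial = false } = {}) {
  if (!serial) {
    // Default: previous behavior, dispatch everything concurrently.
    return Promise.all(jobs.map(renderJob));
  }
  // Opt-in: each job waits for the previous one to finish.
  const results = [];
  for (const job of jobs) {
    results.push(await renderJob(job));
  }
  return results;
}
```

Both modes return the same results in the same order; only the dispatch timing differs.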

goatslacker commented 6 years ago

> where each job is executed via worker pool of separate processes

I think we tried that at one point, but passing JSON between the coordinator and the worker processes was a significant bottleneck.

goatslacker commented 6 years ago

It's ok if it's breaking. I think switching it to serial by default makes more sense.

ljharb commented 6 years ago

It’d be better to ship the minor first, and then flip the default in a major after.

schleyfox commented 6 years ago

Verified this on our internal test environment. I'm going to proceed with the minor version bump.