SuperOleg39 opened this issue 2 years ago
@timneutkens, sorry to disturb you, but perhaps you have run stress tests on Next.js in combination with react@18 and streaming rendering? Is there a difference compared to renderToString in terms of requests per second and response timings?
> Also, I don't want to give the HTML to the client in the stream, this approach has some disadvantages.

Could you elaborate on this?

> Could you elaborate on this?
Yes, my team maintains an SSR meta-framework, so we have tested HTML streaming a couple of times and have some general feedback (non-React specific):

…body tag, and then starts streaming rendering of a new page.

This is an actual problem for us, because we have a lot of different applications on the same domain.

The perf numbers you mentioned. Are those with the polyfilled Writable that buffers that you mentioned above, or with native Node.js streams?
It sounds like you’re saying that your Writable polyfill is slow since that’s where the bulk of the time is? Do you want help writing a faster one? Or are you saying that it’s faster with that polyfill than the native Node.js streams?
Even if you want to buffer it, do you actually need it to be converted back to a string or can it stay as binary?
> The perf numbers you mentioned. Are those with the polyfilled Writable that buffers that you mentioned above or with native Node.js streams?
I'm using the native stream module.
> Even if you want to buffer it, do you actually need it to be converted back to a string or can it stay as binary?
The Buffer.toString method takes slightly more than 1 ms, so I haven't tried to optimize it yet. Buffer.concat slows the response down more significantly, so it seems like a more promising place to optimize. In both cases the cost is not significant compared to the code that calls the flushSubtree and completeWriting methods.

It looks like a big part of the problem is the large number of Buffer.from calls in the Writable internals:
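(As an aside, a minimal sketch, not from the thread, of how those per-write Buffer.from conversions can be avoided on the consuming side: Node's Writable converts string chunks to Buffers before _write unless decodeStrings is disabled, so a sink that keeps the chunks as strings and joins them once at the end skips that per-chunk allocation. The class and method names here are my own.)

```js
const { Writable } = require('stream');

// Sketch of a sink that avoids per-chunk Buffer.from in the stream internals.
class StringSink extends Writable {
  constructor(options) {
    // decodeStrings: false => string chunks are passed to _write as strings.
    super({ ...options, decodeStrings: false });
    this.chunks = [];
  }

  _write(chunk, encoding, callback) {
    // Keep strings as-is; anything binary is converted once here.
    this.chunks.push(typeof chunk === 'string' ? chunk : chunk.toString());
    callback();
  }

  getHtml() {
    // Single join at the end instead of many Buffer.from / Buffer.concat calls.
    return this.chunks.join('');
  }
}
```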
Can you share a code example that includes the native stream module? Ideally the whole test case.
Also, I tried a quick test of a new Next.js app (production build), react@17 vs react@18 + experimental streaming, on a page component similar to the one I tested with my framework. Here are the results:

react@17 - 20 RPS
react@18 + streaming - 10 RPS

But the CPU profile is harder to read here; I can't find anything that is obviously slowed down.

Test command example: npx autocannon -c 50 -d 10 http://localhost:3000/
> Can you share a code example that includes the native stream module? Ideally the whole test case.
Sorry, do you have some reference sandbox with renderToPipeableStream, like this demo, but with the latest react version?
I can make a reproduction on top of such an example.
It's copied from here so you should be able to take this one: https://github.com/facebook/react/tree/main/fixtures/ssr2
It uses a local build but you can replace it with 18 in package.json.
If you run the same benchmark but with your custom HtmlWritable with buffering, what kind of numbers do you get? Similarly, if you run React 18 with renderToString?

Just to narrow down how much of the perf issue is with the writable vs the rest.
I don't doubt that native Node.js streams have a lot of overhead in the _write
calls. There's a lot of stuff going on in there. In theory they wouldn't have to be though. It's not optimized for lots of small calls.
Web Streams have even more overhead in every chunk call. So we added our own buffering to avoid enqueueing so often.
We could do the same for Node streams but there's a question of principle there. If we do that, we remove optimization opportunities for user space. There's nothing inherent in the Node stream API that says it has to have so much overhead. So should it be our responsibility or the responsibility of the recipient stream?
One way could be to build an intermediate stream API, like the one you have, that buffers for a bit and then flushes to the underlying one periodically just like our Web Streams buffering does.
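(A sketch of what such an intermediate layer could look like, assuming a plain Node Transform; the class name and the 16 KB threshold are my own choices, not something from this thread.)

```js
const { Transform } = require('stream');

// Buffers small chunks and forwards them in larger batches,
// so the underlying stream sees far fewer write calls.
class BufferedRelay extends Transform {
  constructor(thresholdBytes = 16 * 1024) {
    super();
    this.pending = [];
    this.pendingBytes = 0;
    this.thresholdBytes = thresholdBytes;
  }

  _transform(chunk, encoding, callback) {
    this.pending.push(chunk);
    this.pendingBytes += chunk.length;
    if (this.pendingBytes >= this.thresholdBytes) {
      this.push(Buffer.concat(this.pending));
      this.pending = [];
      this.pendingBytes = 0;
    }
    callback();
  }

  _flush(callback) {
    if (this.pendingBytes > 0) {
      this.push(Buffer.concat(this.pending));
    }
    callback();
  }
}

// Usage idea, inside onShellReady / onAllReady of renderToPipeableStream:
//   const relay = new BufferedRelay();
//   relay.pipe(res);
//   pipe(relay);
```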
> It's copied from here so you should be able to take this one: https://github.com/facebook/react/tree/main/fixtures/ssr2
Thanks! I created a few examples from this: https://github.com/SuperOleg39/react-ssr-perf-test
> Similarly, if you run React 18 with renderToString.
At first, I made a comparison between renderToString on versions 17 and 18 (call it react@18 legacy):

React 17 - 50 RPS
React 18 - 70 RPS

I only changed the react and react-dom versions.
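(For context, a rough sketch of the kind of handler this legacy comparison exercises; this is my reconstruction, and the component and route are placeholders, not the actual benchmark code.)

```js
const express = require('express');
const React = require('react');
const { renderToString } = require('react-dom/server');
const App = require('./App'); // placeholder page component

const app = express();

app.get('/', (req, res) => {
  // The whole page is rendered into a single string and sent at once.
  const html = renderToString(React.createElement(App));
  res.setHeader('Content-Type', 'text/html');
  res.send('<!DOCTYPE html>' + html);
});

app.listen(3000);
```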
> Just to narrow down how much of the perf issue is with the writable vs the rest.
Then I tested renderToPipeableStream (call it react@18 stream).

I got timings of around 1.5 s for responses; the problem was in the app.use(compress()); middleware:

After removing this middleware, I got around 2 RPS, with 300+ ms of CPU work per request. Still a big overhead in the stream internals:
> If you run the same benchmark but with your custom HtmlWritable with buffering, what kind of numbers do you get?
At last, I tried buffering (call it react@18 buffering), and got around 15 RPS.

CPU profile:
> I don't doubt that native Node.js streams have a lot of overhead in the _write calls. There's a lot of stuff going on in there. In theory they wouldn't have to be though. It's not optimized for lots of small calls.
It seems to be :(

react@18 legacy CPU profile, for comparison (20 ms for render, and that's including garbage collection):

So, everywhere the React rendering itself is fast, around 15 ms for the example page component, and the response time grows in proportion to the number of writes to the stream.
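(To make that last point concrete, here is a tiny synthetic test of my own, not from the thread: writing roughly the same payload as one big chunk versus thousands of small chunks into a Writable that discards everything, so only the per-write stream overhead is measured.)

```js
const { Writable } = require('stream');

// A sink that throws the data away, so only the stream machinery is measured.
const makeNullSink = () =>
  new Writable({
    write(chunk, encoding, callback) {
      callback();
    },
  });

function timeWrites(chunks, label) {
  const sink = makeNullSink();
  const start = process.hrtime.bigint();
  for (const chunk of chunks) sink.write(chunk); // each call pays the per-write overhead
  sink.end();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(2)} ms`);
}

const page = 'x'.repeat(1_000_000); // ~1 MB of "HTML"

timeWrites([page], 'one big write');
timeWrites(Array.from({ length: 10_000 }, () => 'x'.repeat(100)), '10k small writes');
```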
Ongoing work in https://github.com/facebook/react/pull/24291.
The initial set of fixes is in 18.1.0-next-df5d32f23-20220411, so you can give that a try.
Here are my results with your benchmark.
┌─────────┬────────┬────────┬────────┬────────┬───────────┬───────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼────────┼────────┼────────┼────────┼───────────┼───────┼────────┤
│ Latency │ 174 ms │ 424 ms │ 578 ms │ 848 ms │ 426.69 ms │ 85 ms │ 861 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴───────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬───────┬────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────┼────────┼─────────┤
│ Req/Sec │ 108 │ 108 │ 115 │ 117 │ 114.5 │ 2.34 │ 108 │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────┼────────┼─────────┤
│ Bytes/Sec │ 28.3 MB │ 28.3 MB │ 30.1 MB │ 30.7 MB │ 30 MB │ 612 kB │ 28.3 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴───────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 1145 │
└──────┴───────┘
┌─────────┬────────┬────────┬────────┬────────┬───────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 129 ms │ 268 ms │ 401 ms │ 537 ms │ 270.63 ms │ 53.38 ms │ 551 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴──────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Req/Sec │ 178 │ 178 │ 182 │ 185 │ 182 │ 2.2 │ 178 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Bytes/Sec │ 46.6 MB │ 46.6 MB │ 47.7 MB │ 48.5 MB │ 47.7 MB │ 574 kB │ 46.6 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 1820 │
└──────┴───────┘
┌─────────┬────────┬────────┬────────┬────────┬───────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 160 ms │ 263 ms │ 376 ms │ 525 ms │ 265.25 ms │ 50.08 ms │ 539 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴──────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Req/Sec │ 182 │ 182 │ 186 │ 188 │ 185.6 │ 1.5 │ 182 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Bytes/Sec │ 47.7 MB │ 47.7 MB │ 48.7 MB │ 49.3 MB │ 48.6 MB │ 392 kB │ 47.7 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 1856 │
└──────┴───────┘
┌─────────┬────────┬─────────┬─────────┬─────────┬────────────┬───────────┬─────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼────────┼─────────┼─────────┼─────────┼────────────┼───────────┼─────────┤
│ Latency │ 355 ms │ 2025 ms │ 2507 ms │ 2809 ms │ 1677.05 ms │ 591.21 ms │ 2958 ms │
└─────────┴────────┴─────────┴─────────┴─────────┴────────────┴───────────┴─────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec │ 13 │ 13 │ 26 │ 39 │ 25.8 │ 8.48 │ 13 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 3.41 MB │ 3.41 MB │ 6.81 MB │ 10.2 MB │ 6.76 MB │ 2.22 MB │ 3.41 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 258 │
└──────┴───────┘
┌─────────┬────────┬────────┬────────┬────────┬──────────┬───────┬─────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼────────┼────────┼────────┼────────┼──────────┼───────┼─────────┤
│ Latency │ 163 ms │ 536 ms │ 620 ms │ 626 ms │ 529.2 ms │ 93 ms │ 1121 ms │
└─────────┴────────┴────────┴────────┴────────┴──────────┴───────┴─────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Req/Sec │ 86 │ 86 │ 92 │ 95 │ 91.9 │ 2.3 │ 86 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Bytes/Sec │ 22.5 MB │ 22.5 MB │ 24.1 MB │ 24.9 MB │ 24.1 MB │ 601 kB │ 22.5 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 919 │
└──────┴───────┘
┌─────────┬─────────┬─────────┬──────────┬──────────┬────────────┬──────────┬──────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼─────────┼─────────┼──────────┼──────────┼────────────┼──────────┼──────────┤
│ Latency │ 7592 ms │ 9956 ms │ 10742 ms │ 10768 ms │ 9794.71 ms │ 857.4 ms │ 10768 ms │
└─────────┴─────────┴─────────┴──────────┴──────────┴────────────┴──────────┴──────────┘
┌───────────┬─────┬──────┬─────┬─────────┬─────────┬─────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────┼──────┼─────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec │ 0 │ 0 │ 0 │ 28 │ 5.56 │ 9.55 │ 5 │
├───────────┼─────┼──────┼─────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 0 B │ 0 B │ 0 B │ 25.3 MB │ 5.02 MB │ 8.62 MB │ 4.52 MB │
└───────────┴─────┴──────┴─────┴─────────┴─────────┴─────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 50 │
└──────┴───────┘
┌─────────┬────────┬────────┬────────┬────────┬───────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 200 ms │ 496 ms │ 568 ms │ 989 ms │ 491.61 ms │ 93.45 ms │ 997 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴──────────┴────────┘
┌───────────┬─────────┬─────────┬───────┬─────────┬───────┬────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼───────┼─────────┼───────┼────────┼─────────┤
│ Req/Sec │ 98 │ 98 │ 99 │ 100 │ 99 │ 0.9 │ 98 │
├───────────┼─────────┼─────────┼───────┼─────────┼───────┼────────┼─────────┤
│ Bytes/Sec │ 25.8 MB │ 25.8 MB │ 26 MB │ 26.3 MB │ 26 MB │ 242 kB │ 25.8 MB │
└───────────┴─────────┴─────────┴───────┴─────────┴───────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200 │ 990 │
└──────┴───────┘
Impressive work, thanks!
Are there any issues / PRs with upcoming related improvements that I can subscribe to?
I'm not sure if there are any particular ones planned but @gnoff might be able to provide more details. I think it would make sense to track them in this issue as well.
Good idea 👍
If we can eventually reach the same perf for streaming render, without this "penalty", it will be a bright future for HTML streaming in general :)
I'm going to continue working on this and will likely have future PRs, but I can't say exactly when. If you look for PRs with [Fizz] in the title, that's likely where they will be. If I remember, I'll ping you here or tag you in the future work.
Nice, thank you!
The initial fix is in 18.1.0.
Hello!
When switching from renderToString to renderToPipeableStream, I ran load tests on the application and found a decrease in server throughput, from 50 to 15 RPS, and an increase in response timings. When profiling the CPU, I see a large overhead from the internal work of the stream, specifically the methods Writable.write and Writable.uncork. All these method calls together take more than twice as much CPU time (about 50-60 ms) as rendering my test page (about 15-20 ms).

Also, I don't want to give the HTML to the client in the stream, this approach has some disadvantages. So I have to buffer the data, and it slows down the application a bit more.
CPU profiler in production mode:
CPU profiler in development mode:
My custom Writable stream with buffering:
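(The original snippet is not preserved in this copy of the thread, so below is a minimal sketch of such a buffering writable; the HtmlWritable and getHtml names and the exact buffering strategy are assumptions based on the description above.)

```js
const { Writable } = require('stream');

// Hypothetical buffering sink: collects everything React writes and
// exposes the full HTML once the stream has finished.
class HtmlWritable extends Writable {
  constructor(options) {
    super(options);
    this.chunks = [];
    this.html = '';
  }

  getHtml() {
    return this.html;
  }

  _write(chunk, encoding, callback) {
    this.chunks.push(chunk);
    callback();
  }

  _final(callback) {
    // Concatenate once at the end instead of on every write.
    this.html = Buffer.concat(this.chunks).toString();
    callback();
  }
}
```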
And rendering flow:
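(Likewise a sketch rather than the original code: the general flow of piping renderToPipeableStream into the buffering writable once the whole tree is ready, then sending the collected HTML as a single response. The App component and the error handling are placeholders.)

```js
const React = require('react');
const { renderToPipeableStream } = require('react-dom/server');
const App = require('./App'); // placeholder page component

function render(req, res) {
  const writable = new HtmlWritable();

  const { pipe } = renderToPipeableStream(React.createElement(App), {
    onAllReady() {
      // Wait for the whole tree instead of streaming it to the client.
      pipe(writable);
    },
    onError(error) {
      console.error(error);
      res.statusCode = 500;
      res.end('Internal Server Error');
    },
  });

  writable.on('finish', () => {
    res.setHeader('Content-Type', 'text/html');
    res.end('<!DOCTYPE html>' + writable.getHtml());
  });
}
```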