facebook / react

Server-side rendering performance degradation with renderToPipeableStream #24232

Open SuperOleg39 opened 2 years ago

SuperOleg39 commented 2 years ago

Hello!

When switching from renderToString to renderToPipeableStream, I ran load tests on the application and found a decrease in server throughput, from 50 to 15 RPS, and an increase in response timings. When profiling the CPU, I see a large overhead from the internal work of the stream, specifically the Writable.write and Writable.uncork methods. Together, these method calls take more than twice as much CPU time (about 50-60ms) as rendering my test page (about 15-20ms).

Also, I don't want to send the HTML to the client as a stream; this approach has some disadvantages. So I have to buffer the data, which slows down the application a bit more.

CPU profiler in production mode:

CPU profiler in development mode:

My custom Writable stream with buffering:

import { Writable } from 'stream';

// Collects rendered chunks and exposes the full document as a single string
// once the stream finishes.
class HtmlWritable extends Writable {
  chunks = [];
  html = '';

  getHtml() {
    return this.html;
  }

  _write(chunk, encoding, callback) {
    this.chunks.push(chunk);
    callback();
  }

  _final(callback) {
    this.html = Buffer.concat(this.chunks).toString();
    callback();
  }
}

And rendering flow:

import { renderToPipeableStream } from 'react-dom/server';

new Promise((resolve, reject) => {
  const htmlWritable = new HtmlWritable();

  const { pipe, abort } = renderToPipeableStream(renderResult, {
    // onAllReady fires once the whole tree, including Suspense content, has rendered,
    // so the complete document is piped at once rather than streamed progressively.
    onAllReady() {
      pipe(htmlWritable);
    },
    onError(error) {
      reject(error);
    },
  });

  // 'finish' fires after _final has run, so the full HTML string is available here.
  htmlWritable.on('finish', () => {
    resolve(htmlWritable.getHtml());
  });
});
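
For context, a minimal sketch of how this promise might be consumed in an Express-style handler. renderApp here is a hypothetical wrapper around the Promise above; it is not part of the original code:

// Hypothetical: renderApp(element) wraps the Promise above and resolves with the HTML string.
app.get('*', async (req, res) => {
  try {
    const html = await renderApp(renderResult);
    res.setHeader('Content-Type', 'text/html; charset=utf-8');
    res.send(html);
  } catch (error) {
    res.status(500).send('Internal Server Error');
  }
});
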
SuperOleg39 commented 2 years ago

@timneutkens, sorry to disturb you, but perhaps you have run stress tests on Next.js with react@18 and streaming rendering? Is there a difference compared to renderToString in the number of requests per second and response timings?

bvaughn commented 2 years ago

Also, I don't want to give the HTML to the client in the stream, this approach has some disadvantages.

Could you elaborate on this?

SuperOleg39 commented 2 years ago

Could you elaborate on this?

Yes, my team maintains an SSR meta-framework, so we have tested HTML streaming a couple of times and have some general, non-React-specific feedback:

sebmarkbage commented 2 years ago

The perf numbers you mentioned. Are those with the polyfilled Writable that buffers that you mentioned above or with native Node.js streams?

It sounds like you’re saying that your Writable polyfill is slow since that’s where the bulk of the time is? Do you want help writing a faster one? Or are you saying that it’s faster with that polyfill than the native Node.js streams?

Even if you want to buffer it, do you actually need it to be converted back to a string or can it stay as binary?

SuperOleg39 commented 2 years ago

The perf numbers you mentioned. Are those with the polyfilled Writable that buffers that you mentioned above or with native Node.js streams?

I'm using the native stream module.

Even if you want to buffer it, do you actually need it to be converted back to a string or can it stay as binary?

The Buffer.toString method takes slightly more than 1ms, so I haven't tried to optimize it yet. Buffer.concat slows the response down more significantly, so it seems like a more promising place to optimize. In both cases, it is not a significant cost compared to the code that calls the flushSubtree and completeWriting methods.
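
To illustrate the "can it stay as binary" question, here is a sketch (an assumption, not measured code) of a variant of the buffering class that skips the final toString() and keeps the concatenated Buffer, which the response could send as-is:

import { Writable } from 'stream';

// Like HtmlWritable above, but keeps the result as a Buffer so the response
// can send the bytes directly, e.g. res.end(writable.getBuffer()).
class BinaryHtmlWritable extends Writable {
  chunks = [];
  buffer = null;

  getBuffer() {
    return this.buffer;
  }

  _write(chunk, encoding, callback) {
    this.chunks.push(chunk);
    callback();
  }

  _final(callback) {
    // Buffer.concat still runs once, but the toString() conversion is skipped.
    this.buffer = Buffer.concat(this.chunks);
    callback();
  }
}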

SuperOleg39 commented 2 years ago

It looks like a big part of the problem is the large number of Buffer.from calls in the Writable internals:

(Screenshot 2022-03-31 at 19:12:25)
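
For illustration only: when strings are written to a Node Writable, the stream converts them with Buffer.from unless it was constructed with decodeStrings: false. Below is a hedged sketch of a destination that opts out of that conversion; whether this actually removes the Buffer.from calls depends on whether the renderer writes strings or binary chunks:

import { Writable } from 'stream';

class StringHtmlWritable extends Writable {
  chunks = [];
  html = '';

  constructor(options) {
    // decodeStrings: false asks Node not to run Buffer.from on string chunks;
    // _write then receives the original strings.
    super({ ...options, decodeStrings: false });
  }

  getHtml() {
    return this.html;
  }

  _write(chunk, encoding, callback) {
    this.chunks.push(typeof chunk === 'string' ? chunk : chunk.toString());
    callback();
  }

  _final(callback) {
    this.html = this.chunks.join('');
    callback();
  }
}
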
sebmarkbage commented 2 years ago

Can you share a code example that includes the native stream module? Ideally the whole test case.

SuperOleg39 commented 2 years ago

Also, I tried a quick test of a new Next.js app (production build), react@17 vs react@18 + experimental streaming, on a page component similar to the one I tested with my framework. Here are the results:

But the CPU profile is harder to read; I can't find anything that is obviously slower.

Test command example: npx autocannon -c 50 -d 10 http://localhost:3000/

SuperOleg39 commented 2 years ago

Can you share a code example that includes the native stream module? Ideally the whole test case.

Sorry, do you have some reference sandbox with renderToPipeableStream, like this demo, but with the latest React version? I can build a reproduction on top of such an example.

gaearon commented 2 years ago

It's copied from here so you should be able to take this one: https://github.com/facebook/react/tree/main/fixtures/ssr2

It uses a local build but you can replace it with 18 in package.json.

sebmarkbage commented 2 years ago

If you run the same benchmark but with your custom buffering HtmlWritable, what kind of numbers do you get?

Similarly, if you run React 18 with renderToString.

Just to narrow down how much of the perf issues are with the writable vs the rest.

I don't doubt that native Node.js streams have a lot of overhead in the _write calls. There's a lot of stuff going on in there. In theory they wouldn't have to be though. It's not optimized for lots of small calls.

Web Streams have even more overhead in every chunk call. So we added our own buffering to avoid enqueueing so often.

We could do the same for Node streams but there's a principle question there. If we do that, we remove optimization opportunities for user space. There's nothing inherent in the Node stream API that it has to have so much overhead. So should it be our responsibility or the responsibility of the recipient stream?

One way could be to build an intermediate stream API, like the one you have, that buffers for a bit and then flushes to the underlying one periodically just like our Web Streams buffering does.
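
A rough sketch of such an intermediate stream, as an illustration of the idea only (not React's actual buffering; the 2 KB threshold is an arbitrary assumption):

import { Writable } from 'stream';

// Accumulates small writes and forwards them to the destination in larger chunks,
// so the underlying stream sees far fewer write() calls.
class BufferedWritable extends Writable {
  constructor(destination, thresholdBytes = 2048) {
    super();
    this.destination = destination;
    this.thresholdBytes = thresholdBytes;
    this.pending = [];
    this.pendingBytes = 0;
  }

  _write(chunk, encoding, callback) {
    const buffer = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding);
    this.pending.push(buffer);
    this.pendingBytes += buffer.length;
    if (this.pendingBytes >= this.thresholdBytes) {
      this.flushPending();
    }
    callback();
  }

  _final(callback) {
    this.flushPending();
    this.destination.end(callback);
  }

  flushPending() {
    if (this.pendingBytes > 0) {
      this.destination.write(Buffer.concat(this.pending));
      this.pending = [];
      this.pendingBytes = 0;
    }
  }
}

// Usage sketch: pipe(new BufferedWritable(res)) instead of pipe(res).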

SuperOleg39 commented 2 years ago

It's copied from here so you should be able to take this one: https://github.com/facebook/react/tree/main/fixtures/ssr2

Thanks! I created a few examples based on it: https://github.com/SuperOleg39/react-ssr-perf-test

Similarly, if you run React 18 with renderToString.

First, I compared renderToString between versions 17 and 18 (call it react@18 legacy): React 17 - 50 RPS, React 18 - 70 RPS, and I only changed the react and react-dom versions.

Just to narrow down how much of the perf issues are with the writable vs the rest.

Then I tested renderToPipeableStream (call it react@18 stream). I got response timings around 1.5s; the problem was the app.use(compress()); middleware:

(Screenshot 2022-03-31 at 23:36:17)

After removing this middleware, I got around 2 RPS and 300+ ms of CPU work per request. There is still a big overhead in the stream internals:

(Screenshot 2022-03-31 at 23:38:30)

If you run the same benchmark but with your custom buffering HtmlWritable, what kind of numbers do you get?

Finally, I tried buffering (call it react@18 buffering) and got around 15 RPS. CPU profile:

(Screenshot 2022-03-31 at 23:40:47)
SuperOleg39 commented 2 years ago

I don't doubt that native Node.js streams have a lot of overhead in the _write calls. There's a lot of stuff going on in there. In theory they wouldn't have to be though. It's not optimized for lots of small calls.

It seems so :(

SuperOleg39 commented 2 years ago

react@18 legacy CPU profile, for comparison (20ms for the render, and that includes garbage collection):

(Screenshot 2022-03-31 at 23:46:37)

So React rendering itself is fast everywhere, around 15ms for the example page component, and the response time grows in proportion to the number of writes to the stream.

gaearon commented 2 years ago

Ongoing work in https://github.com/facebook/react/pull/24291.

gaearon commented 2 years ago

The initial set of fixes is in 18.1.0-next-df5d32f23-20220411, so you can give that a try. Here are my results with your benchmark.

17

┌─────────┬────────┬────────┬────────┬────────┬───────────┬───────┬────────┐
│ Stat    │ 2.5%   │ 50%    │ 97.5%  │ 99%    │ Avg       │ Stdev │ Max    │
├─────────┼────────┼────────┼────────┼────────┼───────────┼───────┼────────┤
│ Latency │ 174 ms │ 424 ms │ 578 ms │ 848 ms │ 426.69 ms │ 85 ms │ 861 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴───────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬───────┬────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg   │ Stdev  │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────┼────────┼─────────┤
│ Req/Sec   │ 108     │ 108     │ 115     │ 117     │ 114.5 │ 2.34   │ 108     │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────┼────────┼─────────┤
│ Bytes/Sec │ 28.3 MB │ 28.3 MB │ 30.1 MB │ 30.7 MB │ 30 MB │ 612 kB │ 28.3 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴───────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 1145  │
└──────┴───────┘

18+legacy

Before

┌─────────┬────────┬────────┬────────┬────────┬───────────┬──────────┬────────┐
│ Stat    │ 2.5%   │ 50%    │ 97.5%  │ 99%    │ Avg       │ Stdev    │ Max    │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 129 ms │ 268 ms │ 401 ms │ 537 ms │ 270.63 ms │ 53.38 ms │ 551 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴──────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev  │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Req/Sec   │ 178     │ 178     │ 182     │ 185     │ 182     │ 2.2    │ 178     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Bytes/Sec │ 46.6 MB │ 46.6 MB │ 47.7 MB │ 48.5 MB │ 47.7 MB │ 574 kB │ 46.6 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 1820  │
└──────┴───────┘

After

┌─────────┬────────┬────────┬────────┬────────┬───────────┬──────────┬────────┐
│ Stat    │ 2.5%   │ 50%    │ 97.5%  │ 99%    │ Avg       │ Stdev    │ Max    │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 160 ms │ 263 ms │ 376 ms │ 525 ms │ 265.25 ms │ 50.08 ms │ 539 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴──────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev  │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Req/Sec   │ 182     │ 182     │ 186     │ 188     │ 185.6   │ 1.5    │ 182     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Bytes/Sec │ 47.7 MB │ 47.7 MB │ 48.7 MB │ 49.3 MB │ 48.6 MB │ 392 kB │ 47.7 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 1856  │
└──────┴───────┘

18+buffering

Before

┌─────────┬────────┬─────────┬─────────┬─────────┬────────────┬───────────┬─────────┐
│ Stat    │ 2.5%   │ 50%     │ 97.5%   │ 99%     │ Avg        │ Stdev     │ Max     │
├─────────┼────────┼─────────┼─────────┼─────────┼────────────┼───────────┼─────────┤
│ Latency │ 355 ms │ 2025 ms │ 2507 ms │ 2809 ms │ 1677.05 ms │ 591.21 ms │ 2958 ms │
└─────────┴────────┴─────────┴─────────┴─────────┴────────────┴───────────┴─────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec   │ 13      │ 13      │ 26      │ 39      │ 25.8    │ 8.48    │ 13      │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 3.41 MB │ 3.41 MB │ 6.81 MB │ 10.2 MB │ 6.76 MB │ 2.22 MB │ 3.41 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 258   │
└──────┴───────┘

After

┌─────────┬────────┬────────┬────────┬────────┬──────────┬───────┬─────────┐
│ Stat    │ 2.5%   │ 50%    │ 97.5%  │ 99%    │ Avg      │ Stdev │ Max     │
├─────────┼────────┼────────┼────────┼────────┼──────────┼───────┼─────────┤
│ Latency │ 163 ms │ 536 ms │ 620 ms │ 626 ms │ 529.2 ms │ 93 ms │ 1121 ms │
└─────────┴────────┴────────┴────────┴────────┴──────────┴───────┴─────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev  │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Req/Sec   │ 86      │ 86      │ 92      │ 95      │ 91.9    │ 2.3    │ 86      │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼────────┼─────────┤
│ Bytes/Sec │ 22.5 MB │ 22.5 MB │ 24.1 MB │ 24.9 MB │ 24.1 MB │ 601 kB │ 22.5 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 919   │
└──────┴───────┘

18+stream

Before

┌─────────┬─────────┬─────────┬──────────┬──────────┬────────────┬──────────┬──────────┐
│ Stat    │ 2.5%    │ 50%     │ 97.5%    │ 99%      │ Avg        │ Stdev    │ Max      │
├─────────┼─────────┼─────────┼──────────┼──────────┼────────────┼──────────┼──────────┤
│ Latency │ 7592 ms │ 9956 ms │ 10742 ms │ 10768 ms │ 9794.71 ms │ 857.4 ms │ 10768 ms │
└─────────┴─────────┴─────────┴──────────┴──────────┴────────────┴──────────┴──────────┘
┌───────────┬─────┬──────┬─────┬─────────┬─────────┬─────────┬─────────┐
│ Stat      │ 1%  │ 2.5% │ 50% │ 97.5%   │ Avg     │ Stdev   │ Min     │
├───────────┼─────┼──────┼─────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec   │ 0   │ 0    │ 0   │ 28      │ 5.56    │ 9.55    │ 5       │
├───────────┼─────┼──────┼─────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 0 B │ 0 B  │ 0 B │ 25.3 MB │ 5.02 MB │ 8.62 MB │ 4.52 MB │
└───────────┴─────┴──────┴─────┴─────────┴─────────┴─────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 50    │
└──────┴───────┘

After

┌─────────┬────────┬────────┬────────┬────────┬───────────┬──────────┬────────┐
│ Stat    │ 2.5%   │ 50%    │ 97.5%  │ 99%    │ Avg       │ Stdev    │ Max    │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 200 ms │ 496 ms │ 568 ms │ 989 ms │ 491.61 ms │ 93.45 ms │ 997 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴──────────┴────────┘
┌───────────┬─────────┬─────────┬───────┬─────────┬───────┬────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%   │ 97.5%   │ Avg   │ Stdev  │ Min     │
├───────────┼─────────┼─────────┼───────┼─────────┼───────┼────────┼─────────┤
│ Req/Sec   │ 98      │ 98      │ 99    │ 100     │ 99    │ 0.9    │ 98      │
├───────────┼─────────┼─────────┼───────┼─────────┼───────┼────────┼─────────┤
│ Bytes/Sec │ 25.8 MB │ 25.8 MB │ 26 MB │ 26.3 MB │ 26 MB │ 242 kB │ 25.8 MB │
└───────────┴─────────┴─────────┴───────┴─────────┴───────┴────────┴─────────┘
┌──────┬───────┐
│ Code │ Count │
├──────┼───────┤
│ 200  │ 990   │
└──────┴───────┘
SuperOleg39 commented 2 years ago

Impressive work, thanks!

Are there any issues / PRs with upcoming related improvements that I can subscribe to?

gaearon commented 2 years ago

I'm not sure if there are any particular ones planned but @gnoff might be able to provide more details. I think it would make sense to track them in this issue as well.

SuperOleg39 commented 2 years ago

Good idea 👍

If we can eventually reach the same perf for streaming render, without this "penalty", it will be a bright future for HTML streaming in general :)

gnoff commented 2 years ago

I'm going to continue working on this and will likely have future PRs, but I can't say exactly when. If you look for PRs with [Fizz] in the title, that's likely where they will be. If I remember, I'll ping you here or tag you in the future work.

SuperOleg39 commented 2 years ago

Nice, thank you!

gaearon commented 2 years ago

The initial fix is in 18.1.0.