fastify / point-of-view

Template rendering plugin for Fastify
MIT License
335 stars 86 forks source link

refactor(index): async flow control #405

Closed mweberxyz closed 5 months ago

mweberxyz commented 5 months ago

The goal of this commit is to modernize flow control throughout the plugin, converting callback-driven architecture to async-await.

As much as possible, the original flow of the code was kept intact, though a fair amount of error handling was removed, and the errors simply get thrown up to the renderer callers.

Some duplicate logic was factored up, namely all calls to this.header and this.send have been factored up into the reply decorator itself and out of the individual renderers.

Edit: Additional commits factored up minification and page not defined checks out of the individual renderers as well.

Big thanks to @multivoltage for earlier work.

Benchmarks

master on my machine (Node 16 on M1)

% autocannon -c 200 -d 30 -p 20 localhost:3000
Running 30s test @ http://localhost:3000
200 connections with 20 pipelining factor

┌─────────┬───────┬───────┬───────┬───────┬──────────┬──────────┬────────┐
│ Stat    │ 2.5%  │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev    │ Max    │
├─────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼────────┤
│ Latency │ 24 ms │ 42 ms │ 73 ms │ 74 ms │ 42.19 ms │ 16.59 ms │ 327 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬──────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg      │ Stdev    │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼──────────┼─────────┤
│ Req/Sec   │ 71,359  │ 71,359  │ 94,335  │ 95,551  │ 93,657.6 │ 4,173.29 │ 71,310  │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼──────────┼─────────┤
│ Bytes/Sec │ 19.1 MB │ 19.1 MB │ 25.3 MB │ 25.6 MB │ 25.1 MB  │ 1.12 MB  │ 19.1 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴──────────┴─────────┘

and here's the result of this branch:

% autocannon -c 200 -d 30 -p 20 localhost:3000
Running 30s test @ http://localhost:3000
200 connections with 20 pipelining factor

┌─────────┬───────┬───────┬───────┬───────┬──────────┬──────────┬────────┐
│ Stat    │ 2.5%  │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev    │ Max    │
├─────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼────────┤
│ Latency │ 24 ms │ 44 ms │ 76 ms │ 78 ms │ 43.06 ms │ 17.99 ms │ 345 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴────────┘
┌───────────┬────────┬────────┬─────────┬─────────┬───────────┬──────────┬─────────┐
│ Stat      │ 1%     │ 2.5%   │ 50%     │ 97.5%   │ Avg       │ Stdev    │ Min     │
├───────────┼────────┼────────┼─────────┼─────────┼───────────┼──────────┼─────────┤
│ Req/Sec   │ 63,231 │ 63,231 │ 92,415  │ 94,271  │ 91,791.47 │ 5,370.73 │ 63,224  │
├───────────┼────────┼────────┼─────────┼─────────┼───────────┼──────────┼─────────┤
│ Bytes/Sec │ 17 MB  │ 17 MB  │ 24.8 MB │ 25.3 MB │ 24.6 MB   │ 1.44 MB  │ 16.9 MB │
└───────────┴────────┴────────┴─────────┴─────────┴───────────┴──────────┴─────────┘

So we potentially have a slight regression at this point, but it seems to be within run-to-run variance. Working further from here, I've managed to get up into the 100k req/s averages with additional refactoring, but wanted to stop here and get this PR up before going any further, so the (hopeful) increase in code maintainability can be weighed versus the slight hit to performance.

Here's an express.js benchmark on my machine just for comparisons sake:

autocannon -c 200 -d 30 -p 20 localhost:3000
Running 30s test @ http://localhost:3000
200 connections with 20 pipelining factor

┌─────────┬───────┬────────┬────────┬────────┬───────────┬───────────┬─────────┐
│ Stat    │ 2.5%  │ 50%    │ 97.5%  │ 99%    │ Avg       │ Stdev     │ Max     │
├─────────┼───────┼────────┼────────┼────────┼───────────┼───────────┼─────────┤
│ Latency │ 88 ms │ 173 ms │ 183 ms │ 192 ms │ 173.97 ms │ 292.88 ms │ 9987 ms │
└─────────┴───────┴────────┴────────┴────────┴───────────┴───────────┴─────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬───────────┬────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg       │ Stdev  │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────────┼────────┼─────────┤
│ Req/Sec   │ 19,007  │ 19,007  │ 22,175  │ 22,415  │ 22,064.54 │ 583.95 │ 19,004  │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────────┼────────┼─────────┤
│ Bytes/Sec │ 6.31 MB │ 6.31 MB │ 7.36 MB │ 7.44 MB │ 7.33 MB   │ 194 kB │ 6.31 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴───────────┴────────┴─────────┘

Checklist

Closes #404

multivoltage commented 5 months ago

It's impressive hom much change autocannon -c 200 -d 30 -p 20 localhost:3000 from my MAC (intel 2019) :). M1 seems to be 4x :)

│ Stat    │ 2.5%   │ 50%    │ 97.5%  │ 99%    │ Avg       │ Stdev    │ Max    │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 334 ms │ 350 ms │ 395 ms │ 502 ms │ 354.97 ms │ 30.05 ms │ 730 ms │

I'm curious What about const readFile = require("node:util").promisify(require("node:fs").readFile); I tried on your branch and seems to improve perf instead const { readFile } = require('node:fs/promises')

mweberxyz commented 5 months ago

@multivoltage This is a MacBook Air (M1, 2020) - and it doesn't even have a cooling fan! The M-series IO is insane.

Regarding the readFile implementation, my initial intuition is that it should not matter in the slightest. In the benchmark, we are running in production-cached mode, so you would expect there to only be a single call to readFile anyways.

Looking further into it is very interesting though - because autocannon hits it with so much load so fast, there are thousands of requests andreadFile calls, before any one request successfully caches the renderer.

Take a look at the results when the benchmark command curl localhost:3000 > /dev/null && autocannon -c 200 -d 5 -p 20 localhost:3000 is used:

require('node:fs/promises') implementation:

┌─────────┬───────┬───────┬───────┬───────┬──────────┬──────────┬─────────┐
│ Stat    │ 2.5%  │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev    │ Max     │
├─────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼─────────┤
│ Latency │ 15 ms │ 27 ms │ 58 ms │ 70 ms │ 37.89 ms │ 75.16 ms │ 2181 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴─────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬───────────┬──────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg       │ Stdev    │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────────┼──────────┼─────────┤
│ Req/Sec   │ 95,871  │ 95,871  │ 105,983 │ 106,175 │ 103,980.8 │ 4,071.41 │ 95,827  │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────────┼──────────┼─────────┤
│ Bytes/Sec │ 25.7 MB │ 25.7 MB │ 28.4 MB │ 28.4 MB │ 27.9 MB   │ 1.09 MB  │ 25.7 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴───────────┴──────────┴─────────┘

const readFile = require("node:util").promisify... implementation

┌─────────┬───────┬───────┬───────┬───────┬───────┬──────────┬─────────┐
│ Stat    │ 2.5%  │ 50%   │ 97.5% │ 99%   │ Avg   │ Stdev    │ Max     │
├─────────┼───────┼───────┼───────┼───────┼───────┼──────────┼─────────┤
│ Latency │ 17 ms │ 27 ms │ 60 ms │ 70 ms │ 38 ms │ 62.66 ms │ 1951 ms │
└─────────┴───────┴───────┴───────┴───────┴───────┴──────────┴─────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬───────────┬──────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg       │ Stdev    │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────────┼──────────┼─────────┤
│ Req/Sec   │ 95,231  │ 95,231  │ 105,791 │ 105,855 │ 103,622.4 │ 4,212.86 │ 95,229  │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────────┼──────────┼─────────┤
│ Bytes/Sec │ 25.5 MB │ 25.5 MB │ 28.4 MB │ 28.4 MB │ 27.8 MB   │ 1.13 MB  │ 25.5 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴───────────┴──────────┴─────────┘

This gives me the idea of pre-caching the files in templatesDir on load, similar to the way preProcessDot does some work before fastify ready -- will explore after this PR gets some traction.

mweberxyz commented 5 months ago

More benchmarks with https://github.com/fastify/point-of-view/pull/407 merged in:

master:

express.js: 24221.34 req/s
fastify.js: 100074.67 req/s
fastify-twig.js: 87040 req/s
fastify-ejs-async.js: 103680 req/s
fastify-ejs-with-layout.js: 91349.34 req/s

refactor/async:

express.js: 24040 req/s
fastify.js: 99754.67 req/s
fastify-twig.js: 89397.34 req/s
fastify-ejs-async.js: 101664 req/s
fastify-ejs-with-layout.js: 83424 req/s
mweberxyz commented 5 months ago

Just pushed some light refactoring made possible by the async flow control changes - for some big performance gains!

Benchmarks

Node 16

master:

express.js: 24096 req/s
fastify.js: 97024 req/s
fastify-twig.js: 87061.34 req/s
fastify-ejs-async.js: 104170.67 req/s
fastify-ejs-with-layout.js: 90229.34 req/s

refactor/async as of 050045e:

express.js: 24222.4 req/s
fastify.js: 131557.34 req/s
fastify-twig.js: 109024 req/s
fastify-ejs-async.js: 133402.67 req/s
fastify-ejs-with-layout.js: 116405.34 req/s

That's +35% for the existing benchmark :)

Node 20

master:

express.js: 20168 req/s
fastify.js: 90933.34 req/s
fastify-twig.js: 80192 req/s
fastify-ejs-async.js: 97877.34 req/s
fastify-ejs-with-layout.js: 84138.67 req/s

refactor/async as of 050045e:

express.js: 20210.67 req/s
fastify.js: 113141.34 req/s
fastify-twig.js: 96949.34 req/s
fastify-ejs-async.js: 113120 req/s
fastify-ejs-with-layout.js: 101397.34 req/s
gurgunday commented 5 months ago

This looks pretty good to me!

mcollina commented 5 months ago

@Uzlopak ptal

mweberxyz commented 5 months ago

Pushed a regression test as this refactoring incidentally solves #404

For reference, if the added test is added to master the result of the test run:

master
Uzlopak commented 5 months ago

I tested it. Basically what slows down is that the template files are loaded concurrently. I think in the other PR it was explained already.

I would actually say we need a semaphore or whatever it is called. If I implement the following code, we only load the file only once. But the test fastify.view.clearCache clears cache fails. So maybe I messed something up.

From 69216d40cf620a39bfaa4813f941ce5c8ec14025 Mon Sep 17 00:00:00 2001
From: uzlopak <aras.abbasi@googlemail.com>
Date: Mon, 19 Feb 2024 14:36:01 +0100
Subject: [PATCH] remove bottleneck

---
 index.js | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/index.js b/index.js
index 73867be..82d97a9 100644
--- a/index.js
+++ b/index.js
@@ -1,7 +1,6 @@
 'use strict'
-const { readFile } = require('node:fs/promises')
 const fp = require('fastify-plugin')
-const { accessSync, existsSync, mkdirSync, readdirSync } = require('node:fs')
+const { accessSync, existsSync, mkdirSync, readdirSync, readFileSync } = require('node:fs')
 const { basename, dirname, extname, join, resolve } = require('node:path')
 const HLRU = require('hashlru')
 const supportedEngines = ['ejs', 'nunjucks', 'pug', 'handlebars', 'mustache', 'art-template', 'twig', 'liquid', 'dot', 'eta']
@@ -628,6 +627,26 @@ async function fastifyView (fastify, opts) {
   }
 }

+/**
+ * @type {Map<string, Promise>}
+ */
+const readFileMap = new Map()
+
+function readFile (filePath, encoding) {
+  if (!readFileMap.has(filePath)) {
+    readFileMap.set(filePath, new Promise((resolve, reject) => {
+      try {
+        resolve(readFileSync(filePath, encoding))
+      } catch (e) {
+        reject(e)
+      } finally {
+        readFileMap.delete(filePath)
+      }
+    }))
+  }
+  return readFileMap.get(filePath)
+}
+
 module.exports = fp(fastifyView, {
   fastify: '4.x',
   name: '@fastify/view'
-- 
2.34.1
Uzlopak commented 5 months ago

How about the following patch:

From 74391aee904a6beae393097a59745424b176f3e3 Mon Sep 17 00:00:00 2001
From: uzlopak <aras.abbasi@googlemail.com>
Date: Mon, 19 Feb 2024 15:26:49 +0100
Subject: [PATCH] remove bottleneck

---
 index.js | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/index.js b/index.js
index 73867be..4e828f7 100644
--- a/index.js
+++ b/index.js
@@ -210,7 +210,7 @@ async function fastifyView (fastify, opts) {
     if (isRaw) {
       return onTemplatesLoaded(file, file)
     }
-    const fileData = await readFile(join(templatesDir, file), 'utf-8')
+    const fileData = await readFileSemaphore(join(templatesDir, file), 'utf-8')
     return onTemplatesLoaded(file, fileData)
   }

@@ -227,7 +227,7 @@ async function fastifyView (fastify, opts) {
       }
       const partialsHtml = {}
       await Promise.all(partialKeys.map(async (key) => {
-        partialsHtml[key] = await readFile(join(templatesDir, partials[key]), 'utf-8')
+        partialsHtml[key] = await readFileSemaphore(join(templatesDir, partials[key]), 'utf-8')
       }))
       lru.set(cacheKey, partialsHtml)
       return partialsHtml
@@ -321,7 +321,7 @@ async function fastifyView (fastify, opts) {
       return compiledPage(data)
     }

-    const file = await readFile(join(templatesDir, page), 'utf8')
+    const file = await readFileSemaphore(join(templatesDir, page), 'utf8')
     const render = readCallback(page, data, opts, file)
     return render(data)
   }
@@ -352,7 +352,7 @@ async function fastifyView (fastify, opts) {
       return compiledPage(data)
     }

-    const file = await readFile(join(templatesDir, page), 'utf8')
+    const file = await readFileSemaphore(join(templatesDir, page), 'utf8')
     const render = readCallback(page, data, opts, file)
     return render(data)
   }
@@ -628,6 +628,20 @@ async function fastifyView (fastify, opts) {
   }
 }

+/**
+ * @type {Map<string, Promise>}
+ */
+const readFileMap = new Map()
+
+function readFileSemaphore (filePath, encoding) {
+  if (readFileMap.has(filePath) === false) {
+    const promise = readFile(filePath, encoding)
+    readFileMap.set(filePath, promise)
+    promise.finally(() => readFileMap.delete(filePath))
+  }
+  return readFileMap.get(filePath)
+}
+
 module.exports = fp(fastifyView, {
   fastify: '4.x',
   name: '@fastify/view'
-- 
2.34.1
Uzlopak commented 5 months ago

@mweberxyz

wdyt?

gurgunday commented 5 months ago

We need this

mweberxyz commented 5 months ago

@Uzlopak lgtm, I will put add the function inside the plugin closure for consistency

mweberxyz commented 5 months ago

If we merge #408 we could get high level benchmarks just by adding a tag 😝

Here's the full output (Node v20.11.1, macOS, M1):

master:

% autocannon -c 100 -d 5 -p 10 localhost:3000
Running 5s test @ http://localhost:3000
100 connections with 10 pipelining factor

┌─────────┬──────┬───────┬───────┬───────┬──────────┬─────────┬────────┐
│ Stat    │ 2.5% │ 50%   │ 97.5% │ 99%   │ Avg      │ Stdev   │ Max    │
├─────────┼──────┼───────┼───────┼───────┼──────────┼─────────┼────────┤
│ Latency │ 6 ms │ 13 ms │ 14 ms │ 19 ms │ 10.53 ms │ 7.24 ms │ 330 ms │
└─────────┴──────┴───────┴───────┴───────┴──────────┴─────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬──────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg      │ Stdev    │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼──────────┼─────────┤
│ Req/Sec   │ 81,471  │ 81,471  │ 92,543  │ 93,055  │ 90,476.8 │ 4,525.92 │ 81,412  │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼──────────┼─────────┤
│ Bytes/Sec │ 21.8 MB │ 21.8 MB │ 24.8 MB │ 24.9 MB │ 24.2 MB  │ 1.22 MB  │ 21.8 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴──────────┴─────────┘

refactor/async@b13c5dae:

autocannon -c 100 -d 5 -p 10 localhost:3000
Running 5s test @ http://localhost:3000
100 connections with 10 pipelining factor

┌─────────┬──────┬───────┬───────┬───────┬─────────┬─────────┬────────┐
│ Stat    │ 2.5% │ 50%   │ 97.5% │ 99%   │ Avg     │ Stdev   │ Max    │
├─────────┼──────┼───────┼───────┼───────┼─────────┼─────────┼────────┤
│ Latency │ 5 ms │ 10 ms │ 11 ms │ 11 ms │ 8.34 ms │ 5.56 ms │ 275 ms │
└─────────┴──────┴───────┴───────┴───────┴─────────┴─────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬───────────┬──────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg       │ Stdev    │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────────┼──────────┼─────────┤
│ Req/Sec   │ 103,103 │ 103,103 │ 115,455 │ 115,647 │ 112,979.2 │ 4,954.35 │ 103,080 │
├───────────┼─────────┼─────────┼─────────┼─────────┼───────────┼──────────┼─────────┤
│ Bytes/Sec │ 27.6 MB │ 27.6 MB │ 30.9 MB │ 31 MB   │ 30.3 MB   │ 1.32 MB  │ 27.6 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴───────────┴──────────┴─────────┘
Uzlopak commented 5 months ago

Awesome.

@gurgunday Merge if you like.