withinboredom commented 10 months ago

This issue is for running some "scientific" benchmarks, drawing on 20 years of finagling with this stuff. "Scientific" is in quotes because, although I intend this to be a scientific endeavor, it comes with caveats; read on.

Goals:

  1. Fully tuned php.ini
  2. Fully tuned nginx + fpm
  3. Fully tuned apache
  4. Discover caddy + FrankenPHP's best tuning
    • What Go environment variables make a difference, and how to set them
    • Compilation options?
    • worker mode vs. cgi mode vs. static binary
  5. Documentation to perform the benchmarks
    • kernel settings/parameters
    • device characteristics (arm/x86/etc, cores, etc)
    • use a rented device that is cheap and always available, so anyone can reproduce the results, but that still represents realistic production machines.

Non-goals:

  1. Generic benchmarking suite (I am not building anything)
  2. Framework benchmarks (Not comparing frameworks)

Known Caveats:

  1. Extensions and code will make a difference, but if the documentation is good enough, people can run their own benchmarks with their desired configuration.
  2. The PHP code under test must be chosen carefully to illustrate typical PHP characteristics from the perspective of a SAPI (setting headers, outputting data, JIT'able code, etc.).
  3. From casual testing, we're much more likely to saturate network links long before the CPU with FrankenPHP, so we need either high-performance links or underpowered machines. Needs further investigation.




withinboredom commented 10 months ago

Reporting progress so far:

Here's the terraform to create the testing infrastructure:

Commands I ran on the bm-server to get it ready:

apt update
apt upgrade
apt install nginx php-fpm libapache2-mod-php apache2 wget btop net-tools
systemctl disable nginx apache2 php8.2-fpm
systemctl stop nginx apache2 php8.2-fpm
mkdir -p /app/public
mkdir -p /etc/caddy
wget https://github.com/dunglas/frankenphp/releases/download/v1.0.3/frankenphp-linux-x86_64
chmod +x frankenphp-linux-x86_64
mv frankenphp-linux-x86_64 /usr/local/bin/frankenphp
# wait

sysctl net.core.somaxconn=1024
ifconfig eth0 txqueuelen 5000
sysctl net.core.netdev_max_backlog=2000
sysctl net.ipv4.tcp_max_syn_backlog=2048

FRANKENPHP_CONFIG="worker /app/public/index.php 32" GOGC=3200 SERVER_NAME=":81" frankenphp run -c /etc/caddy/Caddyfile > /dev/null 2>&1 &
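On the GOGC=3200 choice: the Go GC guide (Go 1.18 and later) gives the heap target roughly as below, so raising GOGC from its default of 100 to 3200 lets the heap grow to about 33 times the live heap before the next collection, trading memory for fewer GC cycles. The 10 MiB live heap is a hypothetical figure, with GC roots assumed negligible. GOMEMLIMIT (Go 1.19+) can cap that growth if memory becomes the constraint.

```latex
\begin{aligned}
\text{target heap} &= \text{live heap} + (\text{live heap} + \text{GC roots}) \cdot \frac{\mathrm{GOGC}}{100} \\
\mathrm{GOGC}=100:\quad 10\,\text{MiB} + 10\,\text{MiB} \cdot 1 &= 20\,\text{MiB} \\
\mathrm{GOGC}=3200:\quad 10\,\text{MiB} + 10\,\text{MiB} \cdot 32 &= 330\,\text{MiB}
\end{aligned}
```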

And the bm-client:

apt update
apt upgrade
apt install k6

# wait

sysctl net.ipv4.ip_local_port_range="15000 61000"
sysctl net.ipv4.tcp_fin_timeout=30
# tcp_tw_recycle was removed in Linux 4.12; on newer kernels this line only prints an error
sysctl net.ipv4.tcp_tw_recycle=1
sysctl net.ipv4.tcp_tw_reuse=1
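With --no-connection-reuse, every request opens a new TCP connection from a fresh ephemeral port, which is why the client-side TIME_WAIT settings matter. A back-of-the-envelope sketch, assuming the client closes first (consistent with the FIN timing reported later in this thread) and Linux's fixed 60-second TIME_WAIT; tcp_fin_timeout only limits orphaned FIN_WAIT_2 sockets, so it does not shorten TIME_WAIT. The request rate is the FrankenPHP 1,000-VU result from the next comment.

```javascript
// Back-of-the-envelope check of ephemeral-port pressure on the load generator.
// Assumptions: one new TCP connection per request (--no-connection-reuse),
// the client closes first, and Linux keeps closed sockets in TIME_WAIT for 60 s.
const portRangeLow = 15000;          // from net.ipv4.ip_local_port_range
const portRangeHigh = 61000;
const timeWaitSeconds = 60;          // Linux TCP_TIMEWAIT_LEN, not a sysctl
const requestsPerSecond = 33150.82;  // FrankenPHP run at 1,000 VUs

// Ephemeral ports available toward a single server ip:port.
const availablePorts = portRangeHigh - portRangeLow + 1;

// Sockets parked in TIME_WAIT at steady state (Little's law: rate * time).
const timeWaitSockets = requestsPerSecond * timeWaitSeconds;

// Time until every port is tied up if TIME_WAIT sockets cannot be reused.
const secondsUntilExhausted = availablePorts / requestsPerSecond;

console.log(`ports available:           ${availablePorts}`);
console.log(`TIME_WAIT at steady state: ${Math.round(timeWaitSockets)}`);
console.log(`ports exhausted after:     ${secondsUntilExhausted.toFixed(2)} s`);
```

At that rate the 46,001 ports would be used up in under a second and a half, while TIME_WAIT would want to hold nearly two million sockets. Without tcp_tw_reuse (which lets Linux reuse TIME_WAIT ports for new outgoing connections when TCP timestamps make it safe), the load generator itself would likely become the bottleneck.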

I didn't adjust php.ini, since we aren't really benchmarking PHP here but the SAPI.

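Then, with the following PHP file (/app/public/index.php):

<?php

// Exercise the SAPI: read POST data and a cookie, set headers, write a JSON body.
function handle_request(): void
{
    header('Content-Type: application/json');
    $data = $_POST['data'] ?? null;
    $cookie = $_COOKIE['cookie'] ?? null;
    $body = json_encode(['data' => $data, 'cookie' => $cookie]);
    header('Content-Length: ' . strlen($body));
    echo $body;
}

if ($_SERVER['FRANKENPHP_WORKER'] ?? false) {
    // Worker mode: the script stays in memory and handles requests in a loop.
    while (frankenphp_handle_request(handle_request(...))) {}
} else {
    // Classic mode (php-fpm, mod_php): handle the single request directly.
    handle_request();
}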


Finally, I used the default Caddyfile from this repo and the following two load test files:

// load-test.js (nginx + php-fpm)
import http from 'k6/http'
import { check } from 'k6'

export const options = {
  // A number specifying the number of VUs to run concurrently
  // (overridden by -u on the command line in the runs below).
  vus: 100,
  // A string specifying the total duration of the test run.
  duration: '30s',
}

const payload = 'data=test'

// The function that defines VU logic.
// See https://grafana.com/docs/k6/latest/examples/get-started-with-k6/ to learn more
// about authoring k6 scripts.
export default function () {
  // Target URL not shown in the original post.
  const res = http.post('', payload)
  check(res, {
    'is status 200': (r) => r.status === 200,
  })
}

// load-test-franken.js (Caddy + FrankenPHP): same script, only the target URL differs
import http from 'k6/http'
import { check } from 'k6'

export const options = {
  vus: 100,
  duration: '30s',
}

const payload = 'data=test'

export default function () {
  // Target URL not shown in the original post.
  const res = http.post('', payload)
  check(res, {
    'is status 200': (r) => r.status === 200,
  })
}
I'll leave another comment with the results and some preliminary conclusions.

withinboredom commented 10 months ago

Raw results:

k6 run load-test-franken.js -u 1000 --no-connection-reuse

  execution: local
     script: load-test-franken.js
     output: -

  scenarios: (100.00%) 1 scenario, 1000 max VUs, 1m0s max duration (incl. graceful stop):
           * default: 1000 looping VUs for 30s (gracefulStop: 30s)

     ✓ is status 200

     checks.........................: 100.00% ✓ 995134       ✗ 0
     data_received..................: 193 MB  6.4 MB/s
     data_sent......................: 129 MB  4.3 MB/s
     http_req_blocked...............: avg=856.65µs min=137.4µs  med=242.13µs max=65.11ms  p(90)=1.79ms  p(95)=4.24ms
     http_req_connecting............: avg=694.87µs min=107.78µs med=208.11µs max=62.91ms  p(90)=1.48ms  p(95)=3.52ms
     http_req_duration..............: avg=29.03ms  min=502.98µs med=28.68ms  max=104.21ms p(90)=39.49ms p(95)=43.57ms
       { expected_response:true }...: avg=29.03ms  min=502.98µs med=28.68ms  max=104.21ms p(90)=39.49ms p(95)=43.57ms
     http_req_failed................: 0.00%   ✓ 0            ✗ 995134
     http_req_receiving.............: avg=1.26ms   min=12.92µs  med=52.69µs  max=51.64ms  p(90)=5.03ms  p(95)=7.86ms
     http_req_sending...............: avg=742.23µs min=7.87µs   med=29.36µs  max=51.64ms  p(90)=2.23ms  p(95)=4.73ms
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=27.02ms  min=386.63µs med=27.64ms  max=90.61ms  p(90)=36.63ms p(95)=39.92ms
     http_reqs......................: 995134  33150.822124/s
     iteration_duration.............: avg=30.1ms   min=757.69µs med=29.29ms  max=115.13ms p(90)=40.43ms p(95)=44.84ms
     iterations.....................: 995134  33150.822124/s
     vus............................: 1000    min=1000       max=1000
     vus_max........................: 1000    min=1000       max=1000

running (0m30.0s), 0000/1000 VUs, 995134 complete and 0 interrupted iterations
default ✓ [======================================] 1000 VUs  30s

And nginx:

k6 run load-test.js -u 1000 --no-connection-reuse

  execution: local
     script: load-test.js
     output: -

  scenarios: (100.00%) 1 scenario, 1000 max VUs, 1m0s max duration (incl. graceful stop):
           * default: 1000 looping VUs for 30s (gracefulStop: 30s)

     ✓ is status 200

     checks.........................: 100.00% ✓ 1155748     ✗ 0
     data_received..................: 214 MB  7.1 MB/s
     data_sent......................: 162 MB  5.4 MB/s
     http_req_blocked...............: avg=1.24ms   min=131.19µs med=242.8µs  max=67.3ms   p(90)=3.41ms  p(95)=6.57ms
     http_req_connecting............: avg=916.21µs min=113.08µs med=211.15µs max=67.25ms  p(90)=2.58ms  p(95)=5ms
     http_req_duration..............: avg=24.2ms   min=306.62µs med=22.94ms  max=80.3ms   p(90)=35.46ms p(95)=39.25ms
       { expected_response:true }...: avg=24.2ms   min=306.62µs med=22.94ms  max=80.3ms   p(90)=35.46ms p(95)=39.25ms
     http_req_failed................: 0.00%   ✓ 0           ✗ 1155748
     http_req_receiving.............: avg=1.65ms   min=12.26µs  med=48.06µs  max=58.29ms  p(90)=6.78ms  p(95)=9.81ms
     http_req_sending...............: avg=989.05µs min=8.01µs   med=27.39µs  max=58.41ms  p(90)=3.24ms  p(95)=5.71ms
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=21.56ms  min=264.85µs med=22.44ms  max=64.54ms  p(90)=30.24ms p(95)=33.25ms
     http_reqs......................: 1155748 38504.81304/s
     iteration_duration.............: avg=25.88ms  min=628.86µs med=23.68ms  max=117.11ms p(90)=37.59ms p(95)=42.4ms
     iterations.....................: 1155748 38504.81304/s
     vus............................: 1000    min=1000      max=1000
     vus_max........................: 1000    min=1000      max=1000

running (0m30.0s), 0000/1000 VUs, 1155748 complete and 0 interrupted iterations
default ✓ [======================================] 1000 VUs  30s

Note that at 1,000 VUs they are pretty much in line with each other. The biggest difference shows up at 10,000 VUs: Caddy/FrankenPHP appears to accept nearly unlimited traffic, while nginx sheds it.

k6 run load-test-franken.js -u 10000 --no-connection-reuse

  execution: local
     script: load-test-franken.js
     output: -

  scenarios: (100.00%) 1 scenario, 10000 max VUs, 1m0s max duration (incl. graceful stop):
           * default: 10000 looping VUs for 30s (gracefulStop: 30s)

     ✓ is status 200

     checks.........................: 100.00% ✓ 958586       ✗ 0
     data_received..................: 186 MB  6.0 MB/s
     data_sent......................: 125 MB  4.0 MB/s
     http_req_blocked...............: avg=120.6ms  min=133.53µs med=279.75µs max=4.17s    p(90)=175.66ms p(95)=1.02s
     http_req_connecting............: avg=120.51ms min=116.8µs  med=252.4µs  max=4.16s    p(90)=175.45ms p(95)=1.02s
     http_req_duration..............: avg=188.91ms min=564.86µs med=182.37ms max=1.83s    p(90)=267.46ms p(95)=297ms
       { expected_response:true }...: avg=188.91ms min=564.86µs med=182.37ms max=1.83s    p(90)=267.46ms p(95)=297ms
     http_req_failed................: 0.00%   ✓ 0            ✗ 958586
     http_req_receiving.............: avg=3.28ms   min=14.92µs  med=77.65µs  max=151.64ms p(90)=9.31ms   p(95)=18.75ms
     http_req_sending...............: avg=2.21ms   min=7.68µs   med=56.94µs  max=138.24ms p(90)=6.68ms   p(95)=8.87ms
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=183.41ms min=419.73µs med=179.85ms max=1.83s    p(90)=256.24ms p(95)=286.53ms
     http_reqs......................: 958586  31124.388239/s
     iteration_duration.............: avg=312.35ms min=45.1ms   med=198.63ms max=4.37s    p(90)=543.2ms  p(95)=1.21s
     iterations.....................: 958586  31124.388239/s
     vus............................: 283     min=283        max=10000
     vus_max........................: 10000   min=10000      max=10000

running (0m30.8s), 00000/10000 VUs, 958586 complete and 0 interrupted iterations
default ✓ [======================================] 10000 VUs  30s

and nginx:

k6 run load-test.js -u 10000 --no-connection-reuse

  execution: local
     script: load-test.js
     output: -

  scenarios: (100.00%) 1 scenario, 10000 max VUs, 1m0s max duration (incl. graceful stop):
           * default: 10000 looping VUs for 30s (gracefulStop: 30s)

     ✗ is status 200
      ↳  40% — ✓ 508842 / ✗ 739342

     checks.........................: 40.76%  ✓ 508842       ✗ 739342
     data_received..................: 337 MB  11 MB/s
     data_sent......................: 175 MB  5.8 MB/s
     http_req_blocked...............: avg=149.13ms min=134.39µs med=51.79ms  max=3.22s    p(90)=209.33ms p(95)=1.05s
     http_req_connecting............: avg=148.63ms min=117.69µs med=51.6ms   max=3.22s    p(90)=203.27ms p(95)=1.04s
     http_req_duration..............: avg=80.63ms  min=248.29µs med=67.74ms  max=629.84ms p(90)=180.38ms p(95)=220.82ms
       { expected_response:true }...: avg=92.54ms  min=428.29µs med=50.35ms  max=629.84ms p(90)=226.63ms p(95)=281.72ms
     http_req_failed................: 59.23%  ✓ 739342       ✗ 508842
     http_req_receiving.............: avg=5.63ms   min=14.67µs  med=5.74ms   max=282.07ms p(90)=9.85ms   p(95)=12.7ms
     http_req_sending...............: avg=5.45ms   min=8.16µs   med=5.16ms   max=233.81ms p(90)=8.96ms   p(95)=11.36ms
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=69.55ms  min=167.26µs med=55.82ms  max=625.1ms  p(90)=162.86ms p(95)=207.5ms
     http_reqs......................: 1248184 41478.797471/s
     iteration_duration.............: avg=237.72ms min=548.72µs med=142.65ms max=3.5s     p(90)=519.76ms p(95)=1.13s
     iterations.....................: 1248184 41478.797471/s
     vus............................: 10000   min=10000      max=10000
     vus_max........................: 10000   min=10000      max=10000

running (0m30.1s), 00000/10000 VUs, 1248184 complete and 0 interrupted iterations
default ✓ [======================================] 10000 VUs  30s
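One thing to keep in mind when reading these two summaries: k6's http_reqs rate counts failed requests too, so nginx's 41,478 req/s at 10,000 VUs overstates its useful throughput. A small sketch that scales each rate by its check pass ratio, using only the numbers printed above:

```javascript
// Successful-request throughput from the k6 summaries of the 10,000-VU runs.
// http_reqs includes failures, so scale the reported rate by the pass ratio.
function successfulRate(totalRate, passed, total) {
  return totalRate * (passed / total);
}

// FrankenPHP: 958586 requests, all checks passed.
const frankenOk = successfulRate(31124.388239, 958586, 958586);
// nginx: 508842 passed out of 1248184 requests.
const nginxOk = successfulRate(41478.797471, 508842, 1248184);

console.log(`FrankenPHP: ${frankenOk.toFixed(0)} successful req/s`);
console.log(`nginx:      ${nginxOk.toFixed(0)} successful req/s`);
```

That puts nginx at roughly 16,900 successful req/s, a bit over half of FrankenPHP's 31,124 req/s in the same run.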

Still digging into this (no conclusions can be drawn yet), but I thought I'd report progress so far.

dunglas commented 10 months ago

Great! This will be super helpful!

It might be better to compare dynamic builds with dynamic builds. The static build is known to be slower than a dynamic build of PHP (no JIT, more extensions, etc.). Maybe we should use Docker to ease the process?

Also, the default Caddyfile in this repo enables more features than the NGINX setup does. For instance, by default logs are on, but off in NGINX (this can make a huge difference). Also, HTTP/2 and HTTP/3 are on for Caddy, but not for NGINX.

Finally, for the worker mode, maybe could it be more interesting to compare a more real-life app, like a Symfony or a Laravel app (for a simple "hello world" script like this, the worker mode is mostly useless).

nickchomey commented 10 months ago

I also wonder if nginx could/should be tweaked - such a high failure rate seems suspicious. Perhaps there's a max duration parameter that could be modified so as to better match what's happening with caddy?

Though, this comprehensive benchmark between Caddy and nginx seems to have concluded something similar - nginx favors latency over completion/no errors.


You might even consider just starting from the terraform, configs and tests that they used and provide, and adding FrankenPHP to it, along with a dynamic web app (which is surely the real goal here) as dunglas suggested above.

It could probably be fairly assumed that Caddy's performance has improved more than nginx's in the time since that test (probably using v2.5.2, so missing progress from v2.6 and 2.7). Though, my guess is that the difference between the servers will be negligible when the webapp (and database etc...) is the bottleneck. Hopefully, though, FrankenPHP shows a meaningful improvement over Caddy/nginx + FPM given its direct-connection

withinboredom commented 10 months ago

by default logs are on, but off in NGINX (this can make a huge difference)

I turned logs off in nginx, but only redirected Caddy's output to /dev/null, which is not the same thing by any means. I'll turn them off explicitly so they are off for both.

maybe could it be more interesting to compare a more real-life app

For sure, though then we get into 'testing frameworks' territory rather than 'testing the SAPI' territory. What I want to measure is, e.g., how fast the SAPI can add some headers and output a string (though I would like to exercise the SAPI more).

the worker mode is mostly useless

Not really: there is still overhead in switching from Go to C and back, and I'm mostly curious what it would look like with worker mode disabled, but we can't test that yet.

Perhaps there's a max duration parameter that could be modified so as to better match what's happening with caddy?

nginx isn't timing out; it's refusing connections. I've fiddled with max_processes and friends, but I haven't yet worked out how to get it to handle 10k concurrent connections. It's probably something dumb.
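Two nginx limits could plausibly contribute to refused connections at 10k VUs, sketched here as arithmetic. Assumptions: Debian's stock nginx.conf (worker_connections 768), nginx's default listen backlog of 511 on Linux, and a hypothetical 8 worker processes, since worker_processes auto follows the core count and that isn't stated in this thread. Per the nginx docs, worker_connections counts all connections, including those to upstreams such as php-fpm, so each proxied request needs roughly two.

```javascript
// Rough connection budget for nginx + php-fpm with 10,000 concurrent k6 VUs.
// HYPOTHETICAL: 8 worker processes (the server's core count isn't given here).
const clients = 10000;
const workerProcesses = 8;       // hypothetical; `worker_processes auto` follows cores
const workerConnections = 768;   // Debian/Ubuntu stock nginx.conf value
const nginxListenBacklog = 511;  // nginx's default listen backlog on Linux
const somaxconn = 1024;          // net.core.somaxconn as set on bm-server

// worker_connections counts upstream connections too, so each proxied request
// needs roughly one client connection plus one FastCGI connection to php-fpm.
const connectionsNeeded = clients * 2;
const connectionsAvailable = workerProcesses * workerConnections;

// Linux truncates the listen() backlog to net.core.somaxconn.
const acceptQueue = Math.min(nginxListenBacklog, somaxconn);

console.log(`connections needed ~${connectionsNeeded}, available ${connectionsAvailable}`);
console.log(`accept queue length: ${acceptQueue}`);
```

If those assumptions hold, the knobs to look at would be worker_connections (with worker_rlimit_nofile raised to match), the backlog= parameter of listen, and net.core.somaxconn together, since the kernel silently caps the backlog at somaxconn.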

my guess is that the difference between the servers will be negligible when the webapp (and database etc...) is the bottleneck

This is exactly why testing frameworks is a non-goal: I want to test the SAPI, not PHP.

withinboredom commented 10 months ago

To add: there's also the issue of bandwidth. Right now these tests are pushing well over 1 Gbps at times (especially Caddy), so sending much bigger bodies would very quickly eat up a lot of bandwidth.
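To put numbers on how quickly larger bodies eat bandwidth: at a fixed request rate, each extra byte of response body costs rate × 8 bits per second. A sketch using the FrankenPHP 1,000-VU rate; it counts body bytes only (no headers or TCP/IP framing), so real link usage would be somewhat higher.

```javascript
// Extra link bandwidth (in Gbps) consumed by a larger response body at a fixed
// request rate. Body bytes only: headers and TCP/IP framing are ignored.
function addedGbps(requestsPerSecond, extraBodyBytes) {
  return (requestsPerSecond * extraBodyBytes * 8) / 1e9;
}

const rate = 33150.82; // req/s, FrankenPHP at 1,000 VUs
for (const kib of [1, 4, 16]) {
  console.log(`+${kib} KiB per response: +${addedGbps(rate, kib * 1024).toFixed(2)} Gbps`);
}
```

By this estimate, a body just 4 KiB larger would by itself add about 1.09 Gbps, enough to saturate a 1 Gbps link.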

In fact, I was surprised by the raw requests per second here, because Caddy tended to use more bandwidth. I will investigate this at the packet level, as I also suspect a bug in Caddy (or a library) based on some other benchmarks I ran along the way.

That's a good blog post btw @nickchomey, I'll see what I can steal.

withinboredom commented 10 months ago

Yep. I suspect there is a bug... somewhere deep down.

I'm seeing the server send its FIN packet sometimes hundreds of milliseconds after receiving one from the client, particularly when under load. There doesn't appear to be any other delay anywhere else in Caddy, and I don't see this in nginx. So in Caddy, the connection stays "open" for much longer than it should.
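Little's law gives a rough sense of what a late FIN costs: connections held open = connection rate × time each stays open. In the sketch below, the 200 ms delay is a hypothetical stand-in for "hundreds of milliseconds", and the rate is the FrankenPHP 1,000-VU figure, with one connection per request because of --no-connection-reuse.

```javascript
// Little's law: concurrently open connections = arrival rate * time each stays open.
function openConnections(connectionsPerSecond, secondsOpen) {
  return connectionsPerSecond * secondsOpen;
}

const connRate = 33150.82;    // one connection per request (--no-connection-reuse)
const extraCloseDelay = 0.2;  // HYPOTHETICAL: 200 ms late FIN

// Server-side sockets sitting in CLOSE_WAIT (client FIN received, server FIN
// not yet sent) purely because of the delayed close.
const extraOpen = openConnections(connRate, extraCloseDelay);
console.log(`~${Math.round(extraOpen)} extra sockets held open`);
```

That is roughly 6,600 additional sockets (and file descriptors) held by the server under load, compared with one that closes promptly.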

withinboredom commented 8 months ago

It appears that embedded PHP is about twice as slow as non-embedded PHP, which doesn't make sense; they should be about equal. I'm investigating this.

binaryfire commented 8 months ago

@withinboredom By embedded, are you referring to static binary builds?

dunglas commented 8 months ago

@withinboredom this doesn't surprise me much. Musl makes PHP slower and prevents JIT from working.

nickchomey commented 8 months ago

It was my impression that JIT is minimally helpful on most real world web apps. What do the benchmarks consist of?

Is OPcache working in the embedded build? It tends to make a huge (2x-like) difference.