gofiber / fiber

โšก๏ธ Express inspired web framework written in Go
https://gofiber.io
MIT License

๐Ÿ› [Bug]: โ‰ˆ500ms Performance penalty using compression on and 'br' in 'Accept-Encoding' #2722

Closed the-hotmann closed 11 months ago

the-hotmann commented 12 months ago

Bug Description

I usually set the compression level to 2 by default, but today I noticed a problem on fiber v2.51.0.

This is the config I use:

app.Use(
    compress.New(compress.Config{
        Level: 2,
    }),
)

I tested a bit and came to this conclusion:

  1. Direct call/access (about 5ms)
  2. Nginx proxy call (≈400-600ms)

A direct call is always fast, but as soon as I access a fiber v2.51.0 page through the proxy I see TTFBs of at least 400-600ms.

The penalty only affects the HTML response, not the static assets. Turning off compression solves the problem entirely.

So I had multiple thoughts:

  1. The user the docker container runs as does not have permission to write the compressed file and therefore it takes long. But I discarded this thought, as this would also have to affect the direct call, which it does not.
  2. I thought about having a wrong ProxyHeader and did some tests there, as there might be some different behaviour behind NPM (Nginx Proxy Manager), but none of these tests led to anything.

How to Reproduce

Already described above.

Expected Behavior

New features can come with a performance penalty, but 400-600ms is not acceptable, I guess?

Fiber Version

v2.51.0

Code Snippet (optional)

No response


ReneWerner87 commented 12 months ago

There is actually nothing in the last version that could trigger this. Can you go through the last few versions in your setup?

the-hotmann commented 12 months ago

I will test it with the previous 3 versions and report back.

ReneWerner87 commented 12 months ago

Ok thx

nickajacks1 commented 12 months ago

To diagnose where the issue is, you can add app.Use(logger.New(logger.ConfigDefault)) to see how long fiber is taking.

package main

import (
    "github.com/gofiber/fiber/v2"
    "github.com/gofiber/fiber/v2/middleware/compress"
    "github.com/gofiber/fiber/v2/middleware/logger"
)

func main() {
    app := fiber.New()
    app.Use("/", logger.New(logger.ConfigDefault))
    app.Use("/", compress.New(compress.Config{Level: 2}))
    app.Static("/", "")
    app.Listen(":3030")
}

You will see logs that show approximately how long fiber took to process the request.

17:49:42 | 200 |    3.869307ms |       127.0.0.1 | GET     | /big.html
17:49:45 | 200 |       75.42µs |       127.0.0.1 | GET     | /big.html
17:49:47 | 200 |      53.825µs |       127.0.0.1 | GET     | /big.html
17:49:47 | 200 |      42.185µs |       127.0.0.1 | GET     | /big.html

So if it's significantly different between versions of fiber, there's a problem

the-hotmann commented 12 months ago

I always use the logger middleware, but the delay was not showing up in its log; I can state that directly. Of course there could be a discrepancy between Nginx and Fiber, but I will test first and then provide all the info.

the-hotmann commented 12 months ago

Ok, I just performed some tests and could not reproduce the bug. But I just noticed that I had also added a custom function to the template engine, to be able to render HTML unescaped; maybe this was causing the problem in combination with compression.

Here is the custom function I added to the engine:

engine := html.New("../templates", ".html")
engine.AddFunc(
    "unescape", func(s string) template.HTML {
        return template.HTML(s)
    },
)

I will try some more tests and update the docker images afterwards, so the links will then reflect the updated state.

ReneWerner87 commented 12 months ago

@the-hotmann So it cannot be reproduced? Can we close this issue until it is reproducible, and reopen it if necessary?

the-hotmann commented 12 months ago

I would like to keep it open for a few more hours, as I will retry to reproduce it. I will come back with feedback, and if I am not able to reproduce it, I will close it.

Thanks

ReneWerner87 commented 12 months ago

ok

nickajacks1 commented 12 months ago

Actually, I found similar behavior with the snippet I shared above, but it still occurred as far back as v2.40.0. When I get the HTML page through a browser, there is a several-hundred-millisecond delay before the first byte:

[screenshot: chrome_4uqFbwpJeb]

If I refresh right away, there is no delay, but if I wait a few seconds and refresh, the delay occurs again. The delay does not occur if I use curl. With Express, the delay occurs on the first connection, but never again. The connection goes from Windows to WSL (fiber) in my case. I don't know enough about what a browser does on an HTTP GET to know whether this is the same thing you were experiencing, but the Initial Connection is indeed very short when I connect to your example links.

[screenshot]
the-hotmann commented 12 months ago

I did some more tests and could reproduce the error again.

Feel free to download all 5 versions/tags of this docker image: https://hub.docker.com/r/h0tmann/go_fiber_test and use this docker-compose:

version: '3'
services:

##########################
####### FIBER TEST #######
##########################

  go_fiber_2490:
    container_name: go_fiber_2490
    image: h0tmann/go_fiber_test:v2.49.0
    hostname: go_fiber_2490
    ports:
      - 8003:8003
    deploy:
      resources:
        limits:
          memory: 300M
    restart: unless-stopped

  go_fiber_2491:
    container_name: go_fiber_2491
    image: h0tmann/go_fiber_test:v2.49.1
    hostname: go_fiber_2491
    ports:
      - 8004:8003
    deploy:
      resources:
        limits:
          memory: 300M
    restart: unless-stopped

  go_fiber_2492:
    container_name: go_fiber_2492
    image: h0tmann/go_fiber_test:v2.49.2
    hostname: go_fiber_2492
    ports:
      - 8005:8003
    deploy:
      resources:
        limits:
          memory: 300M
    restart: unless-stopped

  go_fiber_2500:
    container_name: go_fiber_2500
    image: h0tmann/go_fiber_test:v2.50.0
    hostname: go_fiber_2500
    ports:
      - 8006:8003
    deploy:
      resources:
        limits:
          memory: 300M
    restart: unless-stopped

  go_fiber_2510:
    container_name: go_fiber_2510
    image: h0tmann/go_fiber_test:v2.51.0
    hostname: go_fiber_2510
    ports:
      - 8007:8003
    deploy:
      resources:
        limits:
          memory: 300M
    restart: unless-stopped

Now use Nginx/NPM to point a domain to each of the containers.

You will end up with two possible URLs to call the very same container:

  1. http://domain.tld
  2. 123.123.123.123:8003 (8003-8007)

DO NOT TEST BEHIND CLOUDFLARE! Cloudflare (my public tests are proxied through Cloudflare) somehow makes the problem not show up.

My public Tests are hosted like this:

Client => Cloudflare => Nginx => Docker-Container

My internal Tests are tested like this:

Client => Nginx => Docker-Container (Here the Problem shows up!)

I am more than happy to provide you guys with the new URLs, but they point to IPs which I don't want to share publicly. So feel free to contact me privately or set up your own testcase according to the instructions above.

I also tested across the versions:

It happens on ALL of them. BUT: Fiber itself reports times like 141.991µs, while the TTFB is somewhere around 0.5s.

@nickajacks1 thanks for looking into it. I personally have some guesses, but I will just dump the request headers on every request and print them into the HTML; then we can see the difference. I will rewrite my testcases and rebuild. Some questions for you:

  1. If you proxy through Cloudflare, does the problem also go away for you?
  2. If you call the Golang/Fiber application directly (not through Nginx) does the problem also go away for you?
ReneWerner87 commented 12 months ago

Fiber itself reports times like 141.991µs, while the TTFB is somewhere around 0.5s.

Can you rule out the infrastructure you built around fiber?

ReneWerner87 commented 12 months ago

can you find out the version where it first appeared?

the-hotmann commented 12 months ago

Indeed I cannot; I have no proof it is fiber. But it completely disappears when turning off:

  1. either compression in fiber
  2. or the custom function for the template

It could still be nginx, but again, behind nginx AND Cloudflare it does not appear. Currently a black box.

can you find out the version where it first appeared?

I will try to find out.

the-hotmann commented 12 months ago

Some additional Info:

For the first time (I even recorded it) the app (fiber) crashed on many reloads:

13:24:46 | 200 |    3.560504ms |     127.10.0.10 | GET     | /
13:24:46 | 200 |      11.044µs |     127.10.0.10 | GET     | /app.min.css
13:24:47 | 200 |    4.345086ms |     127.10.0.10 | GET     | /
13:24:47 | 200 |      10.703µs |     127.10.0.10 | GET     | /app.min.css
13:24:47 | 200 |    4.368002ms |     127.10.0.10 | GET     | /
13:24:47 | 200 |      11.858µs |     127.10.0.10 | GET     | /app.min.css
13:24:47 | 200 |    4.005441ms |     127.10.0.10 | GET     | /
13:24:47 | 200 |      14.102µs |     127.10.0.10 | GET     | /app.min.css
13:24:48 | 200 |    4.434645ms |     127.10.0.10 | GET     | /
13:24:48 | 200 |      12.454µs |     127.10.0.10 | GET     | /app.min.css
13:24:48 | 200 |    6.291476ms |     127.10.0.10 | GET     | /
13:24:48 | 200 |       16.13µs |     127.10.0.10 | GET     | /app.min.css
13:24:49 | 200 |    5.200579ms |     127.10.0.10 | GET     | /
13:24:50 | 200 |      69.583µs |     127.10.0.10 | GET     | /app.min.css
13:24:51 | 200 |    5.683539ms |     127.10.0.10 | GET     | /

DEBUG:
Host:          scratch
Kernel:        6.5.10-1
GoLang:        1.21.4
Fiber:         v2.51.0
Build date:    2023.11.15

 ┌───────────────────────────────────────────────────┐
 │                 Fiber Test v1.0.3                 │
 │                   Fiber v2.51.0                   │
 │                 http://[::]:8003                  │
 │                                                   │
 │ Handlers ............. 5  Processes ........... 1 │
 │ Prefork ....... Disabled  PID ................. 1 │
 └───────────────────────────────────────────────────┘

13:24:52 | 200 |    4.962311ms |     127.10.0.10 | GET     | /
13:24:53 | 200 |     962.536µs |     127.10.0.10 | GET     | /app.min.css
13:24:54 | 200 |    4.472951ms |     127.10.0.10 | GET     | /
13:24:54 | 200 |      30.002µs |     127.10.0.10 | GET     | /app.min.css
13:24:55 | 200 |    4.500065ms |     127.10.0.10 | GET     | /
13:24:56 | 200 |       7.734µs |     127.10.0.10 | GET     | /app.min.css
13:24:56 | 200 |    4.019466ms |     127.10.0.10 | GET     | /
13:24:57 | 200 |       9.426µs |     127.10.0.10 | GET     | /app.min.css
13:24:57 | 200 |    6.787522ms |     127.10.0.10 | GET     | /
13:24:58 | 200 |       9.033µs |     127.10.0.10 | GET     | /app.min.css
13:24:58 | 200 |    5.446333ms |     127.10.0.10 | GET     | /
13:24:59 | 200 |      14.491µs |     127.10.0.10 | GET     | /app.min.css
13:24:59 | 200 |    6.466976ms |     127.10.0.10 | GET     | /
13:25:00 | 200 |       9.368µs |     127.10.0.10 | GET     | /app.min.css
13:25:01 | 200 |    3.629219ms |     127.10.0.10 | GET     | /
13:25:02 | 200 |       7.669µs |     127.10.0.10 | GET     | /app.min.css

Here is the video:

https://github.com/gofiber/fiber/assets/17692860/577ce5e4-4ac9-48e6-983b-c442bdf72f62

P.S.: it crashed because it was OOM-killed. When the issue occurs (the high TTFBs), every reload consumes about 30MB of RAM, which is not cleared by the GC for a long time (if ever). Since I set the RAM limit in my docker-compose to 300MB, which is more than plenty, the app crashed after some reloads.

the-hotmann commented 12 months ago

Tested down to v2.0.0. It even happened there. I had to remove a lot of the config, but the error was reproducible.

Maybe it is not the app itself but the template package? I will try to downgrade it. I can't use "github.com/gofiber/template/html" anymore; it throws this error:

github.com/gofiber/template/html: module github.com/gofiber/template@latest found (v1.8.2), but does not contain package github.com/gofiber/template/html
the-hotmann commented 12 months ago

I added the functionality to print out the request header. Here are the two headers:

Fast/Direct:

Header Value
Accept text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding gzip, deflate
Accept-Language de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control max-age=0
Connection keep-alive
Host ###CENSORED###:8007
Upgrade-Insecure-Requests 1
User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36

Slow/Nginx:

Header Value
Accept text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding gzip, deflate, br
Accept-Language de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control max-age=0
Connection close
Cookie __hstc=25561058.5ae6dd05059e40a9b3017761e5eb15e1.1699973432021.1699973432021.1699973432021.1; hubspotutk=5ae6dd05059e40a9b3017761e5eb15e1; _gcl_au=1.1.965372761.1699973432
Host fibercompression-v2-51-0.###CENSORED###
Sec-Ch-Ua "Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"
Sec-Ch-Ua-Mobile ?0
Sec-Ch-Ua-Platform "Windows"
Sec-Fetch-Dest document
Sec-Fetch-Mode navigate
Sec-Fetch-Site none
Sec-Fetch-User ?1
Upgrade-Insecure-Requests 1
User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
X-Forwarded-For ###CENSORED###
X-Forwarded-Proto https
X-Forwarded-Scheme https
X-Real-Ip ###CENSORED###

Maybe it has something to do with the discrepancy in the Accept-Encoding header? If Fiber sends back a brotli-encoded file, does Nginx have to re-encode it to gzip, since my browser somehow does not accept br?

Also, the

Connection close

header is strange; it should actually keep the connection open.
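
An aside on the `Connection: close` observation: nginx speaks HTTP/1.0 to upstreams by default, which implies closing the connection after each proxied request, so this header is expected rather than a fiber quirk. A hedged sketch of the nginx directives that enable upstream keep-alive (the upstream name, address, and pool size are placeholders):

```nginx
upstream fiber_backend {
    server 127.0.0.1:8003;
    keepalive 16;                        # pool of idle connections kept open to fiber
}

server {
    location / {
        proxy_pass http://fiber_backend;
        proxy_http_version 1.1;          # default is 1.0, which implies Connection: close
        proxy_set_header Connection "";  # clear the header so keep-alive is used upstream
    }
}
```

Whether this changes the TTFB issue here is unverified; it only explains why fiber sees `Connection: close` behind the proxy.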

This is the current full fiber config:

    engine := html.New("../templates", ".html")
    engine.AddFunc(
        "unescape", func(s string) template.HTML {
            return template.HTML(s)
        },
    )

    app := fiber.New(fiber.Config{
        Views:                engine,
        CompressedFileSuffix: ".gz",
        //AppName:              "Fiber Test v1.0.3",
        //JSONEncoder:          json.Marshal,
        //JSONDecoder:          json.Unmarshal,
        //Network: "tcp",
    })

    app.Use(
        // Enable compression
        compress.New(compress.Config{
            Level: 2,
        }),

        // Enable logger
        logger.New(),
    )

    // Serve static files under /
    app.Static("/", "../static", fiber.Static{
        Compress: true,
    })

Some parts of the fiber.Config are commented out for backwards compatibility.

the-hotmann commented 12 months ago

With Cloudflare the problem does not occur. The request headers through Cloudflare look like this:

Header Value
Accept text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding gzip
Accept-Language de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
Cdn-Loop cloudflare
Cf-Connecting-Ip ###CENSORED###
Cf-Ipcountry DE
Cf-Ray 826852535a24bb62-FRA
Cf-Visitor {"scheme":"https"}
Connection close
Host fibercompression-v2-51-0.hotmann.de
Priority u=0, i
Sec-Ch-Ua "Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"
Sec-Ch-Ua-Mobile ?0
Sec-Ch-Ua-Platform "Windows"
Sec-Fetch-Dest document
Sec-Fetch-Mode navigate
Sec-Fetch-Site none
Sec-Fetch-User ?1
True-Client-Ip ###CENSORED###
Upgrade-Insecure-Requests 1
User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
X-Forwarded-For ###CENSORED###, ###CENSORED###
X-Forwarded-Proto https
X-Forwarded-Scheme https
X-Real-Ip ###CENSORED###

Here again the Accept-Encoding header does not include br:

Accept-Encoding gzip
ReneWerner87 commented 12 months ago

Can you try the same or something similar without fiber? Have you also activated compression in nginx for these routes? If so, I would deactivate one of the two.

Please try to narrow it down further

the-hotmann commented 12 months ago

I have to decline. I did some more testing with NOTHING but fiber, and the situation is unambiguous & reproducible: it is Fiber. Here is how to reproduce it:

Set up fiber with:

    app.Use(
        // Enable compression
        compress.New(compress.Config{
            Level: 2,
        }),
    )

Start the app.

Now run these two curl requests directly and locally against your application:

1.)

curl -o /dev/null -s -w "TTFB: %{time_starttransfer}\n" http://127.0.0.1:8003 -H "Accept-Encoding: gzip, deflate"

2.) (Note: with br)

curl -o /dev/null -s -w "TTFB: %{time_starttransfer}\n" http://127.0.0.1:8003 -H "Accept-Encoding: gzip, deflate, br"

For me the results are:

1.)

$ curl -o /dev/null -s -w "TTFB: %{time_starttransfer}\n" http://127.0.0.1:8003 -H "Accept-Encoding: gzip, deflate"
TTFB: 0.006761

2.) (Note: with br)

$ curl -o /dev/null -s -w "TTFB: %{time_starttransfer}\n" http://127.0.0.1:8003 -H "Accept-Encoding: gzip, deflate, br"
TTFB: 0.545109

I have run this multiple times, and it gives me the same result every time. For me this is now proof that the problem is with fiber, since this is a direct request that rules out everything else in the infrastructure. I also tested this on multiple servers, but there too the request is internal (127.0.0.1:8003). This discovery aligns with the request headers identified above.

My analysis: fiber has a problem when br is part of the Accept-Encoding request header and compression is turned on.

ReneWerner87 commented 12 months ago

Ok that's interesting

Maybe brotli just needs more time.

By the way, our middleware uses a fasthttp middleware for handling the compression, which internally uses an additional lib for it.

So we need to find out which of these components is affected. Can you join Discord? It might be easier to work on this problem together there.

ReneWerner87 commented 12 months ago

https://github.com/gofiber/fiber/blob/master/middleware/compress/compress.go#L36

https://github.com/valyala/fasthttp/blob/master/brotli.go#L9

nickajacks1 commented 12 months ago

When I use br encoding on a 963KB file with level 2 compression on a bare fiber server on localhost, I get these results:

TTFB: 0.031775

real    0m2.076s
user    0m0.000s
sys 0m0.023s

The whole request takes forever, but the TTFB isn't as bad. Deflate and gzip are also slowish, but still much faster (~150ms real time).

But if I use level 1 compression with brotli, it's a lot faster:

TTFB: 0.007094

real    0m0.041s
user    0m0.013s
sys 0m0.007s
the-hotmann commented 12 months ago

Can you join Discord, it might be easier to work on this problem together there

I am there :)

When I use br encoding on a 963KB file with level 2 compression on a bare fiber server on localhost, I get these results:

The fastest response I get from a fiber server is when I use compress as the Accept-Encoding:

curl -o /dev/null -s -w "TTFB: %{time_starttransfer}\n" http://127.0.0.1:8003 -H "Accept-Encoding: compress"

Here are my results with time:

br

$ time curl -o /dev/null -s -w "TTFB: %{time_starttransfer}\n" http://127.0.0.1:8003 -H "Accept-Encoding: br"
TTFB: 0.493420

real    0m0,521s
user    0m0,015s
sys     0m0,015s

compress

$ time curl -o /dev/null -s -w "TTFB: %{time_starttransfer}\n" http://127.0.0.1:8003 -H "Accept-Encoding: compress"
TTFB: 0.003681

real    0m0,022s
user    0m0,009s
sys     0m0,011s
the-hotmann commented 11 months ago

Via Discord the problem could be reproduced locally, but it seems that compression level 2 (brotli lvl9 LINK) is just very CPU-intensive and takes much longer than other compression techniques at their highest level.

So even if a single request takes about 0.5s, this is to be expected when using brotli at its highest level on some 100kB of text.

@ReneWerner87 thank you for the clarification! @nickajacks1 thanks for the quick reproduction of the error and your participation.

P.S.: just don't use compression level 2 in the compress.Config ;)
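
A minimal sketch of that takeaway, assuming the fiber v2 compress middleware constants (`LevelBestSpeed` is 1; `Level: 2` is `LevelBestCompression`, the setting implicated above):

```go
package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/compress"
)

func main() {
	app := fiber.New()

	// LevelBestSpeed keeps compression cheap for every encoding the client
	// offers; LevelBestCompression (2) is what produced the ~0.5s brotli
	// responses discussed in this thread.
	app.Use(compress.New(compress.Config{
		Level: compress.LevelBestSpeed,
	}))

	log.Fatal(app.Listen(":3030"))
}
```

This is a configuration sketch, not a drop-in fix for the original setup (which also registers templates, a logger, and static files).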