caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0
57.9k stars 4.02k forks source link

Frequent hangs when using http/2 push #3896

Closed aDorofeev closed 2 years ago

aDorofeev commented 3 years ago

My team and I use Caddy as a reverse proxy, and we rely on HTTP/2 Push a lot. We've started with v2.2.1, on certain configurations we experienced random hangs.

We thought that the problem was fixed with this merge request: https://github.com/caddyserver/caddy/pull/3875 (more details can be found in the net/http issue: https://github.com/golang/go/issues/42534) Because it didn't occur for past couple weeks, since it was merged. But today we've found another configuration that makes it very easy to reproduce the problem again, on any of the recent Caddy versions.

Caddy behavior when hangs happen

How to reproduce

It depends on the proxied website and caddy config, and some random factors, thus it occurs with different frequency on different hardware. The steps are:

  1. Start caddy with the configuration provided below
  2. Open developer tools network tab, to visually see the hangs; checking the "disable cache" toggle will help reproduce the problem faster, but is not necessary
  3. Navigate to / page in a browser (i.e., https://terem-pro.localhost)
  4. Wait till it fully loads
  5. If it didn't hang on step 4 - hit f5, and again, wait till it fully loads; repeat several times if needed

On our test server, it usually hangs after 2-3 reloads. On some devices, it might require 10-15 attempts but still hangs at some point.

Caddy version

Reproduces on:

Built with: CADDY_RACE_DETECTOR=1 xcaddy build <revision>

Caddy configuration

https://terem-pro.localhost {                                                 
    handle {                                                                                                                                                           
        reverse_proxy https://www.terem-pro.ru {
            header_up host {http.reverse_proxy.upstream.host}
        }                                

        push / {                         
            /local/components/terem/catalog.list/templates/index.best.seller/style.css
            /local/components/terem/new_services.content/templates/home.banner.lots/style.css
            /local/components/terem/slider.blocks/templates/slider.useful/style.css
            /local/components/terem/standard.blocks/templates/call.action.white/style.css
            /local/components/terem/review.list/templates/carousel.home/style.css
            /local/components/terem/standard.blocks/templates/promo.red.home/style.css
            /local/components/terem/promotion.list/templates/home.slider/style.css
            /local/components/terem/form.form/templates/template.pdf/style.css
            /local/templates/terem/components/bitrix/menu/template.header.menu.top/desktop-menu.css
            /local/components/terem/form.form/templates/template.taxi/style.css
            /assets/resources/css/home.css
            /local/templates/terem/components/bitrix/menu/template.header.menu-mobile/style_menu.css
            /local/components/terem/catalog.type.list/templates/.default/style.css
            /assets/resources/css/styles.css
            /bitrix/cache/css/s1/terem/template_ad73b02503569e1113abf0b013fdbb28/template_ad73b02503569e1113abf0b013fdbb28_v1.css?16067202133580
            /bitrix/cache/css/s1/terem/page_074396ca6d41424fe878cb365c109aa1/page_074396ca6d41424fe878cb365c109aa1_v1.css?160672023225970
        }
    }
}

System environment:

Both test server and my PC run on Ubuntu 20.04.1 LTS, x86_64, No-docker Caddy installation

Highlights

mholt commented 3 years ago

There's an awful lot going on here... can you simplify it down more?

How can we reproduce it? You haven't provided the site files so I can't use that config to reproduce the behavior.

Does it happen without reverse proxying?

Does it happen without reverse proxying to an external Russian server?

It's likely that this is a bug in the Go x/net libraries, as our code doesn't deal with the details of HTTP/2 streams.

Have you been able to verify this is not a browser bug?

mholt commented 3 years ago

Okay, I took a few minutes and was able to reproduce this in Chrome -- I had mistakenly thought you were trying to push local static resources but then I realized you had no file_server -- you're proxying everything to an external server, including pushes. I noticed that none of the requests stuck with "pending" in the Status column have "push" in the Initiator column -- in other words, it does not appear to be the push that is hanging.

When you close the browser it then says "http2: stream closed" (as you noted) and also "client disconnected" errors. If the response is incomplete, this is most definitely a bug in either:

Or a combination of all three.

Edit: Enable debug mode for details about what Caddy is pushing:

{
    debug
}
...
mholt commented 3 years ago

Also, given that Chrome intends to remove server push I doubt that there will be much interest in fixing any lingering bugs, even upstream.

dtelyukh commented 3 years ago

I reopened the ticket in Go's repository.

dtelyukh commented 3 years ago

Will be fixed in Go 1.16.

mholt commented 3 years ago

Thanks so much for the time spent in addressing the issue! And thanks for understanding.

dtelyukh commented 3 years ago

Tested with go1.16rc1: the problem still exists.

mholt commented 3 years ago

Strange, does the Go team know that the problem still exists?

dtelyukh commented 3 years ago

I reopened the ticket in Go's repository again.

whitesn commented 3 years ago

Is there any temporary workaround / hack for this?

dtelyukh commented 3 years ago

Is there any temporary workaround / hack for this?

We use custom fork of Caddy server with patched modules/caddyhttp/app.go: https://github.com/caddyserver/caddy/pull/3910/files

whitesn commented 3 years ago

I'm not sure if this is same / different issue. Was using Caddy 1 before with no problem, then since moving to Caddy 2, the server frequently hangs with

ERROR http.log.error.log0 dialing backend: dial unix /run/php/php7.4-fpm.sock: connect: resource temporarily unavailable

Sorry if this is not related, I'll open up a new issue with my Caddyfile config too.

francislavoie commented 3 years ago

Yeah @whitesn some more detail would be helpful. But at a glance, that doesn't seem related, because that error message is about the proxy not being able to connect to your php-fpm service.

dtelyukh commented 3 years ago

@mholt, earlier I agreed with you about explicit usage of x/net/http2.

But I have found that other servers (Algernon and Traefik) use such approach.

  1. Algernon https://github.com/xyproto/algernon/blob/1596140dda68ead52624855ce2470b7b764f4f21/engine/serve.go#L44
  2. Traefik https://github.com/traefik/traefik/blob/326be29568426e4ac8e2bf65fdae3906e8dd58fd/pkg/server/service/smart_roundtripper.go

What do you think about it?

mholt commented 3 years ago

@dtelyukh (The second link is for ConfigureTransport, not ConfigureServer)

It is unnecessary, as it gets called for us under the hood: https://sourcegraph.com/github.com/golang/go@0f66fb7b856b02497cf801ce72d80f375f53358b/-/blob/src/net/http/server.go#L3238-3253

Antarian commented 3 years ago

{"level":"error","ts":1614297333.1600995,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"http2: stream closed"} This is happening with local Docker configuration and reverse proxy is to Next/React PWA app and Chrome browser (haven't tried another one). To see whole Docker setup just check code for ApiPlatform https://github.com/api-platform/api-platform

Antarian commented 3 years ago

Thinking if the same problem will be in a production. I tried redirect to generated html app and haven't noticed a problem so far. React app in api-platform is generating content and looking for changes (watch mode) as it is in development mode. But this is just a observation, not really proper tests done.

mholt commented 3 years ago

Apparently Traefik experiences the same issue: https://github.com/golang/go/issues/42534#issuecomment-790951253

I was able to reproduce this with the reverse proxy Traefik, and can confirm that directly calling the /x/net/http2 ConfigureServer method fixes this issue. I can also confirm that this is not fixed in the golang 1.16.0 stdlib,

Still awaiting fix in Go standard lib.

aledbf commented 3 years ago

@mholt any update with this issue? Is there any workaround?

mholt commented 3 years ago

Please track https://github.com/golang/go/issues/42534 where the issue lies.

mholt commented 2 years ago

Can anyone confirm whether this is still an issue on Go 1.18?

francislavoie commented 2 years ago

A build from master built with Go 1.18 can be grabbed here: https://github.com/caddyserver/caddy/actions/runs/1989510139, if someone can try it out and compare with their older build.

dtelyukh commented 2 years ago

I confirm that this is still an issue on Go 1.18. I used build from this link https://github.com/caddyserver/caddy/actions/runs/1989510139

mholt commented 2 years ago

Bummer. Thanks for verifying 🫤

francislavoie commented 2 years ago

I'm thinking we can close this, because HTTP/2 push is being dropped by Chrome and other browsers will likely follow suit.

HTTP 103 Early Hints is looking to be the replacement.

WDYT @mholt ?

mholt commented 2 years ago

That's probably fine, although it tends to bother me when I don't understand issues and they are left unresolved :sweat:

It sounds like there's not much we can do about it anyway, it has to be fixed in Go, from our understanding.

mitexleo commented 1 year ago

This problem still exist 😕