Open dtelyukh opened 3 years ago
Possible related issue https://github.com/golang/go/issues/23559
cc @fraenkel
@dtelyukh We are going to need something that can reproduce the issue. It would also help to enable http2 debug and a thread dump when it hangs.
Your title says the bundled version of http2 has this issue. Are you implying that if you use the latest x/net/http2 you don't?
@dtelyukh We are going to need something that can reproduce the issue. It would also help to enable http2 debug and a thread dump when it hangs.
It's not easy to prepare code that reproduces this problem reliably, but I think I can try.
Here is debug.log for GODEBUG=http2debug=2
http2.debug.log
and goroutines dump with pprof
goroutine.dump.log
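For anyone collecting the same data: the frame log comes from starting the process with the environment variable GODEBUG=http2debug=2, and a goroutine dump can be taken from inside the process with runtime/pprof. The sketch below is a minimal illustration, not the code used in our application; the helper name goroutineDump is made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// goroutineDump (hypothetical helper) returns the stacks of all goroutines.
// debug=2 prints them in the same text form as an unrecovered panic,
// including the wait reason of each goroutine.
func goroutineDump() string {
	var buf bytes.Buffer
	// "goroutine" is one of the profiles predefined by runtime/pprof.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "goroutine dump failed: " + err.Error()
	}
	return buf.String()
}

func main() {
	// In a real server this would be triggered when a hang is suspected,
	// for example from a signal handler or a debug endpoint.
	fmt.Println(goroutineDump())
}
```

In a long-running server, importing net/http/pprof and fetching /debug/pprof/goroutine?debug=2 gives the same kind of output without code changes in the handlers.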
Your title says the bundled version of http2 has this issue. Are you implying that if you use the latest x/net/http2 you don't?
h2_bundle.go is used by third-party code. I haven't tried to use x/net/http2 directly. Do you mean that I should do that?
Don't worry. I am going to need something that can reproduce this issue. I can see that nothing is making progress, but I don't know why. The debug log is incomplete or slightly broken, but from what I do see there is an oddity.
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: read HEADERS flags=END_STREAM|END_HEADERS|PRIORITY stream=1511 len=21
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=293
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: read PRIORITY stream=314 len=5
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=33
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote HEADERS flags=END_HEADERS stream=314 len=120
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=44
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote HEADERS flags=END_HEADERS stream=316 len=115
Notice the stream for the PUSH is 1511 but the above is the first time I see that Framer. And the rest are in the 300s. I don't exactly see how this happened. There are multiple PUSH_PROMISE all with the same stream id which is also odd.
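To check this kind of oddity across a whole log rather than by eye, the frame lines can be tallied per direction, frame type, and stream ID. The following is a rough sketch that assumes the line layout shown in the excerpt above; the names frameLine, frameKey, and countFrames are invented for the example. One caveat when reading the result: as far as I understand the HTTP/2 spec, a PUSH_PROMISE frame is sent on the stream of the request that triggered the push and carries the promised stream ID in its payload, so several PUSH_PROMISE lines with the same stream value are not necessarily abnormal on their own.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// sample holds the log lines quoted in this thread.
const sample = `2020/11/13 10:44:00 http2: Framer 0xc000af61c0: read HEADERS flags=END_STREAM|END_HEADERS|PRIORITY stream=1511 len=21
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=293
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: read PRIORITY stream=314 len=5
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=33
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote HEADERS flags=END_HEADERS stream=314 len=120
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote PUSH_PROMISE flags=END_HEADERS stream=1511 len=44
2020/11/13 10:44:00 http2: Framer 0xc000af61c0: wrote HEADERS flags=END_HEADERS stream=316 len=115`

// frameLine captures direction, frame type, and stream ID from one log line.
var frameLine = regexp.MustCompile(`Framer 0x[0-9a-f]+: (read|wrote) ([A-Z_]+) .*stream=(\d+)`)

// frameKey identifies one (direction, frame type, stream) combination.
type frameKey struct {
	Dir, Type, Stream string
}

// countFrames tallies frame log lines; lines that do not match are skipped.
func countFrames(log string) map[frameKey]int {
	counts := make(map[frameKey]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := frameLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		counts[frameKey{Dir: m[1], Type: m[2], Stream: m[3]}]++
	}
	return counts
}

func main() {
	counts := countFrames(sample)
	k := frameKey{Dir: "wrote", Type: "PUSH_PROMISE", Stream: "1511"}
	fmt.Printf("%s %s on stream %s: %d\n", k.Dir, k.Type, k.Stream, counts[k])
}
```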
I truncated the log file after each successful request until a request hung. Maybe that is why the log file was broken.
I attach here another log file, which was made the first time I caught the problem. This log file was never truncated. http2.debug.log
@fraenkel, we prepared a test application for reproducing the problem. My apologies for such a complicated app. We cannot extract a small piece of code, because we don't know exactly where the problem is. I sent credentials to michael.fraenkel@gmail.com.
To improve the chances of catching the problem, remove the proxy cache:
kill -SIGTERM <caddy process id>
rm -fR /home/user/caddy-cache
./caddy run&
To patch or debug server:
cd /home/user/go/src/github.com/caddyserver/caddy/cmd/caddy
CGO_ENABLED=0 go build
mv ./caddy ~/caddy
cd
sudo setcap 'cap_net_bind_service=+ep' ./caddy
kill -SIGTERM <caddy process id>
./caddy run&
@dtelyukh I did find a way to cause the hang locally; from my machine it would never happen.
for i in {1..1000}; do echo $i; nghttp https://cardonecapital.hc04.dorofeev.me/ -n; done
would eventually hang.
Once I attempted to compile a new Go, I could never get caddy to rebuild. I always ended up with a qtls init failure. While attempting to fix that, I didn't realize at the time that the src tree was a bit special, so I can no longer make any progress since I cannot download your smart-cache module.
If you could fix the tree, at least next time I will know to make a copy of the entire tree before doing anything. I am a bit concerned that a simple go mod tidy prevented any further compilation of caddy.
Never mind, I got it working again....
So one thing I did verify is that using the latest golang/x/net/http2 code does not cause the hang I see with my simple testcase.
@fraenkel, how can I help?
You can see the one-line change I made to caddyserver with a go mod tidy. See if this new version hangs for you as well.
@fraenkel, this new version never hangs for us. We also noticed that the app became faster. 👍 Thank you!
Full page load time (with all resources) is 2% less than with the old http/2, and the median absolute deviation is 1% less too. So it is both faster and more stable.
@fraenkel, should I close this ticket?
Yes, given there is a solution. This should be fixed in 1.16, although one should verify that is true.
@fraenkel, this bug still exists, but we found a clearer way to reproduce it.
An issue in Caddy's repository: https://github.com/caddyserver/caddy/issues/3896
It depends on the proxied website and caddy config, and some random factors, thus it occurs with different frequency on different hardware. The steps are:
127.0.0.1 terem-pro.localhost
git clone https://github.com/caddyserver/caddy.git
cd caddy/cmd/caddy && go build
sudo setcap 'cap_net_bind_service=+ep' ./caddy
Create Caddyfile with this content
https://terem-pro.localhost {
    handle {
        reverse_proxy https://www.terem-pro.ru {
            header_up host {http.reverse_proxy.upstream.host}
        }
        push / {
            /local/components/terem/catalog.list/templates/index.best.seller/style.css
            /local/components/terem/new_services.content/templates/home.banner.lots/style.css
            /local/components/terem/slider.blocks/templates/slider.useful/style.css
            /local/components/terem/standard.blocks/templates/call.action.white/style.css
            /local/components/terem/review.list/templates/carousel.home/style.css
            /local/components/terem/standard.blocks/templates/promo.red.home/style.css
            /local/components/terem/promotion.list/templates/home.slider/style.css
            /local/components/terem/form.form/templates/template.pdf/style.css
            /local/templates/terem/components/bitrix/menu/template.header.menu.top/desktop-menu.css
            /local/components/terem/form.form/templates/template.taxi/style.css
            /assets/resources/css/home.css
            /local/templates/terem/components/bitrix/menu/template.header.menu-mobile/style_menu.css
            /local/components/terem/catalog.type.list/templates/.default/style.css
            /assets/resources/css/styles.css
            /bitrix/cache/css/s1/terem/template_ad73b02503569e1113abf0b013fdbb28/template_ad73b02503569e1113abf0b013fdbb28_v1.css?16067202133580
            /bitrix/cache/css/s1/terem/page_074396ca6d41424fe878cb365c109aa1/page_074396ca6d41424fe878cb365c109aa1_v1.css?160672023225970
        }
    }
}
./caddy run
I reproduced a hang, but it's using the bundled http2 stack.
goroutine 5252 [select, 3 minutes]:
net/http.(*http2serverConn).writeHeaders(0xc000582900, 0xc000feec60, 0xc0003c93b0, 0xa, 0x24a6800)
/snap/go/6745/src/net/http/h2_bundle.go:5753 +0x172
net/http.(*http2responseWriterState).writeChunk(0xc000be7200, 0xc000fc1000, 0xdf, 0x1000, 0xc000fcfdf0, 0x1, 0xc0009a8180)
/snap/go/6745/src/net/http/h2_bundle.go:6020 +0x3bf
Please tell me: to fix it, should a recent version of the http2 library be bundled? Or does the library itself need to be fixed? Is there a workaround? Should we use the http2 library directly?
It looks like it never hangs with this hack:
http2.ConfigureServer(s, nil)
With it, the bundled version is not used, is it?
The solution is to use x/net/http2 explicitly.
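For readers who want to try the workaround, here is a minimal sketch of a server that configures HTTP/2 from golang.org/x/net/http2 explicitly. It needs that module as a dependency. The listen address, the handler, and the certificate file names are placeholders, not taken from Caddy or from our plugin. My understanding is that ConfigureServer registers its own "h2" entry in the server's TLSNextProto map, and net/http only sets up its bundled copy when that map is nil, which would explain why the bundled code is bypassed.

```go
package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	s := &http.Server{
		Addr: ":8443", // placeholder address
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("hello\n"))
		}),
	}

	// Configure HTTP/2 from x/net/http2 instead of relying on the copy
	// bundled into net/http (h2_bundle.go). A nil config means defaults.
	if err := http2.ConfigureServer(s, nil); err != nil {
		log.Fatal(err)
	}

	// cert.pem and key.pem are placeholder file names.
	log.Fatal(s.ListenAndServeTLS("cert.pem", "key.pem"))
}
```

The HTTP/2 implementation then follows the x/net version pinned in go.mod rather than the Go release used to build the binary.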
The problem is solved neither by 1.16rc1 nor by 1.15.8. Easy steps for reproducing are here: https://github.com/caddyserver/caddy/issues/3896
Without explicit use of x/net/http2 via http2.ConfigureServer(s, nil), hangs still happen.
I was able to reproduce this with the reverse proxy Traefik, and can confirm that directly calling the x/net/http2 ConfigureServer method fixes this issue. I can also confirm that this is not fixed in the golang 1.16.0 stdlib. @fraenkel, any idea when we can expect the fix to make its way there?
cc @toothrot @dmitshur re possible release issue
:wave: Any update on this issue? Any idea how and when this might be resolved?
I hope it will help: https://github.com/golang/go/issues/45435
I believe this is fixed with Go 1.17 considering that the bundled http2 library was updated to a snapshot of x/net/http2 from May of this year.
@dtelyukh Can you give 1.17 a try with your test case (without manually using x/net/http2)?
@ReillyBrogan, I tried 1.17.3 and the problem still exists. I think 1.18 could help.
Potentially fixed with https://github.com/golang/go/issues/49921. It's part of the Go 1.17.6 release.
This is still an issue on Go 1.18
I had the same problem using Go 1.19.
Any progress now? I had the same problem with golang.org/x/net v0.0.0-20210813160813-60bc85c4be6d in branch internal-branch.go1.18-vendor.
I had the same problem using Go 1.22.6
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
So, let's begin
High-Level Problem Description
I'm creating a reverse proxy with Caddy Server and my own plugin. I use HTTP/2 and Server Push. Sometimes requests hang forever. Here is a screenshot from Chrome DevTools:
Low-Level Problem Description
So, I started to debug this situation. I found that my code execution gets stuck at (http.responseWriter).Write(), which is an instance of http2responseWriter. With the help of pprof I found that the lockup happens in two functions, http2serverConn.writeHeaders and http2serverConn.writeDataFromHandler: both wait endlessly for data from the done channel. Here is an illustration from pprof:
Next I built go from source with some added debug messages and started to dive deeper. I found a problem with how frames are sent to the output. At this line N frames were pushed: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4692. After the push function, the scheduleFrameWrite function is called. I looked into it and found that it often exits here: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4817. And only M (M < N) frames were popped from the queue here: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4837
Pushed Frames
Popped Frames
What did you expect to see?
No lockups.
What did you see instead?
Random lockups.