Closed aimalygin closed 8 years ago
Thanks for reporting this issue. I haven't run any benchmarks or profiling yet. Network latency is probably the bulk of it, which I don't have much control over, but it would be good to confirm my suspicion.
How long does it take to send the same notification to 50,000 tokens?
I didn't wait for it to finish. It dispatched about 2,000 tokens in 30 seconds, and my database has 67,000 tokens.
Ouch. That's not good. Sorry I haven't had a chance to look into this yet.
What I would probably do is run it with an environment variable set. GODEBUG=http2debug=1 or GODEBUG=http2debug=2 for even more. That should give some insight into what's happening in the net/http2 layer.
Also, just to eliminate it as a factor, is your internet connectivity good? http://www.speedtest.net/ I get 24 Mbps down and 66 Mbps up from here.
I run it on a VPS from 123systems, which has a good internet connection. On that server, timehop works very quickly: it sends 67,000 tokens in 5-6 seconds.
That's much faster. Are you still using Buford? If so, what did you change to speed things up?
@Valrox So you switched to https://github.com/timehop/apns and it runs quickly?
I don't have a bunch of device tokens to test with, but I'm curious how long it would take to hit https://http2.golang.org/reqinfo a few thousands times.
So, Timehop has its own push lib. I see. I'll just see myself out.
Yeah, I used Timehop's push lib prior to writing Buford. It's based on Apple's old binary TCP API.
Unfortunately I don't have Buford in production yet, so I haven't run into issues like @Valrox mentioned. Hopefully I'll get some time to investigate this soon.
So I'm using timehop now; I used it before Buford too. timehop has a bug that I couldn't resolve: sometimes a single token receives several notifications at once. So I decided to try Apple's new API, which Buford implements, but then I ran into this speed problem.
Thanks for reporting the issue. My work on Buford is sponsored by RnP, but I'm on other projects right now. Hopefully I'll have some time to investigate soon, or if anyone else wants to look into this issue, please do.
It seems that you create a connection at each push request: https://github.com/RobotsAndPencils/buford/blob/master/push/service.go#L164 You could get benefit of http/2 multiplexing by keeping alive one connection and pipelining push messages.
Hi Alexis. I did some testing with Wireshark a few weeks ago and was under the impression that net/http was managing the multiplexing internally. At least it looks very different over the wire when using the same client (as it is) vs. creating new http clients for each request.
In light of this issue, it certainly warrants some followup.
Ok I'm going to investigate on my side and see if I can try to bring some enlightenment too.
Thanks.
It appears that new connections are being made, but not on every request.
I'm seeing this message about 5 times in the course of 20 requests:
http2: no cached connection was available
http2: Transport creating client conn to 130.211.116.44:443
GODEBUG=http2debug=1 go test -bench .
package main

import (
	"fmt"
	"net/http"
	"testing"

	"golang.org/x/net/http2"
)

const proto = "HTTP/2.0"

func BenchmarkReqInfo(b *testing.B) {
	transport := &http.Transport{}
	if err := http2.ConfigureTransport(transport); err != nil {
		b.Fatal(err)
	}
	client := &http.Client{Transport: transport}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		fmt.Println("\n\nGET https://http2.golang.org/reqinfo\n\n")
		res, err := client.Get("https://http2.golang.org/reqinfo")
		if err != nil {
			b.Fatal(err)
		}
		res.Body.Close()
		if res.Proto != proto {
			b.Fatalf("Expected proto %s, got %s", proto, res.Proto)
		}
	}
}
This could be interesting
// TODO: add support for sharing conns based on cert names
https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L26
I've tried ensuring that the Transport has everything a default transport does. I've also tried always reading the response body like Buford does (that can matter for connection reuse on HTTP/1.x, but not for HTTP/2 AFAIK). Same issues.
I looked around in ClientConnPool and did some logging. It appears that the connection for the addr is being cleaned up and removed from the pool. I'm not sure what triggers the cleanup yet.
It's quite possible that it's Go doing something with the connection pool. I would try to get a packet capture and see what's actually happening from a TCP perspective. You should hopefully be able to tell which end is initiating the disconnect. - @mbrevoort
It's a good point. I'm not entirely sure if Apple or other servers are dropping the connection or if it's something on the client side.
There is another HTTP/2 client for Go (#23). I sent an email asking if they had experienced this issue. This is what Adam Jones had to say:
We are getting 8,000-11,000 pushes per second with HTTP/2 so far, but need to init the client and then push inside either a worker/channel or a goroutine. Crude example https://gist.github.com/sideshow/397f2d729fcf483ec304#file-example-go-L33 - @sideshow
I need to experiment with this.
@bradfitz says:
It stops caching the connection if the other side tells you to GOAWAY or closes the connection. https://groups.google.com/forum/#!msg/golang-nuts/vF89LqoThDs/oujFln9rAwAJ
So I need to confirm that we are getting a GOAWAY or closed connection, and then the question becomes why?
For now I'm going to try the reqinfo test inside a goroutine based on @sideshow's suggestion. If I can get a bunch of device tokens, I can do a proper test hitting Apple's servers (I should be able to at work).
Use Go's http2 debugging to watch what's happening on a live client or server. See the net/http package docs.
Hi @bradfitz. Right now I don't have a live environment to test this in, so I've resorted to simple tests like: https://github.com/RobotsAndPencils/buford/issues/31#issuecomment-191983295
@Valrox can you try your push with GODEBUG=http2debug=1
and report back? (see https://golang.org/pkg/net/http/)
In summary:
It sounds like the approach @sideshow took when using his https://github.com/sideshow/apns2 library is to fire off multiple requests in separate goroutines. If that's the best solution, perhaps a worker pool could be added to the Buford example as a suggestion.
The way @Valrox and I have been using Buford is to sequentially send each request. The old TCP-based API used by https://github.com/timehop/apns only has a single goroutine sending requests off to Apple and managed 67000 tokens in 5-6 seconds (as per https://github.com/RobotsAndPencils/buford/issues/31#issuecomment-188258525), but then those are via a single connection and never waited to be acknowledged. There's obviously going to be some overhead with HTTP/2.
What concerns me is that while investigating this issue with a more minimal test, the connection to https://http2.golang.org/reqinfo had to be reestablished more often than I would have expected. Though I have yet to confirm if the same behaviour occurs sending many requests to Apple's servers.
From the http2debug logs using https://github.com/RobotsAndPencils/buford/issues/31#issuecomment-191983295, it's not entirely clear to me when the connection is closed, but it would seem that every few requests the connection closes down as the message "no cached connection was available" appears and the next request is back on stream 1.
In #49 there is a benchmark that pushes notifications to Apple using a single device token.
Using a command like this against the Apple sandbox is resulting in 75-100ms per notification from my MacBook:
❯ GODEBUG=http2debug=1 go test ./push -cert ../cert.p12 -token device-token -v -bench . -benchtime 30s
500 96173515 ns/op
Across the 500 notifications sent, I'm seeing it establish a connection ~3 times.
Transport failed to get client conn for api.development.push.apple.com:443: http2: no cached connection was available
I'm not sure exactly why it's taking more than one connection, but I don't see any GOAWAYs. It's opening a new connection relatively few times though.
As far as speed, it's possible that Apple is limiting us somehow, whether throttling, or only providing a few concurrent streams per connection.
The APNs server allows multiple concurrent streams for each connection. The exact number of streams is based on server load, so do not assume a specific number of streams.
So far I haven't done any profiling or determined whether the time is taken up on the client, the server, or network latency. But I'm seeing a similar ~77ms per request on a benchmark against https://http2.golang.org/reqinfo (https://github.com/RobotsAndPencils/buford/issues/31#issuecomment-191983295).
Even at the best time I've seen, 75ms, it would take over an hour to send 50,000 notifications over a single connection. Of course, multiple connections can be opened to Apple.
In terms of the connection needing to be established multiple times, I did more experimentation with my reqinfo benchmark now that Go 1.6.2 upgrades clients to HTTP/2 properly.
package main

import (
	"io/ioutil"
	"net/http"
	"testing"
)

const proto = "HTTP/2.0"

func BenchmarkReqInfo(b *testing.B) {
	for i := 0; i < b.N; i++ {
		res, err := http.Get("https://http2.golang.org/reqinfo")
		if err != nil {
			b.Fatal(err)
		}
		_, err = ioutil.ReadAll(res.Body)
		if err != nil {
			b.Fatal(err)
		}
		res.Body.Close()
		if res.Proto != proto {
			b.Fatalf("Expected proto %s, got %s", proto, res.Proto)
		}
	}
}
When I create the Transport myself and use http2.ConfigureTransport, it establishes the connection multiple times, whereas a single connection was enough when using http.Get with the DefaultTransport. Perhaps there is something wrong with how I am doing the setup at: https://github.com/RobotsAndPencils/buford/blob/master/push/service.go#L52
More experiments to try:
Running from Nitrous US West to the Apple dev environment with -benchtime 10s:

BenchmarkPush	500	32169893 ns/op
ok	github.com/RobotsAndPencils/buford/push	21.455s

That's 32ms/notification.
That's ~27 minutes for 50,000 notifications on a single connection. Make 20-30 connections to Apple and it's not so bad (the question is how many connections we should open).
I did a test with Pigeon (Elixir) that was about 200ms/notification (timed with stopwatch) from my laptop.
Right now my examples of using Buford all wait for the response before sending another notification. As @sideshow pointed out some time ago, we need to send the notifications with a bunch of goroutines. For now I'll prepare an example on concurrent use and update the README.
After that's done, I'll be closing this issue.
Sending 1000 notifications using 20 workers with the example at #54 results in:
5.5 ms/notification from my laptop
2.5 ms/notification from Nitrous.io US West
2.5 ms/notification = 24,000 notifications per minute which IMO is pretty good. It may go faster still with more workers or more than a single HTTP/2 connection.
Closing this issue.
for i := 0; i < len(js.Tokens); i++ {
	tok := js.Tokens[i]
	if tok != "" {
		id, err := service.Push(tok, nil, p)
		if err != nil {
			log.Printf("push failed: %v", err)
		}
		_ = id
	}
}
Sending is slow. Any ideas for how to speed it up?