RobotsAndPencils / buford

A push notification delivery engine for the new HTTP/2 APNS service.
MIT License
475 stars 52 forks

Slow sending 50000+ push #31

Closed aimalygin closed 8 years ago

aimalygin commented 8 years ago

for i := 0; i < len(js.Tokens); i++ {
    tok := js.Tokens[i]
    if tok != "" {
        id, err := service.Push(tok, nil, p)
    }
}

Sending is slow. Any ideas for how to speed it up?

nathany commented 8 years ago

Thanks for reporting this issue. I haven't run any benchmarks or profiling yet. Network latency is probably the bulk of it, which I don't have much control over, but it would be good to confirm my suspicion.

How long does it take to send the same notification to 50,000 tokens?

aimalygin commented 8 years ago

I didn't wait for it to finish. It dispatched about 2000 tokens in 30 seconds. My DB has 67000 tokens.

nathany commented 8 years ago

Ouch. That's not good. Sorry I haven't had a chance to look into this yet.

What I would probably do is run it with an environment variable set. GODEBUG=http2debug=1 or GODEBUG=http2debug=2 for even more. That should give some insight into what's happening in the net/http2 layer.

Also, just to eliminate it as a factor, is your internet connectivity good? http://www.speedtest.net/ I get 24 Mbps down and 66 Mbps up from here.

aimalygin commented 8 years ago

I run it on a VPS from 123systems. The internet connection there is good. The timehop library currently works very quickly there: it sends to 67000 tokens in 5-6 seconds.

bfitzsimmons commented 8 years ago

That's much faster. Are you still using Buford? If so, what did you change to speed things up?

nathany commented 8 years ago

@Valrox So you switched to https://github.com/timehop/apns and it runs quickly?

I don't have a bunch of device tokens to test with, but I'm curious how long it would take to hit https://http2.golang.org/reqinfo a few thousand times.

bfitzsimmons commented 8 years ago

So, Timehop has its own push lib. I see. I'll just see myself out.

nathany commented 8 years ago

Yah, I used Timehop's push lib prior to writing Buford. It is based on Apple's old TCP API.

Unfortunately I don't have Buford in production yet, so I haven't run into issues like @Valrox mentioned. Hopefully I'll get some time to investigate this soon.

aimalygin commented 8 years ago

So I'm using timehop now. I used it before Buford. timehop has a bug that I could not resolve: sometimes a single token receives several notifications at once. So I decided to try Apple's new API, which Buford implements. But then I ran into this speed problem.

nathany commented 8 years ago

Thanks for reporting the issue. My work on Buford is sponsored by RnP, but I'm on other projects right now. Hopefully I'll have some time to investigate soon, or if anyone else wants to look into this issue, please do.

aagahi commented 8 years ago

It seems that you create a connection for each push request: https://github.com/RobotsAndPencils/buford/blob/master/push/service.go#L164 You could benefit from HTTP/2 multiplexing by keeping one connection alive and pipelining push messages.

nathany commented 8 years ago

Hi Alexis. I did some testing with Wireshark a few weeks ago and was under the impression that net/http was managing the multiplexing internally. At least it looks very different over the wire when using the same client (as it is) vs. creating new http clients for each request.

In light of this issue, it certainly warrants some followup.

aagahi commented 8 years ago

OK, I'm going to investigate on my side and see if I can bring some enlightenment too.

nathany commented 8 years ago

Thanks.

nathany commented 8 years ago

It appears that new connections are being made, but not on every request.

I'm seeing this message about 5 times in the course of 20 requests:

http2: no cached connection was available
http2: Transport creating client conn to 130.211.116.44:443

GODEBUG=http2debug=1 go test -bench .
package main

import (
    "fmt"
    "net/http"
    "testing"

    "golang.org/x/net/http2"
)

const proto = "HTTP/2.0"

func BenchmarkReqInfo(b *testing.B) {
    transport := &http.Transport{}
    if err := http2.ConfigureTransport(transport); err != nil {
        b.Fatal(err)
    }
    client := &http.Client{Transport: transport}

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        fmt.Println("\n\nGET https://http2.golang.org/reqinfo\n\n")

        res, err := client.Get("https://http2.golang.org/reqinfo")
        if err != nil {
            b.Fatal(err)
        }
        res.Body.Close()
        if res.Proto != proto {
            b.Fatalf("Expected proto %s, got %s", proto, res.Proto)
        }
    }
}
nathany commented 8 years ago

This could be interesting

// TODO: add support for sharing conns based on cert names

https://github.com/golang/net/blob/master/http2/client_conn_pool.go#L26

nathany commented 8 years ago

I've tried ensuring that the Transport has everything a default transport does. I've also tried always reading the response body like Buford does (that matters for connection reuse in HTTP/1.x, but not for HTTP/2, AFAIK). Same issues.

nathany commented 8 years ago

I looked around in ClientConnPool and did some logging. It appears that the connection for the addr is being cleaned up and removed from the pool. I'm not sure what triggers the cleanup yet.

nathany commented 8 years ago

It's quite possible that it's Go doing something with the connection pool. I would try to get a packet capture and see what's actually happening from a TCP perspective. You should hopefully be able to tell which end is initiating the disconnect. - @mbrevoort

It's a good point. I'm not entirely sure if Apple or other servers are dropping the connection or if it's something on the client side.

nathany commented 8 years ago

There is another HTTP/2 client for Go (#23). I sent an email asking if they had experienced this issue. This is what Adam Jones had to say:

We are getting 8,000-11,000 pushes per second with HTTP/2 so far, but need to init the client and then push inside either a worker/channel or a goroutine. Crude example https://gist.github.com/sideshow/397f2d729fcf483ec304#file-example-go-L33 - @sideshow

I need to experiment with this.

nathany commented 8 years ago

@bradfitz says:

It stops caching the connection if the other side tells you to GOAWAY or closes the connection. https://groups.google.com/forum/#!msg/golang-nuts/vF89LqoThDs/oujFln9rAwAJ

So I need to confirm that we are getting a GOAWAY or closed connection, and then the question becomes why?

For now I'm going to try the reqinfo test inside a goroutine based on @sideshow's suggestion. If I can get a bunch of device tokens, I can do a proper test hitting Apple's servers (I should be able to at work).

bradfitz commented 8 years ago

Use Go's http2 debugging to watch what's happening on a live client or server. See the net/http package docs.

nathany commented 8 years ago

Hi @bradfitz. Right now I don't have a live environment to test this in, so I've resorted to simple tests like: https://github.com/RobotsAndPencils/buford/issues/31#issuecomment-191983295

@Valrox can you try your push with GODEBUG=http2debug=1 and report back? (see https://golang.org/pkg/net/http/)

In summary:

It sounds like the approach @sideshow took when using his https://github.com/sideshow/apns2 library is to fire off multiple requests in separate goroutines. If that's the best solution, perhaps a worker pool could be added to the Buford example as a suggestion.

The way @Valrox and I have been using Buford is to sequentially send each request. The old TCP-based API used by https://github.com/timehop/apns only has a single goroutine sending requests off to Apple and managed 67000 tokens in 5-6 seconds (as per https://github.com/RobotsAndPencils/buford/issues/31#issuecomment-188258525), but then those are via a single connection and never waited to be acknowledged. There's obviously going to be some overhead with HTTP/2.

What concerns me is that while investigating this issue with a more minimal test, the connection to https://http2.golang.org/reqinfo had to be reestablished more often than I would have expected. Though I have yet to confirm if the same behaviour occurs sending many requests to Apple's servers.

From the http2debug logs using https://github.com/RobotsAndPencils/buford/issues/31#issuecomment-191983295, it's not entirely clear to me when the connection is closed, but it would seem that every few requests the connection closes down as the message "no cached connection was available" appears and the next request is back on stream 1.

nathany commented 8 years ago

In #49 there is a benchmark that pushes notifications to Apple using a single device token.

Using a command like this against the Apple sandbox is resulting in 75-100ms per notification from my MacBook:

❯ GODEBUG=http2debug=1 go test ./push -cert ../cert.p12 -token device-token -v -bench . -benchtime 30s

     500      96173515 ns/op

Across the 500 notifications sent, I'm seeing it establish a connection ~3 times.

Transport failed to get client conn for api.development.push.apple.com:443: http2: no cached connection was available

I'm not sure exactly why it's taking more than one connection, but I don't see any GOAWAYs. It's opening a new connection relatively few times though.

As far as speed, it's possible that Apple is limiting us somehow, whether throttling, or only providing a few concurrent streams per connection.

The APNs server allows multiple concurrent streams for each connection. The exact number of streams is based on server load, so do not assume a specific number of streams.

So far I haven't done any profiling or determined whether the time is taken up on the client, the server, or network latency. But I'm seeing a similar ~77ms per request on a benchmark against https://http2.golang.org/reqinfo (https://github.com/RobotsAndPencils/buford/issues/31#issuecomment-191983295).

Even at the best time I've seen of 75ms, it would take over an hour to send 50,000 notifications over a single connection. Of course, multiple connections can be opened to Apple.

nathany commented 8 years ago

In terms of the connection needing to be established multiple times, I did more experimentation with my reqinfo benchmark now that Go 1.6.2 upgrades clients to HTTP/2 properly.

package main

import (
    "io/ioutil"
    "net/http"
    "testing"
)

const proto = "HTTP/2.0"

func BenchmarkReqInfo(b *testing.B) {
    for i := 0; i < b.N; i++ {
        res, err := http.Get("https://http2.golang.org/reqinfo")
        if err != nil {
            b.Fatal(err)
        }
        _, err = ioutil.ReadAll(res.Body)
        if err != nil {
            b.Fatal(err)
        }
        res.Body.Close()
        if res.Proto != proto {
            b.Fatalf("Expected proto %s, got %s", proto, res.Proto)
        }
    }
}

When I create the Transport myself and use http2.ConfigureTransport, it establishes the connection multiple times, whereas a single connection is enough when using http.Get with the DefaultTransport. Perhaps there is something wrong with how I'm doing the setup at: https://github.com/RobotsAndPencils/buford/blob/master/push/service.go#L52

nathany commented 8 years ago

More experiments to try:

nathany commented 8 years ago

Running from Nitrous US West to Apple dev environment with -benchtime 10s.

BenchmarkPush    500    32169893 ns/op
ok    github.com/RobotsAndPencils/buford/push    21.455s

32ms/notification

That's ~27 minutes for 50,000 notifications on a single connection. Make 20-30 connections to Apple and it's not so bad (the question is, how many connections should we open?).

I did a test with Pigeon (Elixir) that was about 200ms/notification (timed with stopwatch) from my laptop.

nathany commented 8 years ago

Right now my examples of using Buford all wait for the response before sending another notification. As @sideshow pointed out some time ago, we need to send notifications from a bunch of goroutines. For now I'll prepare an example of concurrent use and update the README.

After that's done, I'll be closing this issue.

nathany commented 8 years ago

Sending 1000 notifications using 20 workers with the example at #54 results in:

5.5 ms/notification from my laptop
2.5 ms/notification from Nitrous.io US West

2.5 ms/notification = 24,000 notifications per minute which IMO is pretty good. It may go faster still with more workers or more than a single HTTP/2 connection.

Closing this issue.