Stream forcefully closed

omidontop commented 6 years ago

Thank you for the great work, you saved me a ton of time! I have encountered an issue and sometimes I randomly get a Stream forcefully closed exception. This is definitely not because of a certificate issue as it works the majority of times. Here's the debug trace leading to the exception:

[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] HPACK encoding <generator object inner at 0x7f14f75b01f0>
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Adding (':method', 'POST') to the header table
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Encoding 3 with 7 bits
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Adding (':scheme', 'https') to the header table
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Encoding 7 with 7 bits
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Adding (':authority', 'api.development.push.apple.com') to the header table
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Encoding 64 with 7 bits
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Adding (':path', '/3/device/************************* [HIDDEN] ***************************') to the header table
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Encoding 63 with 7 bits
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Adding ('apns-topic', 'com.organization.Application') to the header table
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Encoding 62 with 7 bits
[2018-07-09 01:00:43 PM] [hpack.hpack              ] [DEBUG] Encoded header block to �����
[2018-07-09 01:00:43 PM] [hyper.http20.connection  ] [DEBUG] recv for stream 3 with set([]) already present
[2018-07-09 01:00:43 PM] [hyper.http20.connection  ] [DEBUG] Close stream 3
[2018-07-09 01:00:43 PM] [hyper.http20.connection  ] [DEBUG] recv for stream 3 with set([]) already present
[2018-07-09 01:00:43 PM] [root                     ] [ERROR] Stream forcefully closed

Any ideas where the problem might be?

MichaelDanielTom commented 6 years ago

I'm getting the same error, it seems to happen randomly in production for us. The errors all seem to be clustered around the same time, even though multiple processes are running on multiple machines, so I feel like it might be something to do with Apple, but I'm not entirely sure.

omidontop commented 6 years ago

I can imagine that the server would sometimes close a stream or connection but the stack should be able to detect that and reopen the connection if needed or raise an Exception and let the application handle it. This exception is raised from a lower layer though.

cridenour commented 6 years ago

In our case, we were seeing this in production when using a long lived client object. By initiating the client in a heartbeat celery task, we have seemed to avoid this for a couple days. It also fixed #67 for us as well.

omidontop commented 6 years ago

@cridenour That was an interesting idea. I wasn't familiar with the celery platform. Can you elaborate a bit? So do you create a new instance with every heartbeat call (which as I read is called every minute?)? And how does it all fit together?

MichaelDanielTom commented 6 years ago

Update: We started batching our pushes using send_notification_batch and we haven't gotten this error since. In our previous setup we were sending 500+ notifications serially using the same APNS connection, which would take many minutes. The new setup takes significantly less time as well. This leads me to believe that the problem is related to long lived client objects as @cridenour mentioned.

cridenour commented 6 years ago

@omidontop Yea we changed our system to check each minute if a notification should go out (ours are time based, but could easily be queued in a db if not) and at that moment, create the client, use send_notification_batch as @MichaelDanielTom suggested (though we were using this before, but the client was shared), and then let it close itself when it falls out of memory. Though you might be able to more explicitly close the connection as well.

omidontop commented 6 years ago

@MichaelDanielTom Apart from the benefit of using the send_notification_batch over the send_notification for sending many notifications at once, my guess as to the reason it works better is that it calls self.connect() before attempting to send. I have tried adding a connect() call before the send_notification and I will see how that goes.

omidontop commented 6 years ago

@cridenour Thanks for sharing your solution. The main reason I switched to the new interface of sending the notifications was to keep a persistent connection to the APNs. The problem I see with your approach is that it probably causes successive connect-disconnects to the server and Apple's servers sometimes interpret that as a DoS attack and might block your further requests. This is mentioned here:

Another possibility is that you've connected too many times to APNs and further connections have been temporarily blocked. As is documented in the Local and Remote Notification Programming Guide, developers are expected to open a connection and leave it open. If a connection is opened and closed repeatedly, APNs will treat it as a denial of service attack and block connections for a period of time.

cridenour commented 6 years ago

@omidontop Interesting! I imagine a one minute buffer should be enough (vs thousands of connections, one per notification) but if not we will adjust to a more intelligent connection buffer. Thanks for the heads up.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

Pr0Ger / PyAPNs2

Stream forcefully closed #66