geeknam / python-gcm

Python client for Google Cloud Messaging for Android (GCM)
MIT License

python-gcm does not support connection pooling #89

Closed alexdej closed 8 years ago

alexdej commented 9 years ago

Hello! I observe that a new SSL connection is created for each request. This was true at least in 0.1.5 and is still true in 0.2. It appears to be a consequence of using requests.post, which creates and closes an internal Session object on every call. In Requests, connection pooling is bound to the Session object (http://docs.python-requests.org/en/latest/user/advanced/#keep-alive).

I tested a monkey-patched version of python-gcm in which gcm.gcm.requests is replaced with a gcm.gcm.requests.Session() object, and observed that connections are reused. From my test server in AWS us-east-1, each GCM POST request took ~140 ms on average without the patch and ~40 ms with it (about 7 qps vs. 25), which at our volume is a significant help.
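To illustrate the difference between one connection per request and a reused connection, here is a minimal sketch that uses only the Python standard library against a throwaway local server. It is a stand-in, not python-gcm or Requests code: CountingHandler, post_n, and count_connections are hypothetical names, and http.client plays the role that requests.post and requests.Session play in the library.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that counts accepted TCP connections."""
    protocol_version = "HTTP/1.1"  # allows keep-alive on the server side
    connections = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        type(self).connections += 1
        super().setup()

    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_n(port, n, reuse):
    """Send n POSTs, either over one connection or over a fresh one each time."""
    conn = http.client.HTTPConnection("127.0.0.1", port) if reuse else None
    for _ in range(n):
        c = conn or http.client.HTTPConnection("127.0.0.1", port)
        c.request("POST", "/send", body=b"{}")
        c.getresponse().read()
        if not reuse:
            c.close()
    if conn is not None:
        conn.close()


def count_connections(n, reuse):
    """Return how many TCP connections the server saw for n requests."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        post_n(server.server_address[1], n, reuse)
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections
```

With five requests, count_connections(5, reuse=False) should report five connections and count_connections(5, reuse=True) should report one. Over HTTPS each extra connection also pays for a TLS handshake, which is consistent with the latency gap measured above.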

It wasn't clear to me how best to integrate connection pooling into python-gcm so I thought I'd post the issue for comment before sending a PR. Also, the thread safety of requests.Session is an open question so that's a trade-off to consider: https://github.com/kennethreitz/requests/issues/1871.

alibitek commented 9 years ago

@alexdej Hi! Thanks for raising this issue! Indeed this is a performance problem and we should reuse the underlying TCP connection/socket.

Currently, using json_request you can send push notifications in batches of 1000, so if you have 1 million tokens you would open 1000 TCP connections to the GCM server.
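The arithmetic can be sketched as follows. This is only an illustration: chunked and number_of_requests are hypothetical helpers, not part of python-gcm, and the batch size of 1000 is the figure quoted above.

```python
def chunked(tokens, size=1000):
    """Yield consecutive slices of at most `size` tokens."""
    for start in range(0, len(tokens), size):
        yield tokens[start:start + size]


def number_of_requests(token_count, size=1000):
    """Ceiling division: how many batches are needed for token_count tokens."""
    return -(-token_count // size)
```

Without connection reuse, each of those batches is sent over its own TCP connection, so number_of_requests(1_000_000) gives the 1000 connections mentioned above.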

I've just tested, with a few million notifications, something along the lines of https://github.com/mnemonicflow/python-gcm/commit/00a93d69288bc786e30d43df89ef271ac54117ce, which should be a good starting point. It does mean forcing clients of the library to use a context manager, so existing clients would need to update their code:

with GCM(API_KEY) as gcm:
    response = gcm.json_request(registration_ids=registration_ids, data=notification,
                                collapse_key='xyz',
                                priority='high',
                                restricted_package_name="com.mycompany.myawesomeapp",
                                delay_while_idle=True,
                                dry_run=False)

But I think it would be better to add another level of indirection and run the cleanup code inside the GCM object, by creating a static session object wrapper using the contextlib.contextmanager decorator (https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager) or similar.

Regarding thread safety, I think it should be handled by the client of the library, and maybe we could add an option to the GCM object to specify whether you want connection reuse or not.
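One possible shape for such an option, as a rough sketch only: Sender, session_factory, and reuse_connections are hypothetical names and this is not python-gcm's actual API. In real use session_factory would presumably be requests.Session; it is injected here so the cleanup logic can be exercised without the network.

```python
import contextlib


class Sender:
    """Hypothetical stand-in for the GCM client, with opt-in connection reuse."""

    def __init__(self, session_factory, reuse_connections=False):
        self._session_factory = session_factory
        self._reuse = reuse_connections
        # Only create a long-lived session when the caller opts in.
        self._shared = session_factory() if reuse_connections else None

    @contextlib.contextmanager
    def _session(self):
        if self._reuse:
            yield self._shared  # stays open across calls
        else:
            session = self._session_factory()
            try:
                yield session
            finally:
                session.close()  # cleanup handled inside the object

    def post(self, url, **kwargs):
        with self._session() as session:
            return session.post(url, **kwargs)

    def close(self):
        if self._shared is not None:
            self._shared.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

With the default reuse_connections=False, every call gets its own short-lived session, so existing callers see no change. With reuse_connections=True, a single session is shared, and it is up to the caller not to share the object across threads unsafely.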

alexdej commented 9 years ago

Great! Your suggestion would work fine for our case, though I agree you might want to make the new behavior optional to avoid changing existing clients (and to preserve thread safety of the gcm library by default).

tgwizard commented 8 years ago

:+1: on reusing connections. @mnemonicflow your suggestion only works if the data is the same for all recipients, right? If the data is unique for everyone you'd have to call gcm.json_request() once per recipient.

alibitek commented 8 years ago

@tgwizard Yes, the data is the same for all recipients you pass in the registration_ids parameter. If you want to send different data to different recipients, you have to group the recipients (preferably in batches of 1000) and pass the specific data for each group in a separate json_request call. The underlying TCP connection is still reused thanks to the session object, but as the documentation (http://docs.python-requests.org/en/latest/user/advanced/) says, the data is NOT: "Note, however, that method-level parameters will not be persisted across requests, even if using a session." Method-level parameters are the parameters of the .post, .get, .put, etc. methods of the requests.Session() object.
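The grouping step might look roughly like this. It is a sketch under assumptions: group_by_payload is a hypothetical helper, not part of python-gcm, and it assumes each payload is a JSON-serializable dict.

```python
import json
from collections import defaultdict


def group_by_payload(messages, batch_size=1000):
    """Group (registration_id, data) pairs by identical data, then batch them.

    Yields (data, registration_ids) pairs, one per json_request call.
    """
    groups = defaultdict(list)
    for registration_id, data in messages:
        # Serialize with sorted keys so equal dicts map to the same group.
        groups[json.dumps(data, sort_keys=True)].append(registration_id)
    for key, ids in groups.items():
        data = json.loads(key)
        for start in range(0, len(ids), batch_size):
            yield data, ids[start:start + batch_size]
```

Each yielded pair would then be passed as json_request(registration_ids=ids, data=data). If every recipient has unique data, this degenerates to one call per recipient, which is the case where connection reuse helps most.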

tgwizard commented 8 years ago

Thanks for the response @mnemonicflow. The session and TCP connection reuse will only be enabled once #96 is merged, since right now there's a call to requests.post, which creates a new session for every call. Or am I missing something?

alibitek commented 8 years ago

@tgwizard Yes! #96 got merged into the develop branch: https://github.com/geeknam/python-gcm/tree/develop

tgwizard commented 8 years ago

Cool, thank you!