geduldig / TwitterAPI

Minimal python wrapper for Twitter's REST and Streaming APIs
936 stars 263 forks source link

Session Objects not getting reused #80

Closed nesdis closed 7 years ago

nesdis commented 7 years ago

In the TwitterAPI class, session is not getting created during init, Instead, every new request is creating a NEW session object. This is wasteful and slow. New requests should reuse old session objects.

TwitterAPI class doesn't provide a, exit method which will close the session object.

geduldig commented 7 years ago

I tried creating the session in init and I don't see any improvement in speed or memory consumption. Can you provide short example that shows the issues you describe?

Also, what is the use case for controlling when the session closes with an exit method?

In the current implementation of TwitterAPI.request() my understanding is that when the method finishes the local session variable gets released by garbage collection (sometime later) and the session gets closed.

My rationale for making session local to TwitterAPI.request() was to make the method safe to call inside a thread. I want to call the method and not overwrite the session variables used in another thread instance.

nesdis commented 7 years ago

Session Objects The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).

http://docs.python-requests.org/en/master/user/advanced/#session-objects

nesdis commented 7 years ago

why do you suggest using session in init is not thread safe?

geduldig commented 7 years ago

I understand the rationale for the proposed changes. I just need to make sure that they do not break existing code that uses TwitterAPI. I also want to verify that the changes do in fact improve TwitterAPI.

With that, although it makes perfect sense to reuse the session object, I was unable to see a performance boost. My test was to create 200 requests in 200 threads using a single session or 200 sessions. I didn't see a memory boost either. That is why I would be interested if you could produce a simple test to verify the speed and memory improvements that should come with reusing the session object.

But, even without a satisfactory verification, reusing the session object makes good design sense and had been in my mind, except for thread safety.

My concern with thread safety is that every time TwitterAPI.request() is called the session properties are set. So, if I share the same session object in multiple threads, every time I set a session property it will be set for all threads. Now, this is very unlikely a problem -- I can see this occurring only if one thread requested a streaming endpoint and another thread requested a REST endpoint -- still I don't want stipulate this condition to the user.

Lastly, when would I want to manually close session objects? They seem to close fine on their own.

nesdis commented 7 years ago

Hi

You would like to close the session automatically. When using your Twitter api object inside a 'with' statement. In this case sessions get closed inside the exit method. Session objects get closed only during python gc. Which you have no control.

By not reusing sessions. Every new request will tear down n reconstruct new sockets. Sending requests to same host using existing session will reuse the previous socket in the connection pool.

Different threads can use different Twitter api objects for separate configurations

On 12 Dec 2016 17:58, "geduldig" notifications@github.com wrote:

I understand the rationale for the proposed changes. I just need to make sure that they do not break existing code that uses TwitterAPI. I also want to verify that the changes do in fact improve TwitterAPI.

With that, although it makes perfect sense to reuse the session object, I was unable to see a performance boost. My test was to create 200 requests in 200 threads using a single session or 200 sessions. I didn't see a memory boost either. That is why I would be interested if you could produce a simple test to verify the speed and memory improvements that should come with reusing the session object.

But, even without a satisfactory verification, reusing the session object makes good design sense and had been in my mind, except for thread safety.

My concern with thread safety is that every time TwitterAPI.request() is called the session properties are set. So, if I share the same session object in multiple threads, every time I set a session property it will be set for all threads. Now, this is very unlikely a problem -- I can see this occurring only if one thread requested a streaming endpoint and another thread requested a REST endpoint -- still I don't want stipulate this condition to the user.

Lastly, when would I want to manually close session objects? They seem to close fine on their own.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/geduldig/TwitterAPI/issues/80#issuecomment-266419587, or mute the thread https://github.com/notifications/unsubscribe-auth/AQO4BJ3kqACJ8dgizdQ53J6jb87fAYIlks5rHT3QgaJpZM4LJ2FT .

geduldig commented 7 years ago

If I keep the session object local to TwitterAPI.request(), I agree I should create the object inside a with block.

I also agree that using one session object is preferable in order to reuse sockets in the connection pool. However, if I switch to a single shared session object, I will need to document the conditions where it is not thread-safe. That's a pain, but justifiable. My question to you is what is really gained. I understand there is overhead, but is it significant. Can you measure it?

nesdis commented 7 years ago

No I presume the overhead is not significant. Maybe be significant for multiple REST api access. But not much.

On 12 Dec 2016 23:34, "geduldig" notifications@github.com wrote:

If I keep the session object local to TwitterAPI.request(), I agree I should create the object inside a with block.

I also agree that using one session object is preferable in order to reuse sockets in the connection pool. However, if I switch to a single shared session object, I will need to document the conditions where it is not thread-safe. That's a pain, but justifiable. My question to you is what is really gained. I understand there is overhead, but is it significant. Can you measure it?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/geduldig/TwitterAPI/issues/80#issuecomment-266504300, or mute the thread https://github.com/notifications/unsubscribe-auth/AQO4BMjnQKmOWQKJYwwrupII-rsugKEzks5rHYyVgaJpZM4LJ2FT .

geduldig commented 7 years ago

There's a new version 2.4.3 that has the local session (no change) but is now created in a with block. I think this makes the code clearer. There is no performance improvement. Neither did I see a performance improvement by sharing a single session object. I tested this using 500 threads and authenticating using a whitelisted account which gave me much higher rate limits than a normal user would have.