J535D165 / pyalex

A Python library for OpenAlex (openalex.org)
MIT License

Connection Pooling #47

Open ainamdar-ag opened 2 months ago

ainamdar-ag commented 2 months ago

I have a question about the requests session. Based on https://github.com/J535D165/pyalex/blob/main/pyalex/api.py#L96, it seems that a new session is created for each request.

It might be better to create this session once per entity (to keep it simple); ideally, a singleton session would be nice.

Why am I asking? Profiling the Python code shows that a lot of CPU time is spent on SSL/TLS connection handshakes, and one suggestion on the web is to use a connection pool or a session with the requests library.
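As a rough illustration of the suggestion above, a single `requests.Session` can be created once and reused, so that its connection pool keeps the TLS connection open between calls. This is a minimal sketch, not pyalex's actual code: `make_session` and `fetch_pages` are hypothetical helper names, and the pool size of 10 is an arbitrary assumption.

```python
import requests
from requests.adapters import HTTPAdapter


def make_session(pool_size: int = 10) -> requests.Session:
    """Create one session whose adapter keeps connections alive for reuse."""
    session = requests.Session()
    # pool_size is an assumed value; tune it to the expected concurrency.
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def fetch_pages(session: requests.Session, url: str, pages: int) -> list:
    """Fetch several pages over the same session (hypothetical helper)."""
    results = []
    for page in range(1, pages + 1):
        # Every call goes through the shared session, so the pooled
        # connection can be reused instead of repeating the handshake.
        response = session.get(url, params={"page": page}, timeout=30)
        response.raise_for_status()
        results.append(response.json())
    return results


session = make_session()
# Example call (needs network access):
# fetch_pages(session, "https://api.openalex.org/works", pages=3)
```

Creating the session outside the request loop is the important part; the explicit `HTTPAdapter` is optional and only matters when the default pool size is too small.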

J535D165 commented 2 months ago

Thanks for the report. I agree that we need to resolve this.

Would you do this at the module level or at the class level (e.g. Works())?

ainamdar-ag commented 2 months ago

I would implement it at the Base class level, mainly to benefit cursors and repeated filters. A module-level session would be nice to have but is not the most effective option, and it might even add more complexity.
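One way to read "Base class level" is that each entity instance lazily creates its own session and reuses it for all of its requests. The sketch below assumes that reading; `BaseEntity` and its `session` property are hypothetical stand-ins for illustration and do not reflect pyalex's real class layout.

```python
import requests


class BaseEntity:
    """Hypothetical stand-in for a shared base class; not pyalex's real code."""

    def __init__(self):
        self._session = None  # created on first use

    @property
    def session(self) -> requests.Session:
        # Reuse one session for every request made through this instance,
        # for example all pages fetched while following a cursor.
        if self._session is None:
            self._session = requests.Session()
        return self._session

    def close(self) -> None:
        """Release pooled connections held by this instance, if any."""
        if self._session is not None:
            self._session.close()
            self._session = None


class Works(BaseEntity):
    pass


class Authors(BaseEntity):
    pass


works = Works()
authors = Authors()
```

With this layout, repeated calls on `works` share one connection pool, while `authors` gets its own. A module-level singleton would share the pool across all entities, at the cost of global state.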

I'm not sure what the most common usage pattern is, but I imagine the classes are instantiated only once or a few times.

My use case is mostly paging through Works() with a specific filter and then using Authors() and Institutions() to fetch the entities referenced in the Works list.

In addition, I think a way to disable SSL verification for local execution would also help, since many organisations have VPNs or proxies that use self-signed certificates. Maybe just pass verify=False through to the session.
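The pass-through mentioned above could look roughly like this. `build_session` is a hypothetical helper, not an existing pyalex function; the only library behaviour it relies on is the `verify` attribute of `requests.Session`, which accepts `True`, `False`, or a path to a CA bundle file.

```python
import requests


def build_session(verify=True) -> requests.Session:
    """Return a session with the given TLS verification setting.

    verify may be True, False, or a path to a CA bundle file.
    """
    session = requests.Session()
    # The setting applies to every request sent through this session.
    session.verify = verify
    return session


insecure = build_session(verify=False)  # skips certificate checks
default = build_session()               # keeps verification enabled
```

Disabling verification removes protection against intercepted connections, so where possible it is probably safer to pass the path of the organisation's CA bundle as `verify` instead of `False`.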