getsentry / raven-python

Raven is the legacy Python client for Sentry (getsentry.com) — replaced by sentry-python
https://sentry.io
BSD 3-Clause "New" or "Revised" License
1.68k stars 657 forks source link

Lazily import pkg_resources for better import performance. #1234

Closed MHendricks closed 6 years ago

MHendricks commented 6 years ago

Initial import performance is a consideration for our application of raven. We found that we were able more than halve the import time by moving the import of the pkg_resources module into the functions that need them.

I believe my changes are covered by existing tests.

ashwoods commented 6 years ago

Thx!

TheKevJames commented 6 years ago

@MHendricks @ashwoods we've noticed a bit of slowdown in the raven library since updating to 6.8.0 -- reading through the changelog since 6.7, this seems to be the only change which would affect us. If I'm reading this correctly, it looks like import pkg_resources is now called on each message sent to Sentry.

I wonder if it might both solve @MHendricks 's use-case and avoid repeatedly reimporting the package if we use a singleton-ish lazy load system. Something like:

if pkg_resources is None:
    import pkg_resources

Or, even better, maybe we should use a lazy import library. Result from Google, I have not used this.

MHendricks commented 6 years ago

Python is pretty good with handling reimporting a module. After the first call its essentially just the cost of a dictionary lookup. While I try to avoid importing in a function, it takes a lot of import calls to add up. On my system it took 10,000 calls for time.time() to even register that it took time.

import time
def test(count):
    t = time.time()
    for i in range(count):
        import pkg_resources
    return time.time() - t
>>> t = time.time(); import pkg_resources; print time.time() - t # First import
0.550999879837
>>> test(1000)
0.0
>>> test(10000)
0.00899982452393
>>> test(10000)
0.0090000629425
>>> test(10000)
0.0090000629425
>>> test(10000)
0.0090000629425

As for the change I made, I couldn't get raven to even try loading pkg_resources when using raven.Client.captureException and raven.conf.setup_logging. Do you know if that code is being called? If so, can you show me how to replicate it? I'm somewhat new to raven.

The lazy import library seems useful, but is probably overkill for ravens needs. pkg_resources is only used in two locations and I think a simple try\except could handle that. Something along the lines of this, but I would need to test it first:

pkg_resources = None
def get_version_from_app(module_name, app):
    global pkg_resources
    ...
    try:
        return pkg_resources.get_distribution(module_name).version
    except AttributeError: # Should only get triggered the first time this function is called
        try:
            import pkg_resources # Import and assign the global variable to the module.
            return pkg_resources.get_distribution(module_name).version
        except ImportError:
            ...
TheKevJames commented 6 years ago

TIL about re-importing, thanks!

I did a few tests a bit more specific to the pkg_resources reimport -- looks like you're right, those calls don't seem to be the determining factor here. I'll keep digging to see what our issue really is...