banesullivan / scooby

🐶 🕵️ Great Dane turned Python environment detective
MIT License
47 stars, 12 forks

Add import patcher to track all imported packages #38

Closed by banesullivan 4 years ago

banesullivan commented 5 years ago

Here is a neat way to generate a report at the end of a notebook that imports many packages, without having to list every package when creating the report:

# aliased imports where we want the full package name
import numpy as np
import matplotlib.pyplot as plt
import pyvista as pv
import pandas as pd

# Many imports from a single package to track
from SimPEG import (
    Mesh, Maps, Utils, DataMisfit, Regularization,
    Optimization, Inversion, InvProblem, Directives
)
import SimPEG.EM as EM

# some standard libs that we don't want in the report
import inspect
import sys
import os

# import functions/classes from packages we need to track
from pymatsolver import PardisoSolver

# Note: this does not work for primitive values imported from a package;
#  scipy is missing from the report, which could be an issue
from scipy.constants import mu_0 # a float value

# random function/class we don't care about
def foo():
    pass

class goo():
    pass

import scooby
scooby.Inspection(globals(), sort=True)
[Screenshot of the generated report]

jupyterthemes shows up as an extra in there because my Jupyter kernels load it at startup.
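To make the primitive-value limitation in the example above concrete, here is a minimal sketch of how a namespace such as globals() could be scanned for packages. It is not scooby's implementation; the function name `packages_in_namespace` is hypothetical, and standard-library modules stand in for the third-party ones so the snippet runs anywhere.

```python
import types

import json                      # a module object
from fractions import Fraction   # a class, which remembers its module
from math import pi              # a plain float, which does not


def packages_in_namespace(namespace):
    """Collect top-level package names for the objects in a namespace."""
    found = set()
    for obj in list(namespace.values()):
        if isinstance(obj, types.ModuleType):
            name = obj.__name__
        else:
            # Functions and classes carry __module__; a float has none.
            name = getattr(obj, "__module__", None)
        if isinstance(name, str):
            found.add(name.partition(".")[0])
    return found


demo = {"json": json, "Fraction": Fraction, "pi": pi}
print(sorted(packages_in_namespace(demo)))  # ['fractions', 'json']
```

The float `pi` carries no reference back to `math`, so a scan of this kind cannot attribute it to any package. The same thing happens to `mu_0` from `scipy.constants`.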

banesullivan commented 5 years ago

Another example showing how convenient this is!

[Screenshot of the example]

prisae commented 5 years ago

Looks neat, although I think it is potentially dangerous, or going in the wrong direction.

I've seen too many notebooks that load all sorts of unnecessary stuff, probably because things were copied over from other notebooks, or because they were used once and are no longer needed. So this will then report many things that are not necessary. This is not the fault of Scooby, but still. I think the "proper" approach would still be to maintain your notebook nicely, and list the dependencies manually in scooby.Report().

prisae commented 5 years ago

A user has to provide globals() as the first input argument. Is there no way to make that the default, with scooby simply catching the globals?

banesullivan commented 5 years ago

A user has to provide globals() as the first input argument. Is there no way to make that the default, with scooby simply catching the globals?

I don't think that's possible: the globals() available to classes within scooby are different from the globals available in the notebook. A globals() call made inside scooby returns scooby's own module namespace, not the notebook's.
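A small illustration of why a bare globals() call inside a library sees a different namespace than the caller's. The `helper` module and its `names_seen` function are hypothetical stand-ins for a library such as scooby, built on the fly so the snippet is self-contained:

```python
import types

# Build a throwaway module that plays the role of the library.
helper = types.ModuleType("helper")
exec(
    "def names_seen():\n"
    "    # globals() resolves to the module where this function is defined\n"
    "    return set(globals())\n",
    helper.__dict__,
)

caller_only_variable = 42  # lives in the caller's namespace only

seen = helper.names_seen()
print("names_seen" in seen)            # True: defined inside helper
print("caller_only_variable" in seen)  # False: invisible to helper
```

This is why the notebook has to hand its own globals() over explicitly in the example above.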

I've seen too many notebooks that load all sorts of unnecessary stuff, probably because things were copied over from other notebooks, or because they were used once and are no longer needed. So this will then report many things that are not necessary. This is not the fault of Scooby, but still. I think the "proper" approach would still be to maintain your notebook nicely, and list the dependencies manually in scooby.Report().

I mostly agree here and would argue that having a bunch of unused imports is simply user error (to be fair, I've done this many, many times). Reporting on unused imports is out of scope for scooby (IMO) and more in line with a code linter (though I don't think any exist for Jupyter notebooks?).

I agree that it is best to be conscious of what packages your code uses, and I prefer to pass them explicitly to Report when publishing a notebook/code. However, I view this as a quick and easy environment-reporting tool for debugging. I often find myself working on a project with lots of packages and needing to know which versions are installed before publishing the notebook (when things are breaking or not working). This provides a quick way to see every module I'm using. I could then list everything nicely, realize I have imported packages that I am not using, and remove them (addressing the initial concern).


At the moment, this code still feels "hacky" to me... I'd prefer to have a way to view every loaded module in the kernel without having the user pass globals(). Also, imports of primitive types from packages (the scipy example above) should show up, so I may want to find a way to do that before merging...
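One way to see every loaded module without passing globals() is to read `sys.modules`. This is only a sketch of that idea, not what scooby does; the function name `loaded_top_level_packages` is hypothetical, and the standard-library filter relies on `sys.stdlib_module_names`, which exists only on Python 3.10 and later.

```python
import sys


def loaded_top_level_packages():
    """Return sorted top-level names of all modules loaded in this interpreter."""
    # sys.stdlib_module_names is Python 3.10+; without it nothing is filtered.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    top_level = {name.partition(".")[0] for name in list(sys.modules)}
    return sorted(
        name for name in top_level
        if name not in stdlib and not name.startswith("_")
    )


print(loaded_top_level_packages())
```

Note that `sys.modules` also contains everything loaded before the user's own code ran (kernel startup packages, for instance) and every transitive dependency, so the list can be much longer than the notebook's own imports.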

banesullivan commented 5 years ago

So actually, I just completely reimplemented this by overriding the builtins.__import__ function to track all imported packages. This is what I was originally trying to do, so that ALL packages used can have their versions reported. For example, importing a constant from scipy should report on both scipy and numpy, as those are the only two packages executed in the kernel.

import scooby
scooby.track_imports()

from scipy.constants import mu_0 # a float value

scooby.TrackedReport()
[Screenshot of the generated report]

This is intended as a debugging tool that reports every non-standard-library package in active use. It is much like a pip freeze, except that it only reports packages that have been loaded at some point during the session.

banesullivan commented 5 years ago

And for this code, it produces the following report:

import scooby
scooby.track_imports()

# aliased imports where we want the full package name
import numpy as np
import matplotlib.pyplot as plt
import pyvista as pv
import pandas as pd

# Many imports from a single package to track
from SimPEG import (
    Mesh, Maps, Utils, DataMisfit, Regularization,
    Optimization, Inversion, InvProblem, Directives
)
import SimPEG.EM as EM

# some standard libs that we don't want in the report
import inspect
import sys
import os

# import functions/classes from packages we need to track
from pymatsolver import PardisoSolver

# Importing a primitive value: with import tracking, scipy should now
#  appear in the report even though only a float is imported
from scipy.constants import mu_0 # a float value

# random function/class we don't care about
def foo():
    pass

class goo():
    pass

import scooby
scooby.TrackedReport(sort=True)
[Screenshot of the generated report]

prisae commented 5 years ago

This is actually really useful, for debugging and everything! Great stuff!

banesullivan commented 5 years ago

This is actually really useful, for debugging and everything! Great stuff!

Glad you see this as more useful now - especially for debugging!