Technologicat / pyan

Static call graph generator. The official Python 3 version. Development repo.
GNU General Public License v2.0

Automatic search of files to read inside source code, perhaps? #15

Open g-queiroz opened 5 years ago

g-queiroz commented 5 years ago

Just to keep track of this feature request. :)

It would be nice if pyan could inspect a source file with many imports and decide which other files it should also analyze. Python's module finder could be useful here. Of course, we would need to think about standard libs, which may or may not be interesting to have in the call graph. Hope it helps!

Technologicat commented 5 years ago

Thanks for filing the request to the issue tracker! Scroll to the end for current workarounds.

The suggested feature would allow specifying just one .py file for Pyan to act as the root for the analysis. Pyan would then automatically trawl that file for imports and include the imported files in the analysis.

Essentially, to do this, we could walk the AST once in a separate pre-processing step to resolve any import nodes in it, recursing into any modules mentioned there, and use the result to build the list of files to analyze.
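As a rough illustration of that pre-processing step, here is a minimal sketch using the stdlib ast module. The function name collect_imports is hypothetical (it is not part of Pyan), and the sketch only lists the imports found in one source string; mapping the names to files and recursing into them is left out.

```python
import ast

def collect_imports(source):
    """Return (module_name, relative_level) pairs for each import in `source`.

    Hypothetical helper for illustration only; not part of Pyan.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a, b.c" yields one alias per imported module
            found.extend((alias.name, 0) for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; level counts leading dots
            found.append((node.module or "", node.level))
    return found

example = '''
import os
import numpy as np
from .helpers import util
from . import sibling

def f():
    import json  # function-local imports are found too
'''

print(sorted(collect_imports(example)))
```

Note that ast.walk visits every node, so imports nested inside functions or if/try blocks are collected as well, regardless of whether they would actually execute.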

Due to the dynamic nature of Python, such a static analysis cannot be 100% accurate - it is possible to overwrite the same name later with another import, perform different imports in if/else or try/except branches, or even locally import inside a function. Not to mention the implications of import hooks - see below.

In cases where the same (call graph) node resolves to several different definitions at different places in the code, then - from the viewpoint of a static analysis - perhaps it is reasonable to always let the textually first or last one win (which one to prefer could be settable from the command line).
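To make the first/last policy concrete, here is a small sketch. The name resolve_binding and the prefer parameter are made up for this example; nothing like this exists in Pyan as far as this thread says. It looks only at import statements and picks the textually first or last one that binds a given name.

```python
import ast

def resolve_binding(source, name, prefer="last"):
    """Return the import target bound to `name`, choosing the textually
    first or last import statement. Hypothetical helper, illustration only."""
    candidates = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c"
                bound = alias.asname or alias.name.split(".")[0]
                candidates.append((node.lineno, node.col_offset, bound, alias.name))
        elif isinstance(node, ast.ImportFrom):
            prefix = "." * node.level + (node.module or "")
            sep = "" if prefix.endswith(".") else "."
            for alias in node.names:
                bound = alias.asname or alias.name
                candidates.append(
                    (node.lineno, node.col_offset, bound, prefix + sep + alias.name)
                )
    # Sort by position in the text, then keep only imports that bind `name`
    matches = [target for _, _, bound, target in sorted(candidates) if bound == name]
    if not matches:
        return None
    return matches[0] if prefer == "first" else matches[-1]

src = (
    "try:\n"
    "    import simplejson as json\n"
    "except ImportError:\n"
    "    import json\n"
)
print(resolve_binding(src, "json", prefer="first"))
print(resolve_binding(src, "json", prefer="last"))
```

In the try/except example, a static analysis cannot know which branch runs, so the policy simply decides between simplejson (first) and json (last).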

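Pyan needs to know whether the program being analyzed is intended to be invoked as python3 somedir/subdir/myscript.py, as cd somedir/subdir && python3 myscript.py, or as python3 -m somedir.subdir.myscript, because the invocation determines both the working directory and the first entry of sys.path, and hence how imports resolve.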

This obviously needs some sort of user interface. Perhaps we could imitate Python itself: pyan3 somedir/subdir/myscript.py vs. cd somedir/subdir && pyan3 myscript.py vs. pyan3 -m somedir.subdir.myscript could carry this distinction to Pyan.
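A sketch of what that distinction amounts to, assuming standard CPython behaviour: when a script is run by path, the first entry of sys.path is the script's directory; with -m, it is the current working directory. The function import_root below is hypothetical, and it skips details such as symlink resolution.

```python
from pathlib import Path

def import_root(target, as_module, cwd):
    """Return the directory that would become sys.path[0].

    target:    script path, or dotted module name when as_module is True
    as_module: True to imitate "python3 -m target"
    cwd:       the working directory of the invocation
    Hypothetical helper for illustration; symlinks are not resolved.
    """
    cwd = Path(cwd)
    if as_module:
        # python3 -m pkg.mod  ->  the current directory is searched first
        return cwd
    # python3 path/to/script.py  ->  the script's own directory is searched first
    return (cwd / target).parent

print(import_root("somedir/subdir/myscript.py", False, "/proj"))
print(import_root("myscript.py", False, "/proj/somedir/subdir"))
print(import_root("somedir.subdir.myscript", True, "/proj"))
```

The first two invocations give the same search root but different working directories, while the -m form searches from the project root, which is what makes absolute imports such as somedir.subdir.other resolvable there.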

We will also need some new command-line switches to tell Pyan where to recurse; for example, to leave out anything in the stdlib (this should probably be the default, but it would be useful to be able to switch on analysis of the stdlib, too), or to only recurse under either the CWD or the directory where the top-level script being analyzed resides. (Whether the CWD or the script directory is the relevant one depends on how the program is intended to be invoked. This particular point is also explained well in the Hitchhiker's guide to Python imports.)
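The two filters could look roughly like this. Both function names are hypothetical, and the stdlib check relies on sys.stdlib_module_names, which only exists in Python 3.10 and later; on older versions this sketch treats nothing as stdlib.

```python
import sys
from pathlib import Path

def is_stdlib(module_name):
    """True if the top-level package of `module_name` is in the stdlib.

    Assumes Python 3.10+ (sys.stdlib_module_names); falls back to an
    empty set on older interpreters. Hypothetical helper.
    """
    top_level = module_name.split(".")[0]
    return top_level in getattr(sys, "stdlib_module_names", frozenset())

def is_under(path, root):
    """True if `path` lies inside the directory tree rooted at `root`."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False

print(is_stdlib("os.path"), is_stdlib("pyan"))
print(is_under("/proj/pkg/mod.py", "/proj"), is_under("/elsewhere/mod.py", "/proj"))
```

A recursion step would then skip a module when is_stdlib says so (unless stdlib analysis is switched on), and skip a resolved file when it is not under the chosen root.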

Finding the imported modules

We could use the finder part of Python's importer (≡ finder + loader) to make Python find the modules for us. This way we avoid duplicating the logic for locating a module, which in Python is rather complex. See #5. We would essentially run just the name-resolution part of the import, without actually running the target module.
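For a top-level module, the finder-only approach might be sketched as below, using importlib.machinery.PathFinder from the stdlib. The function name locate is hypothetical. The sketch deliberately handles only top-level names: importlib.util.find_spec would also accept dotted names, but it imports the parent package to do so, which is exactly what we want to avoid here.

```python
import importlib.machinery

def locate(module_name, search_path=None):
    """Return the file a top-level module would be loaded from, or None.

    Runs only the finder; the module's code is never executed.
    search_path: list of directories, or None to use sys.path.
    Hypothetical helper for illustration only.
    """
    spec = importlib.machinery.PathFinder.find_spec(module_name, search_path)
    if spec is None or not spec.has_location:
        return None
    return spec.origin

print(locate("json"))                # path to the stdlib json package's __init__.py
print(locate("no_such_module_xyz"))  # None
```

For submodules, one option would be to pass the parent's spec.submodule_search_locations as search_path and look up the last name component, again without importing anything.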

But note, Python's import logic is programmable at runtime, so what happens when a module is actually imported can be customized (even completely overridden!) by libraries installed in the Python environment in which the program runs. Libraries using import hooks include at least MacroPy3 for its macro expander; and Pydialect for its dialect compiler.

Furthermore, it's not just the import logic that's programmable. Mypy used to have a codec hack, which is a deprecated (for good reason!) way to implement reader macros in Python, but that was thankfully gone already in 0.4.5 (2016). (For any readers who end up here searching for reader macros for Python, see source_transformer in Pydialect for the modern, PEP 302 compatible way to do that.)

A static analysis of code using such libraries won't capture any of the hooks. The hooks are normally installed dynamically, when the program actually runs (when the relevant library is imported and its startup code runs). Still, this might be good enough for 95% of Python code out there in practice.
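A toy example of why the hooks are invisible to a static analysis: the module virtual_demo below (a made-up name, as are both classes) has no file on disk at all. It exists only because a finder is added to sys.meta_path at runtime.

```python
import importlib.abc
import importlib.machinery
import sys

class VirtualLoader(importlib.abc.Loader):
    """Loader that fills in a module without reading any source file."""
    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        module.answer = 42

class VirtualFinder(importlib.abc.MetaPathFinder):
    """Finder that claims exactly one module name and declines all others."""
    def find_spec(self, fullname, path=None, target=None):
        if fullname == "virtual_demo":
            return importlib.machinery.ModuleSpec(fullname, VirtualLoader())
        return None

# The hook only exists once this line has run
sys.meta_path.insert(0, VirtualFinder())

import virtual_demo
print(virtual_demo.answer)

# The ordinary path-based finder knows nothing about it
print(importlib.machinery.PathFinder.find_spec("virtual_demo"))
```

A file-based search for virtual_demo finds nothing, so an analyzer that never runs the hook-installing code has no way to follow this import.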

For examples of using the Python module finder, see the macropy3 bootstrapper and the dialect importer.

Workaround

At the moment, it is already possible to specify manually which files to include in the analysis. Examples:

pyan3 -nuca --dot main.py helpfulfunctions.py foo*.py
shopt -s globstar
pyan3 -nuca --dot **/*.py
find . -not -iwholename '*/test*/*' -name '*.py' -print0 | xargs -0 pyan3 -nuca --dot

The last one includes all *.py files under the current directory, recursively, except those that have a test* directory anywhere in their path. This can be useful to eliminate noise introduced to the analysis by unit test modules in a large codebase.
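For shells without find (or for building the file list inside a script), roughly the same selection can be sketched in Python. The function name python_sources is hypothetical; the resulting paths would then be passed to pyan3 as file arguments.

```python
from pathlib import Path

def python_sources(root="."):
    """Yield *.py files under `root`, skipping any file that has a
    directory starting with "test" (case-insensitive) in its path.
    Hypothetical helper mirroring the find command above."""
    root = Path(root)
    for path in sorted(root.rglob("*.py")):
        parent_dirs = path.relative_to(root).parts[:-1]
        if any(part.lower().startswith("test") for part in parent_dirs):
            continue
        yield path

for source_file in python_sources("."):
    print(source_file)
```

As with the find version, only directory names are checked, so a file such as test_utils.py in a non-test directory is still included.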

Technologicat commented 5 years ago

Hmm, maybe no need to build this from scratch. The jedi library could be useful.

Judging by anaconda-mode-find-definitions (see anaconda-mode for Emacs), it's a real jedi at finding the relevant definition for a symbol in a Python source file. This could probably be leveraged to get a list of modules.

Technologicat commented 4 years ago

Also, there is now an import analyzer in modvis.py.

kasimtasdemir commented 1 year ago

For only one level of recursion,

*.py */*.py

worked for me:

pyan3 *.py */*.py --uses --no-defines --colored --grouped --annotated --svg >myuse.svg

I use Anaconda Prompt on Windows 10.