Technologicat / pyan

Static call graph generator. The official Python 3 version. Development repo.
GNU General Public License v2.0
329 stars 57 forks source link

enh: ordered sequence of functions #53

Open laimaretto opened 3 years ago

laimaretto commented 3 years ago

First of all, wanted to thank you for this excelent project. I've been looking for something like this for a long time. It's impressive, thanks!.

Now, I've used it with a couple of codes I have, and I really like the results. However, I was thinking if you may have this in the TODO list:

1) ability to show an ordered sequence of functions. For example, the render produced by the tool nicely shows how the functions are related among them even showing the direction of the connection, but its not clear from the graph which function is called first, which second, etc. Example, this is an actual output of a code of mine: image In here you can tell the relationship between the pink function and the green ones, but you cannot tell which of the green ones is called first, second, etc. Indeed, in this example that I'm showing, the best and correct representation would be like this: pink_out -> green1 -> green2 -> ... -> greenN -> pink_in.

2) There are situations where one function or another one will be triggered, depending on the output of an if-else-statement. Is it possible to obtain something like this?

Kinds regards!

Lucas

Technologicat commented 3 years ago

Thank you for the kind words.

Pyan is a static analyzer, so by design, it has no concept of temporal ordering. I think the pythonic thing to do is to keep it that way, because static analysis is considerably simpler than dynamic analysis.

If we want to do dynamic analysis, it's not just linear execution we have to consider, peppered with function calls, if-elses, and loops, but there may also be exceptions (escape continuations). Also, if one tries hard enough, it's possible to hack in a condition/restart system à la Common Lisp (resumable exceptions!), functional loops, or even genuine multi-shot continuations (often known as call/cc).

Things such as these complicate the possible paths a piece of code can take, so a fully reliable dynamic analysis for everything Python can express is IMHO not really possible. A static analysis is already difficult - it's easy to cause Pyan to miss a dependency (see below for an example).

Also, some important static analysis features are still missing - for example, Pyan does not know about syntactic macros. While Python itself doesn't provide a macro facility, that hasn't stopped third-party addons from providing one. macropy is well-established, having been around since ~2013. There's also the more recent (very pythonic and compact) mcpy, and right now, I'm building mcpyrate, a next-gen macro expander and language lab for Python. Such tools complicate both kinds of analyses considerably - but can sometimes make code much shorter.

So, considering everything that could possibly happen in a Python program, I'm starting to believe in the old adage that the only reliable way to know what a program does is to run it (maybe with a debugger). Hence not much has been happening in Pyan development recently. :)

That said, the graph in the screenshot represents this structure:

def pink():
    green1()
    green2()
    ...
    greenN()

The graph pink → green1 → green2 → ... → greenN, on the other hand, already represents this:

def pink():
    green1()

def green1():
    green2()

def green2():
    ...

def greenNminus1():
    greenN()

def greenN():
    ...

By the way, in a language that has tail call optimization - which Python doesn't, but see unpythonic if you need to have it - this would be a valid way to structure the code, if thegreen functions are called only as a chain (i.e. not needed independently of each other). Though then I might nest the definitions to make that explicit:

def pink():
    def green1():
        def green2():
            ...
            def greenNminus1():
                def greenN():
                    ...
                greenN()
            greenNminus1()
        green2()
    green1()

And finally, to fix the fact that the sequence of calls reads backwards, define a suitable decorator to automate those:

from unpythonic import call

def pink():
    @call
    def green1():
        @call
        def green2():
            ...
            @call
            def greenNminus1():
                @call
                def greenN():
                    ...

But then Pyan will miss the dependencies, since it doesn't know what call does. :)

So, in summary, Pyan does a static analysis only. At least right now I'm not considering adding dynamic analysis to Pyan - but that's not to stop you or anyone else interested from adding it.

If you want to experiment with the code yourself, look at pyan/analyzer.py - that's the AST walker that analyzes the code and builds the call graph; the other modules are basically infrastructure. You'll probably want to keep track of the current line number in the source code being analyzed. It's stored by Python in the lineno attribute of the AST node being analyzed. (The line numbers are filled in by ast.parse.) Anything else you might want to know about Python AST nodes can be found in Green Tree Snakes ("the missing Python AST docs"). See also the helpers defined by the stdlib ast module, particularly iter_fields and iter_child_nodes.