Technologicat / pyan

Static call graph generator. The official Python 3 version. Development repo.
GNU General Public License v2.0
329 stars 57 forks source link

Asserts can be removed during bytecode optimization #29

Closed edumco closed 4 years ago

edumco commented 4 years ago

From Codacy issues

assert is removed with compiling to optimised byte code (python -o producing *.pyo files). This caused various protections to be removed. The use of assert is also considered as general bad practice in OpenStack codebases.

Link to list all occurrences: https://app.codacy.com/manual/edumco/pyan/issues/index?bid=15369107

edumco commented 4 years ago

Github suggested this issue #18 as a similar bug. And it looks promissing because it only happens on runtime.

Technologicat commented 4 years ago

Thanks for the report!

As for #18, there's a bug somewhere in the algorithm that populates the graph. A helpful, minimal working example has been posted, so it shouldn't be too difficult to figure out what is happening there, once I have a couple of hours to dedicate to that. I suspect it's unrelated to the asserts - or if anything, we're missing an assert somewhere, to fail-fast the particular assumption violation that currently bubbles up until it causes a visible failure much later (likely far away from its original cause).

I've just been ~lazy~ away from Pyan for a while, so I haven't looked at it in detail yet, beside what I already wrote in the comments to #18.

As for asserts in general, I just happen to come from a different school. This is a multifaceted issue requiring an essay answer, so please bear with me. :)

IMHO, the assert cond construct itself is neither bad nor good - that depends on how it's used. (Unlike, say, unrestricted goto, which is inherently harmful, because it fundamentally makes it impossible to locally reason about control flow. As Nathaniel Smith has pointed out, goto has a parallel (pun not intended) in many popular concurrent programming schemes; his trio library is (AFAIK) one of the first attempts to build something that makes concurrent code easier to reason about - by eliminating unrestricted go, much like modern languages have eliminated unrestricted goto. What I mean by unrestricted is that there are still some restricted forms of goto, even in Python - break, continue, return and raise all transfer control, but only outward on the call stack.)

The assert cond construct is, fundamentally, merely convenient shorthand for the more verbose if not cond: raise AssertionError (in essence it's a very specific syntactic macro!). With readability equal (no golfing!), shorter code has fewer places for bugs to hide, and is faster to read (by humans), so the shorthand should be preferred over the longhand form. The shorthand also comes with the bonus of explicitly telling the compiler it is an assert, so the compiler knows to omit it from the bytecode output should the user request to do so.

(Code density concerns aside, it bears pointing out that shorter is better still only up to a point. As Paul Graham argues in On Lisp, there is a learning cost associated with each new construct. A shorthand, if a new construct is defined locally in a project, it should save at least several times the length of the longhand form (when amortized over the whole codebase), before defining the shorthand becomes an overall win. Here the learning cost is negligible, because assert is part of the base language, so anyone programming in Python is likely already aware of it.)

Codacy (new to me - thanks for the link, by the way!) correctly points out that asserts may end up removed from a production compile, implying that the code must be safe to run even if the asserts aren't there. Up to this point, I agree.

But I think asserts are still useful. I see their role as helping to catch violations of preconditions and postconditions at development time, as well as explicitly documenting such assumptions in a form that can be (optionally) machine-checked. Correct usage is to only assert on conditions that - in the absence of bugs - are always true. For things to which Murphy's law may apply at runtime (to paraphrase Peter Seibel), exceptions or another error handling mechanism (including conditions and restarts in Common Lisp, and the Either monad in Haskell) are the appropriate tools.

I think using asserts to enforce preconditions and postconditions is similar to using mypy for enforcing type safety. During development, certain safety checks are enabled, and in production, they are disabled, allowing the program to run a bit faster. This general pattern is very common - for example, Cython has compiler directives to disable bounds checking and wraparound (a.k.a. negative indexing) for array accesses.

Ideally, for checking preconditions and postconditions, I'd like a Turing-complete contract system, but... in a way, a general-purpose programming language itself already is one (the value of a specialized contract system being in the brevity and uniformity of notation). For most everyday uses, asserts at entry/exit points perform this job just fine.

Granted, there are (mathematical/logical) properties that are better captured in a static representation one meta-level higher, because the halting problem and its cousins make it impossible, in general, to dynamically check for things such as "is this sequence finite?". But in any sufficiently flexible language, it's just a design choice of which properties to encode in the concept of static type, versus which ones to dynamically verify at runtime.

Python actually already has something resembling this distinction - the language is fully duck-typed, but then there are things like the metaclasses in collections.abc, their generalization typing.Protocol, mypy... and indeed, classes themselves (even, with The Right Way to handle multiple inheritance!), if one chooses to subscribe to the (very un-ducklike) notion of nominative (as opposed to structural) typing. I suppose both kinds of typing are just tools, with different strengths and weaknesses.

(It's not the only such "mismatch" between how Python presents itself at the surface, and how it actually works or what can be done with it. Python is actually flexible enough to support syntactic macros, although they are not commonly used. Python's object model is actually prototype-based, just like JavaScript's, even though it attempts to look like a traditional class-based system (possibly to avoiding confusing those coming from C++ or Java?). In ES6, JS went the same route, too.)

Anyway, to return to the main topic of asserts, I think I understand where Codacy is coming from. Python advertises itself as a language for programmers of all skill levels, valuing clarity foremost. The assert construct is particularly easy to misuse, unless one is careful. It's a deceptively simple-looking advanced feature (I'd argue comments are, too...).

So to make a codebase easily approachable to the largest possible audience, it's safer if the code doesn't use asserts, so it's easier to resist the temptation to "follow the example" and add in some inadvertently incorrect ones when working on it.

If that is the reasoning behind the recommendation, my answer is more education for the general programming audience. I admit I don't know how well that scales. :)

Further thoughts welcome - including, and especially, opposing viewpoints.

edumco commented 4 years ago

I am incredibly happy with your answer, because it not only make me understand this issue more profoundly as it opens a whole new perspective on the language itself.

I never really understood (this deep) the use of assertions outside the tests.

Any tool we could use to enhance our code to me is a good thing and in this context appears to me that it is used exactly for that.

I'm a great fan of 'Codacy' and other code analyzers because they put questions on your head about the code you're producing, but it should never be used as bible. And this discussion (talk/conversation I'm no native english speaker) tackle so many things that i'm gonna need a few weeks to fully grasp all this richness.

You explain things very well! I think you should publish this conversation somewhere else.

Thank you a lot and a big hug from Brazil!

PS: I'm going to ignore this issues for this repo :)

Technologicat commented 4 years ago

Glad I could help. Yes, I might need a blog instead of using GitHub issue comments as such... :)

Here are some more ideas and/or pointers that may interest you, or indeed anyone who ends up here via Google:


EDIT: mention user-programmable infix operators for Racket, and CL's support for DECLARE. Mention imacropy and pydialect.