Python version: interp_stmts has odd structure

IUCompilerCourse / Essentials-of-Compilation

A book about compiling Racket and Python to x86-64 assembly

1.29k stars 141 forks source link

Python version: interp_stmts has odd structure #136

Open AndrewTolmach opened 2 years ago

AndrewTolmach commented 2 years ago

In Figure 2.3/2.4 (and subsequent chapters), interp_stmts does a deep match on individual statement kinds, rather than simply iterating over the list and invoking a separate interp_stmt function. This violates the code structuring principle noted on p. 7 footnote1 . Moreover, it is quite unclear why this function returns the result of the recursive invocation on the tail (ultimately it always returns None). The motivation for both these things is presumably to handle return starting in Ch. 8, but meanwhile they are quite mysterious, and set a poor example for students who will be looking at this code for guidance on how to write recursive functions and using match statements.

And on that point, even if this is kept as a deep match, why not write it using a match statement rather than testing (inefficiently in the super calls) separately for len(ss) == 0 ??

jsiek commented 2 years ago

The motivation is to handle Goto in the chapter for Booleans and Conditionals. (See interp_Cif.py.) I'm open to alternatives that are structurally recursive. Perhaps the way to go is a continuation passing approach, in which the following statements are passed as a parameter to interp_stmt?

jsiek commented 2 years ago

I've pushed the change to the continuation passing approach.

AndrewTolmach commented 2 years ago

But why is there any need to pass continuations?

If you get a chance, take a look at branch apt-stmt-experiment of python-student-support-code. I haven't made a formal pull request for it. This implements two (essentially independent) things:

Statement-level interpretation without continuations. I'm really not sure why you wanted them; maybe I'm missing something.
Redoing the early interpreters (up to L/Cfun) to avoid the mysterious threading of return values from statements. Instead, this is added only in L/Cfun, relying on the fact that earlier stages implicitly return None. There is a small amount of code duplication, but in return the idea of "early return" with a value is made more explicit.

Even if you don't like the second change, I think the first one is a clear win unless I'm missing something.

Similar changes could be made in the type checkers, but that gets just a little hairier, and I'm not quite so worried about exposing students to (the existing form of) that code.

jsiek commented 2 years ago

The interpreter in your branch always executes all of the statements in a program. However, when there is a goto, that can be the wrong behavior. For example, in the following program, the statements x = input_int() and print(x) should not be executed.

x = 0
goto end
x = input_int()
print(x)
end:
print(x)

By passing the continuation as a parameter, we give control about what happens next to each statement. So the goto statement can decide not to execute the statements in its continuation, and instead only execute the statements at the indicated label.

AndrewTolmach commented 2 years ago

Um, but isn't it the case that gotos only appear in tail position within a block? (Which, if true, is something that could usefully be enforced in the grammar and checker for C_if.) So isn't your example impossible?

jsiek commented 2 years ago

That's a good point. The grammar only has gotos in tail position.

jsiek commented 2 years ago

Hmm, it looks like I let the interpreters get out of sync with the grammars at some point. The interp_stmts function should be replaced by interp_tail.

jsiek commented 2 years ago

Doh, I was looking at the grammar in the Racket version... ok, so the Python grammar and Racket grammar for the C IR languages differ in this regard... but should they?... part of what's tricky here is that I did not introduce a C_Var language in Chapter 2, and instead L_Var plays that role. And I've set up C_If so that L_Var is a subset. But that doesn't need to be the case. Yes, it would be good for my own sanity to get the Python version of C_If in sync with the Racket one.

jsiek commented 2 years ago

I've updated the grammars and interpreters for the C languages (C_If, etc.) but for now I'm leaving the continuation parameter for the L languages as the way to handle early return. While there are other ways to handle the early return, the nice thing about the continuation approach in the interpreter is that it jives with the structure of explicate_control.