Lonami / pyndustric

A Python compiler for Mindustry's logic processors assembly language
MIT License

Removing the stack? #32

Open TelosTelos opened 3 years ago

TelosTelos commented 3 years ago

Probably the most controversial change I've been thinking about making is removing the stack, which (0) currently is used only as a conduit for passing arguments and return-lines to functions. The primary alternatives are (1) to pass function arguments and return-lines via variables, of which mindustry offers an unlimited supply, and/or (2) to treat functions as "inline" macros. I'm just posting pros and cons here before pulling the trigger on this.

In favor of (0) using a stack: Putting return-lines on the stack allows recursive functions to return back up the chain to whatever initially called them. Currently there is no real advantage to putting arguments on the stack, but eventually this might allow for functions that pass an open-ended list of *args, or this might help with giving each recursion its own independent namespace. There would be an advantage to making function namespaces and recursion match the way they work in Python (as long as it runs acceptably fast, which I doubt it will). Quick math: pushing+popping an argument/linenumber takes three more instructions than simple variable setting (push+change_pointer+pop+change_pointer --vs-- set). To implement namespaces, we'd also need 4 instructions per "local" variable that will be allowed to have different values in different namespaces. There may be some ways of trimming a few of these (e.g. I think the current code manages to skip two back-to-back pointer changes in each call) but most of this will be unavoidable.
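To make the quick math above concrete, here's a tiny illustrative sketch (invented helper names, not real mlog opcodes) counting the instructions each approach spends to pass one value:

```python
# Illustrative sketch: pass one value through a simulated memory-cell stack
# vs. through a plain variable, counting "instructions" executed.

def stack_pass(value):
    """Pass `value` via a simulated stack; returns (result, instructions used)."""
    cell, sp, ops = {}, 0, 0
    cell[sp] = value; ops += 1   # write cell sp      (push)
    sp += 1;          ops += 1   # op add sp sp 1     (advance pointer)
    sp -= 1;          ops += 1   # op sub sp sp 1     (rewind pointer)
    result = cell[sp]; ops += 1  # read result cell sp (pop)
    return result, ops

def variable_pass(value):
    """Pass `value` via a plain variable: a single set instruction."""
    arg = value                  # set arg value
    return arg, 1

assert stack_pass(7) == (7, 4)
assert variable_pass(7) == (7, 1)
# 4 - 1 = 3 extra instructions per value, matching the estimate above.
```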

Against (0) using a stack: a stack requires an associated memory cell, and it requires extra instructions, both in total program length and while running. The current use of a stack kind of gives us the worst of both worlds: recursion and namespaces don't work in the standard Python way, yet it still needs a memory block and is needlessly slow. If we set it up so these really did work as they do in Python, it would be even slower, prohibitively slow for strategic purposes in game (though maybe there would be some hope that optimization passes could streamline some of this, but that'd take more work than I'm willing to put into it!). So long as we need coders to learn that Pyndustric functions don't work quite the same as Python functions, we may as well choose a system for them to learn that will be efficient and well-suited to the strategic purposes of people who are trying to write useful mlog programs.

Option (2), inline macros, would be even more efficient than (1) passing via variables, as it trims jumps and argument-setting on both the calling and returning ends. The only real drawbacks of (2) inline macros (as compared to (1)) are (a) that inline macros are a tiny bit trickier for us to code (since you have to replace arguments within the function body), and (b) that for longer functions they'll add more to the total instruction count, which risks running into the 1000-instruction cap. But my guess is that in 90%+ of uses, these would just be flat-out better, so if we were to support just one of these, (2) would probably be the way to go.

We needn't necessarily choose just one. We could allow a choice between these options with an @decorator syntax, or we could attempt to automatically detect which is best. Auto detection of global-vs-local variable use in a function is a pain but probably doable. True recursion requires a stack at least for return-lines. Non-stack functions are almost always better inline, modulo questions about whether you want to be able to read the argument-variables again outside the function, except in the case where you need to put long functions out-of-line to stay under the 1000-instruction cap.
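A minimal sketch of what option (2)'s argument replacement could look like, operating on instruction text for illustration (the real compiler would presumably work at the AST/IR level, and the helper name here is invented):

```python
import re

def inline_expand(body_lines, params, args):
    """Naively splice a function body inline at a call site, renaming each
    parameter to the caller's argument. Purely illustrative."""
    out = []
    for line in body_lines:
        for param, arg in zip(params, args):
            # Whole-word substitution so e.g. param "x" doesn't match "x2".
            line = re.sub(rf"\b{re.escape(param)}\b", arg, line)
        out.append(line)
    return out

# Calling inc(y) where inc's body is `op add x x 1`: the body is copied to the
# call site with x renamed to y -- no jumps, no stack, no argument copying.
assert inline_expand(["op add x x 1"], ["x"], ["y"]) == ["op add y y 1"]
```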

So... which way should we go?

TelosTelos commented 3 years ago

I guess I'm inclined towards making the default be inline compilation (i.e. putting another copy of the function body everywhere the function gets called, with no need for jumps and little need for argument copying), since that'll be preferable the vast majority of the time, in a context where processor cycles are usually going to be our most limiting resource.

I think it may be worth offering some sort of @outofline decorator to allow creation of Python-like out-of-line functions in the rare circumstances where those are needed, mostly (a) circumstances where the function is too long to put all compilations of it inline without exceeding the 1000-instruction cap, and/or (b) circumstances where the function calls itself recursively (either directly or via some intermediary). Circumstance (b) would generate an easily detectable infinite loop during compilation of an inline function, which could pop a compiler error recommending defining one or more of the offending functions @outofline instead. Unfortunately, by the time we detect this, it may be hard/arbitrary to choose which offending function(s) to move outofline, and it'd be hard to undo previous inline compilations without just restarting the compiler, so probably better to pop an error and make the user responsible for the choice.

One tricky issue involves the question of whether or not inline defs should be allowed to change the value of variables given as arguments to the function -- i.e. of whether inline defs should be "macros" or "functions". Consider the following def.

def inc(x):
    x += 1
    return x

When construed as an ordinary function, inc(y) would leave y itself unchanged, and return the value y+1. Construed as a macro, inc(y) would increment the value of y to y+1, and also operate as this new value in a larger expression (i.e., x = inc(y) would be equivalent to the C expression x = ++y.) Macros produce more efficient mlog code (fewer assignments of variables to values) but are less familiar than functions to Python coders (though objects passed to functions do behave in almost this way in Python).

My inclination here is to say that whatever we do is going to be a bit unfamiliar, so we may as well go with the most efficient and most generally useful option, and explain it. So I'd lean towards macros. With functions, it's practically impossible to get macro-like efficiency. In contrast, with macros, it's trivial to get function behavior: just use 0+y as the argument, rather than y itself, since this has the very same effect of making this instance of the def operate on some temp variable whose value is set to be a copy of y. E.g., inc(0+y) would compile as pseudocode temp = 0+y; inc(temp), which leaves y itself unchanged. So implementing defs as macros quite straightforwardly makes both options available to coders, both with pretty close to maximal efficiency, with the only drawback being needing to remember to use 0+ to protect an argument from being changed, when wanted.
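The macro-vs-function distinction, and the 0+y shield, can be sketched using a dict as the processor's flat variable namespace (names like __temp are invented for illustration):

```python
# Macro semantics: the def operates directly on whatever variable name it is
# handed, mutating the caller's variable in the shared flat namespace.

def inc_macro(ns, var):
    ns[var] = ns[var] + 1   # op add y y 1  -- mutates the caller's variable
    return ns[var]

ns = {"y": 5}
assert inc_macro(ns, "y") == 6
assert ns["y"] == 6          # macro semantics: y itself was incremented

# The 0+y trick: passing an expression forces a temp variable, so the macro
# operates on a copy and y is shielded -- ordinary function semantics.
ns = {"y": 5}
ns["__temp"] = 0 + ns["y"]   # set __temp 0 + y  (temp name is illustrative)
assert inc_macro(ns, "__temp") == 6
assert ns["y"] == 5          # function semantics recovered: y unchanged
```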

TelosTelos commented 3 years ago

So I've started writing code to allow flexible selection of different def compiling modes, via decorators, with the default mode set in constants.py. That raises the question of which modes we would want to offer, what decorators to use for them, and which to make the default.

I think the inline macro-like mode, probably called "@inline" and made the default, is the most important one to include and the only one that I'd be likely to use much, as it will often produce the fastest code. It can also easily be made to work like an inline function (by preceding any variable arg that you want to shield from change with 0+), so there's probably no need to add inline-function as a separate mode.

That leaves various non-inline ("outofline"? "outline"? "jumpy"?) compilation options. We've considered three layers of bells-and-whistles that could be added here:

(A) jumping to and from the single copy of the function's body, using variables to pass arguments and the returnline -- this makes long, frequently used functions take up fewer of your 1000 instructions, and allows a very limited stackless form of recursion with no need for an associated memory cell;

(B) something more like what we currently have, using a stack to store returnlines, and perhaps also to pass arguments, though there's currently no advantage at all to using the stack rather than variables for that. The only advantage of this over the preceding option is that it generates a trail of breadcrumbs in the stack that recursive calls can use to trace all the way back to the original caller;

(C) some imagined future version that somehow implements namespaces, and hence allows something much more closely akin to Python's recursion and namespaces, but at a significant cost in processor cycles.
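A minimal sketch of the calling convention option (A) implies, with invented variable names (__inc_arg_0, __inc_ret) and label-style jump targets standing in for real mlog addresses:

```python
# Illustrative code generator for option (A): arguments and the return line
# are passed in plain variables; one copy of the body lives at a fixed label.

def emit_call(fn, args, retlabel):
    """Emit the call-site instructions for a jumpy (out-of-line) function."""
    lines = [f"set __{fn}_arg_{i} {a}" for i, a in enumerate(args)]
    lines.append(f"set __{fn}_ret {retlabel}")  # where the body jumps back to
    lines.append(f"jump {fn} always")           # enter the single shared body
    lines.append(f"{retlabel}:")                # execution resumes here
    return lines

assert emit_call("inc", ["y"], "ret0") == [
    "set __inc_arg_0 y",
    "set __inc_ret ret0",
    "jump inc always",
    "ret0:",
]
```

No memory cell is needed, and a recursive call simply overwrites __inc_ret, which is exactly the "very limited stackless recursion" described above: only the most recent return line survives.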

My recommendation here would be (A), since it'll run faster, and I don't think the other bells and whistles are worth their costs. But until we agree to pull the trigger on stack removal, I guess I'll leave this at (B) for now, perhaps streamlining argument passing through variables, and perhaps relocating the function bodies to the end of the code now that that's easy to do, to save an early jump instruction.

TelosTelos commented 3 years ago

Hrm... it looks like you pushed the returnline to the stack last, and popped it right at the beginning of executing the function's body. If you want to be able to follow your trail of breadcrumbs later back to the original caller of a recursive function, you can't eat your breadcrumbs so soon! Instead, you should push the returnline first (alongside whatever namespace variables from the caller's context you decide to cache), and leave this on the stack until after whatever nested recursive calls have been made and you're ready to return up a level...

So the current implementation was just functionally equivalent to (A), and couldn't even do (B), much less (C)... That's making me more inclined to do away with the stack, since we really won't be losing any functionality at all (aside from making you dig through old code if you ever decide to do something stack-based, though probably at that point, it may be easier to just use the stack datastructure we talked about implementing instead).

Lonami commented 3 years ago

Instead, you should push the returnline first

It used to be like this and was changed to push return last because otherwise, when pushing arguments to make the call, there's a varying size in each push and it was difficult to calculate how many elements to skip when returning. But now that labels are a thing it should be trivial to fix this (and I guess earlier I could've used the "empty space, format later" approach but I didn't think of that back then).

TelosTelos commented 3 years ago

Python functions return None when reaching the end of the function without encountering an explicit return command. Is 0 the best mlog equivalent for us to return in that case?

TelosTelos commented 3 years ago

Oh, and this is a good point for you to either say "WAIT! I really don't want you to remove the stack!" or "Ok fine, implement it more efficiently for now, but I may come back and re-add a stack, recursion and namespaces later."

Lonami commented 3 years ago

We can get rid of the stack for now, and implement a subset of it if we get into recursion. But I think the no-inline version should stay (so if inline is added, both should co-exist).

TelosTelos commented 3 years ago

Good, that fits with what I've now gotten mostly done. The easy-to-change default, for when there is no decorator, is currently @inline, and the "jumpy" not-inline alternative is currently triggered with the decorator @function. Is there a better name for this decorator?

I've streamlined the not-inline @function option so that the function's body now appears at the end (no need to jump past it at the beginning), and so that its returnline and args are now passed via variable rather than stack.

TelosTelos commented 3 years ago

I haven't implemented name-mangling on "jumpy" functions' arguments, but I easily could, since I've already set up mechanisms for variable-name substitution for the inline ones anyway. Would you like me to do it? With name-mangling, there will be no collision if a function uses global variables for the names of its arguments; without name-mangling, calling a function will overwrite whatever global variables share a name with its arguments (as I believe your code had done). Regardless, all non-argument variables mentioned in the function will be treated as global, at least for now. I can see advantages each way. I guess I'd lean slightly towards name-mangling, to make this a bit more pythonic at zero cost.

TelosTelos commented 3 years ago

Also, did you ever test that your defs work in-game? It looks to me like you set the returnline to be the current value of @counter in one instruction -- call this instruction 1 -- then jump to the function in a second instruction, and then when the function concludes jump back to 1 more than whatever you set the returnline to be, which looks to me like it would just get you back to this jump instruction again? Or does the @counter get advanced just prior to its being stored into the returnline? (I suspect you probably should have been adding 2, not 1.)

Lonami commented 3 years ago

Name-mangling is something we want, and supporting the global/nonlocal keywords as well (I described this somewhere, we're again drifting too much from the main issue's topic).

Yes, I did try this in-game, and it appears the @counter refers to the next instruction when you read it, IIRC. Testing this was how I found that state is not reset on import.
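A toy interpreter (invented opcodes, not real mlog) illustrates why +1 suffices under the next-instruction semantics described here:

```python
# Assumes reading @counter yields the index of the NEXT instruction,
# as described above. Opcode names and encoding are invented for this sketch.

def run(prog):
    acc, ret, pc, trace = 0, None, 0, []
    while pc < len(prog):
        op, *args = prog[pc]
        trace.append(pc)
        pc += 1                      # from here on, pc == "@counter"
        if op == "set_ret":          # set retline @counter
            ret = pc
        elif op == "jump":           # jump <target> always
            pc = args[0]
        elif op == "add":            # op add acc acc <n>
            acc += args[0]
        elif op == "ret_jump":       # jump back to retline + <offset>
            pc = ret + args[0]
        elif op == "end":
            break
    return acc, trace

prog = [
    ("set_ret",),     # 0: retline = @counter = 1 (index of the jump below)
    ("jump", 4),      # 1: call the function body at index 4
    ("add", 100),     # 2: first instruction after the call
    ("end",),         # 3: halt
    ("add", 1),       # 4: function body
    ("ret_jump", 1),  # 5: return: jump to retline + 1 == 2
]
acc, trace = run(prog)
assert acc == 101
assert trace == [0, 1, 4, 5, 2, 3]  # resumes at index 2, right after the jump
```

Since the stored retline already points at the jump instruction itself, retline + 1 lands just past the call, and +2 would skip an instruction.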

TelosTelos commented 3 years ago

Consider your names mangled! I don't remember Python's rules for global/nonlocal, so won't worry about that now, but it should be pretty straightforward to add using the same machinery.

(At least we've stayed on the topic of doing things that the stack had been doing in implementing functions, which is pretty on-topic for us! hee hee)

TelosTelos commented 3 years ago

Ok, it looks like both sorts of functions are working fine now. Looking at your test_def(), when using the stack to pass the single argument and return line, this occupied 26 instructions; when passing the argument and return line via variables this takes 22 instructions; when compiling the two calls inline, it takes 19 and should execute significantly faster, since these instructions all execute at most once, unlike the out-of-line instructions many of which are executed twice.

There's still some room for more optimization, e.g., to trim out an unreachable return 0 \ jump back that gets attached to the end of the function, even though both branches of the preceding if would jump away always. Or to trim jumps that arise as part of the if-then-else structure, that'll be unreachable due to return commands causing a jump away first. (I think I see a pretty easy way to detect and optimize the latter.)

TelosTelos commented 3 years ago

Yep, peeking at the preceding instruction to confirm that it could possibly continue on to this instruction before adding some boilerplate flow-of-control components was a pretty easy optimization to add. That trimmed one more instruction from the @function version of test_def, and another 2 off the @inline version. I don't see any simple ways of trimming more fat off these, so I'm going to call it good enough as is.

I must come clean though and admit that I've probably broken the ability to use your for loops in functions, since for loops still use the old non-label system for line-number substitution, whereas both flavors of function now practically require the newer label system. That should be a pretty easy fix at some point. I think that may be the only remnant of the old system left to switch over.

TelosTelos commented 3 years ago

Ok, fixed for loops so they'll work with functions, and also shaved a cycle per iteration off their execution, by replacing the final unconditional jump back to the initial test with a (negated) conditional jump back up to the body. We'd made a note to maybe support negative steps, but I didn't include that yet, since it'd cost multiple instructions for little benefit, plus I think our compiler still faints whenever it sees a negative number.
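The loop-shape change can be sketched by counting the control instructions each layout executes (instruction names here are illustrative, not the compiler's actual output):

```python
# Old shape: test at the top each iteration, plus an unconditional jump back
# -- two control instructions per iteration. New shape: one entry test, then
# a single (negated) conditional jump back to the body each iteration.

def old_loop(n):
    ops, i = 0, 0
    while True:
        ops += 1          # jump end if i >= n   (top-of-loop test)
        if i >= n:
            break
        i += 1            # ...loop body...
        ops += 1          # jump top always      (unconditional jump back)
    return ops

def new_loop(n):
    ops, i = 0, 0
    ops += 1              # entry test: jump past body if i >= n
    while i < n:
        i += 1            # ...loop body...
        ops += 1          # jump body if i < n   (conditional jump back)
    return ops

assert old_loop(5) == 11  # 6 top tests + 5 jumps back
assert new_loop(5) == 6   # 1 entry test + 5 conditional jumps back
```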