Lonami / pyndustric

A Python compiler for Mindustry's logic processors assembly language
MIT License
49 stars 9 forks source link

Supporting complex expressions like (1+2)+3 #2

Closed TelosTelos closed 3 years ago

TelosTelos commented 3 years ago

Currently Pyndustric raises an unsupported expression error for any complex expression that would require multiple lines of mlog code, like x = 1+2+3 or a = abs(x1 - x2). One of the key advantages of writing code in a higher-level language like Python is to avoid needing to break every single operation onto its own line of code, so it's quite unfortunate that Pyndustric forces it here.

The "right" way to do this is probably to have the instruction list start near the syntax-tree's leaves and assign the simple operations to dummy variables, which can then be used as arguments for later operations. (Sort of like you use a dummy variable as your counter when iterating over @links.)

So, e.g., x = 1+2+3 should be equivalent to:

dummy = 1+2
x = dummy + 3

And a = abs(x1-x2) should be equivalent to:

dummy = x1-x2
a = abs(dummy)

This should be relatively straight-forward to add to the code you have already. When your visitor visits the node for an expression, and discovers that one of the arguments is not itself atomic, rather than throwing an error, it should make up a name for a dummy variable, then first append a line to the list of instructions setting that dummy variable to this non-atomic expression (akin to the dummy = ... lines above), and then after that append a line to the list of instructions making use of this dummy variable to generate the final output.

Of course this should be done recursively, to break arbitrarily complex expressions into a series of lines, with each precursor operation occurring on a line preceding the line that uses that intermediate result.

And of course this should work to allow arbitrarily complex expressions anywhere a variable could be used, e.g., to let the python program say something like unit.approach(x+dx, y+dy), rather than first needing to explicitly assign x+dx and y+dy to variables on separate lines.

Lonami commented 3 years ago

The current implementation is a bit rushed as I was eager to have something working. Despite not currently supporting complex expressions, I think using Python is already a huge readability boost.

Now, in order to implement this, using dummy values as you said is the right way to go. To keep things simple, and given that Mindustry has "infinite" "registers", each expression needing a temporary should probably use their own (think (1+2)*(3+4) will become tmp*tmp if we don't do this).

For the time being, naming of these temporaries could be as simple as using a global identifier for the current temporary name (self._tmp_counter or something).

TelosTelos commented 3 years ago

To a first approximation, anywhere mlog wants part of a line to be an expression, it needs that expression to be atomic, so anywhere Python code puts something more complex than that within a line, we'll need to use one or more preceding instructions to store the value of that complex thing in some dummy variable, and then use that dummy variable where mlog wanted an atom.

The main wrinkle that complicates this is comparisons, which in some cases should behave as above, compiling to ops that store boolean outputs in variables, but in other cases should instead be compiled into the test part of a conditional jump command, rather than ever being stored in a variable. A further twist involving these is that it will often make sense to perform a negated version of a test comparison, e.g., in any case (like if or the beginning of while) where passing the given test leads to executing the next-stated command, and failing the given test triggers some jump to more distant code. Fortunately, mlog comparisons mostly come in negation-pairs, with the exception of AND, which we may need to consider as a special case, and OR, which mlog apparently doesn't implement at all, but we probably want to.

The "two-pass" way of handling this wrinkle would be to first compile all complex expressions, including comparisons, into ones that store their output in a (dummy) variable, and then to use a second optimization pass to detect cases where a dummy variable gets used only as a jump condition, and to consolidate the op that produced it into the jump condition when possible. This approach gains simplicity for the first pass, at cost of needing set up an optimization pass, including handling changes in line numbering. The main "one-pass" alternative is to build optimized instructions in a single pass, which would require sensibly handling positive-jump, negative-jump, and non-jump complex expressions in the various contexts they might occur, which ends up requiring a fairly complex sort of hand-shake between the various contexts that might require one of these three sorts of expressions, and whatever general routine is set up to parse such expressions...

I guess a relevant question is whether we expect to need to set up optimization passes for other reasons? If so, then the two-pass option above wouldn't add much complexity beyond what will be involved in setting up optimization passes anyway. So this would make me lean in favor of the two-pass option. In contrast, if we think we can do everything else without needing an optimization pass, then that'd make me reluctant to set up such machinery just for this, so I'd lean more towards the one-pass option.

(Of course another wrinkle is compiling vectorized operations, but I'll cross that bridge separately.)

Lonami commented 3 years ago

We definitely should go with the two-pass approach. The single-pass approach just doesn't scale, and this will give opportunity to the optimizer to improve code generated previously (emitted mlog instructions might be optimal locally, but the optimization pass will be able to spot more opportunities).

Lonami commented 3 years ago

Split off https://github.com/Lonami/pyndustric/pull/12/files#diff-852dce7acd90bf3bfbde1cec63522f7204b9c209cc52b3e40d23bb6f3202ef7fR72.

This issue also tracks support for the following:

The output of ast.dump for these under Python 3.9.1 is the following:

>>> print(ast.dump(ast.parse('x, y = 1, 2'), indent=4))
Module(
    body=[
        Assign(
            targets=[
                Tuple(
                    elts=[
                        Name(id='x', ctx=Store()),
                        Name(id='y', ctx=Store())],
                    ctx=Store())],
            value=Tuple(
                elts=[
                    Constant(value=1),
                    Constant(value=2)],
                ctx=Load()))],
    type_ignores=[])
>>> print(ast.dump(ast.parse('x = y = 1'), indent=4))
Module(
    body=[
        Assign(
            targets=[
                Name(id='x', ctx=Store()),
                Name(id='y', ctx=Store())],
            value=Constant(value=1))],
    type_ignores=[])
Lonami commented 3 years ago

Note with regards to the following check:

https://github.com/Lonami/pyndustric/blob/4acf5a89067b0516fe83b4f1de814a20716c1bc7/pyndustric/compiler.py#L68-L70

1 = x is actually invalid syntax, but the following is not assigning to a name:

>>> print(ast.dump(ast.parse('x.y = 1'), indent=4))
Module(
    body=[
        Assign(
            targets=[
                Attribute(
                    value=Name(id='x', ctx=Load()),
                    attr='y',
                    ctx=Store())],
            value=Constant(value=1))],
    type_ignores=[])
Lonami commented 3 years ago

Split off https://github.com/Lonami/pyndustric/pull/12/files#diff-852dce7acd90bf3bfbde1cec63522f7204b9c209cc52b3e40d23bb6f3202ef7fR83.

Before working on this, everything that deals with expressions should likely be factored out (comparissions, assignments, loading parameters all allow expressions).

Lonami commented 3 years ago

Split off https://github.com/Lonami/pyndustric/pull/12/files#diff-852dce7acd90bf3bfbde1cec63522f7204b9c209cc52b3e40d23bb6f3202ef7fR134.

The following function:

https://github.com/Lonami/pyndustric/blob/4acf5a89067b0516fe83b4f1de814a20716c1bc7/pyndustric/compiler.py#L453

Currently tries to be the place where "all expressions are handled", but it's not complete enough yet. This function (or a renamed variant) will probably become the generalized expression handler.

TelosTelos commented 3 years ago

Good, that agrees with what I had concluded myself, so good to have confirmation.

I think my next big goal shall be (1A) to generalize this to handle complex expressions (by first appending whatever preliminary instructions are needed to handle the branches, storing intermediate results in dummy variables, and then returning some atom-like string that can be used within a larger mlog expression, either a dummy variable name or the same thing as_value used to return), and also (1B) to set this up to handle vectorized operations, in which case what will be returned is a pair of values (one for the x component, one for y). I'm thinking about whether there's a good way to make an @vectorized decorator for visitor functions that will automatically vectorize operations whenever as_value returns a pair, so we won't have to actively rewrite everything that might end up handling vectors.

Since most vectorized operations have to do with units, and you haven't implemented unit control yet, I'll probably end up also (1C) setting up an object-oriented interface for unit control along with this. That'll likely bleed into (1D) making a similar object-oriented interface for block control, since blocks are the other vector-like entities.

TelosTelos commented 3 years ago

Happy to report that I now have this working nicely in my version (except it probably won't work yet with your implementation of user-defined functions, so will try to hook that up so function returns can participate in complex expressions too).

TelosTelos commented 3 years ago

And now your user-defined function handling is well-integrated with complex expression handing, so user-defined functions can have complex arguments, and their return values can participate in complex expressions. E.g., x = 1*f(2+3)+4 now works fine.

This also fixed the problem (issue #26) where distinct function return values referred to in the same expression would get muddled together. E.g. x = f(1)+f(2) now works fine (though there is still room for optimization, since we now effectively just have the "overkill" solution described in that thread, but we'd get more mileage out of optimizing how you pass args to the function and how you jump around at the end of a function than we'd get from optimizing this, so I guess it's not a high priority).

Lonami commented 3 years ago

Support for complex expressions added by https://github.com/Lonami/pyndustric/commit/70c6d6db15fdc5e307a3e3a649265ad350188c41, the above fixes assigning to tuples.