arturo-lang / arturo

Simple, expressive & portable programming language for efficient scripting
http://arturo-lang.io
MIT License
695 stars 30 forks source link

Chained conditionals: how they work, how they *should* work and how we can optimize them #1247

Open drkameleon opened 1 year ago

drkameleon commented 1 year ago

This issue is not properly speaking a bug, but it's most definite goal is to deal with - perhaps by design - features which often lead to issues: namely chained conditionals, or more concretely if?/else structures & case/when?/.../else structures.

Also, we've thoroughly discussed the issue with @RickBarretto, but I think it'd be better to have all of this in the open, not only as a means of exchanging ideas publicly, but also so that we can gather all the necessary information/tests/etc in one place. 🚀

drkameleon commented 1 year ago

The case of if?-else

First and foremost, a critical detail: in most programming languages, this would be considered a statement, while in Arturo it's not one statement; we actually have two: the first one being the if? (with its two arguments: the condition and the block), followed by else (with its single argument: a block).

Technically, this is like writing:

if? x = 2 [print "true"] ; first command
else [print "false"].    ; second command

We might tend to see them as one structure, but we have to be very careful not to. They are 2 commands ❗

How was that working?

The if? command - the way its implemented in our library, at least - is supposed to act like a normal if, only instead of just checking the veracity of the given condition and executing the block (or not), it also returns the value of that condition (true/false). And in that sense, it abides by the unwritten rule in Arturo that all predicates (= functions that return a Logical, true/false, value) end with a question mark. So far, so good.

Now, how does this else work? Since it always follows an if? statement (or at least that's what's expected), it actually pops the last Logical value from the stack (the one that has been pushed by the preceding if?) and - based on that - decides whether it should procede and execute its own block, or not.

Its implementation is also quite easy to grasp.

How is it working now? (or at least most of the time)

As much as I kept taking all of the above for granted, to my own suprise, someone decided to go on and implement an AST tree, that stands between our parsed values and the generated bytecode. And not only that: he also decided to optimize this if?-else structure away! 🤣

⚠️ In a few simple words: whatever the if? and else implementation in our library may be (see above ^), it doesn't matter; they are not used!

So, when I tried to see the produced bytecode for a very simple if?-else block, the bytecode I saw had no calls to if? or else at all. It was actually, a very clean, Assembly-style block of logical jumps.

Example input:

ib: function [x]-> inspect to :bytecode x

ib [
    if? x=2 -> print x
    else -> print x*2
]

Example output:

[ :bytecode
        ================================
         DATA
        ================================
        0: x :word

        ================================
         CODE
        ================================
        consti2
        load0
        jmpifne              @4
        load0
        print
        goto                 @4
        consti2
        load0
        mul
        print
        end
]

So, what does this bytecode do?

All of this mess is just a far, far more performant and closer-to-the-metal way of implementing an if/else condition.

(@RickBarretto Understand the logic of the above tiny block of bytecode - fully, totally, 100% - and I can assure you you'll be on your way to... crash all of your CS professors! 😉 )

So, why write all this?

Because apparently, some of these leave hanging values onto the stack. And from what I've written above (and the AST optimization), it most definitely appears the if?-else is not the culprit. (Or, at least, it shouldn't be, if the AST optimizations work at all times, without a single exception - which has to be proven in practice!)

drkameleon commented 1 year ago

As a useful and important sidenote to the above explanation:

Arturo already has a one-statement, if?-else alternative; and that would be switch (or ?), which is very similar - if not identical - to Rebol's either.

Now, why have this too? Do we just need another function simply to do the exact same thing?

It's not that simple.

What it comes down to again is the fact that we would be talking about one single statement (instead of two). And that means that, using switch, we can do things like:

x: 0

y: switch x=0 -> 1 -> 2 ; if x is 0, set y to 1, otherwise set it to 2

or using our ? alias:

x: 0
y: (x=0)? -> 1 -> 2

...which looks good and tidy; plus, if we were to use if?-else for that, I don't think I could come up with a different, safe solution, other than:

y: 0
x: 0
if? x = 0 -> y: 1
else -> y: 2

(Verbose... too verbose! lol)

drkameleon commented 1 year ago

Note no 2:

b: function [x]-> inspect to :bytecode x

identical?: function [z][
    one? unique map @z 'a -> to :bytecode a
]

ib [
    if? x=2 -> print x
    else -> print x*2
]

print identical? [
    [
        if? x=2 -> print x
        else -> print x*2
    ]
    [
        if? x=2 -> print x

        else -> print x*2
    ]
    [
        if? x=2 
            -> print x
        else 
            -> print x*2
    ]
    [
        if? 
        x=2 
        -> print x
        else 
        -> print x*2
    ]
    [
        if? x=2 -> print x

        ; some comment
        ; another comment

        else -> print x*2
    ]
    [
        switch x=2 -> print x
                   -> print x*2
    ]
    [
        (x=2)? -> print x
               -> print x*2
    ]
]

Result:

[ :bytecode
        ================================
         DATA
        ================================
        0: x :word

        ================================
         CODE
        ================================
        consti2
        load0
        jmpifne              @4
        load0
        print
        goto                 @4
        consti2
        load0
        mul
        print
        end
]
true

(Not only do newlines not affect the produced bytecode, but... apparently switch is optimized in the exact same fashion!)

programandala-net commented 10 months ago

Now, how does this else work? Since it always follows an if? statement (or at least that's what's expected), it actually pops the last Logical value from the stack (the one that has been pushed by the preceding if?) and - based on that - decides whether it should procede and execute its own block, or not.

At first I found a bit confusing the fact the documentation doesn't mention the relation between if? and else explicitly, but only implicitly in the code examples. Also, the page about else does not include if? in the list of its related keywords... but if!

But at the end everything became clear. else is very versatile: it can be used by itself, provided you know what's on the stack, like in this Forth-like code I wrote to simulate a BASIC numerical input:

until [
    numberString: input "Enter a number: "
    dup numeric? numberString else [ print "Number expected. Retry." ]
] [ ]
number: floor to :floating numberString
drkameleon commented 10 months ago

At first I found a bit confusing the fact the documentation doesn't mention the relation between if? and else explicitly, but only implicitly in the code examples. Also, the page about else does not include if? in the list of its related keywords... but if!

The truth is the documentation could be lacking in various cases. And that is one of them.

Btw, you're more than welcome to make a PR! 😉

stale[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.