BSLang / BS

Implementation of the BS language as created by Mark Rendle at BuildStuff.lt 2014. Refer to this repo for information and the canonical list of language features.
208 stars, 9 forks

Operator precedence? #15

Open hilbix opened 2 years ago

hilbix commented 2 years ago

I saw your talk about BS on YT. But what I missed was how operator precedence is handled in BS!

Consider the famous ternary a ? b : c ? d : e. All sane languages evaluate that as a ? b : (c ? d : e). But not PHP: in PHP it evaluates as (a ? b : c) ? d : e.
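To make the difference concrete, here is a minimal Python sketch of the two possible groupings, written with Python's conditional expression. `right_assoc` and `left_assoc` are hypothetical helper names, not part of BS:

```python
def right_assoc(a, b, c, d, e):
    """a ? b : (c ? d : e), the grouping used by C, Java and JavaScript."""
    return b if a else (d if c else e)


def left_assoc(a, b, c, d, e):
    """(a ? b : c) ? d : e, PHP's historical left-associative grouping."""
    return d if (b if a else c) else e


print(right_assoc(True, "b", False, "d", "e"))  # b
print(left_assoc(True, "b", False, "d", "e"))   # d
```

With a true and b truthy, the right-associative reading stops at b, while the left-associative one feeds b back in as a condition and ends up at d.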

This shows how much potential there is to get operator precedence wrong! Please don't let BS fall into this traditional trap again!

Also consider 1 + 2 * 3, which in Europe is calculated as 1 + ( 2 * 3 ). But in US calculus (rumours; is this really the case?) it should be calculated as ( 1 + 2 ) * 3.
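The disagreement boils down to a single precedence table. As a minimal sketch (Python, with a hypothetical `evaluate` helper doing precedence climbing over a flat token list, all operators left-associative), the same tokens give 7 or 9 depending only on which operator is told to bind tighter:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(tokens, precedence):
    """Evaluate a flat list such as [1, "+", 2, "*", 3] by precedence climbing.

    `precedence` maps each operator symbol to a non-negative binding strength
    (higher binds tighter); every operator is treated as left-associative.
    """
    pos = 0

    def parse(min_prec):
        nonlocal pos
        lhs = tokens[pos]  # operand
        pos += 1
        while pos < len(tokens) and precedence[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Operators that bind strictly tighter than `op` are folded into the right operand.
            rhs = parse(precedence[op] + 1)
            lhs = OPS[op](lhs, rhs)
        return lhs

    return parse(0)


print(evaluate([1, "+", 2, "*", 3], {"+": 1, "*": 2}))  # 7
print(evaluate([1, "+", 2, "*", 3], {"+": 2, "*": 1}))  # 9
```

The first call uses the conventional table; the second uses the rumoured alternative, in which + binds tighter than *.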

So if we cannot even agree on this outside programming languages, how shall we solve this riddle? Well, here is my proposal:

As there apparently is no global consensus on which operator precedence is right, it must be considered something we might get wrong in the language design of BS, too. Hence the best thing to do is to avoid this trap altogether and leave operator precedence out of the language design. Read: leave operator precedence to the programmer.

Can we agree to this or am I completely wrong here?

But how do we, as programmers, define the operator precedence? Well, we could add some special notation to BS, like 1 + 2 *€ 3, which means that * is the expensive operation (hence the €) and thus should come ~~last~~ first, which renders this into ~~(1 + 2) * 3~~ 1 + ( 2 * 3 ). But this still leaves us puzzled about the outcome of a naked 1 + 2 * 3. And no, we definitely do not want the compiler to tell us that we have to use parentheses to resolve the ambiguity. As BS is a modern language, it must have very clear built-in rules for resolving this ambiguity.
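A minimal Python sketch of the € marker, assuming a flat token list in which a marked operator is written as `"*€"`. How the naked case should behave is exactly what is still open at this point, so the hypothetical `evaluate_euro` simply reduces unmarked operators left to right as a placeholder:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate_euro(tokens):
    """Reduce a flat list such as [1, "+", 2, "*€", 3].

    Hypothetical rule: operators suffixed with "€" are reduced first (leftmost
    first); the remaining naked operators are then reduced strictly left to
    right, which is only a placeholder for the undecided case.
    """
    tokens = list(tokens)
    while len(tokens) > 1:
        op_positions = range(1, len(tokens), 2)
        marked = [i for i in op_positions if tokens[i].endswith("€")]
        i = marked[0] if marked else 1
        symbol = tokens[i].rstrip("€")
        # Replace "left op right" with its result.
        tokens[i - 1:i + 2] = [OPS[symbol](tokens[i - 1], tokens[i + 1])]
    return tokens[0]


print(evaluate_euro([1, "+", 2, "*€", 3]))  # 7, i.e. 1 + ( 2 * 3 )
print(evaluate_euro([1, "+", 2, "*", 3]))   # 9 under the left-to-right placeholder
```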

So we must have a way to decide. Again, let's look at what is out there in the wild. Some parts of this planet write left to right and others write right to left (some write top to bottom, but hey, that adds complexity, like adding 3D or 4D to this, so let's concentrate on lines only here). We do not want to say something like "standard operator precedence goes from right to left", which might confuse people who write in the opposite direction (disclaimer: I am not saying which writing direction is preferable; this is about ambiguity in the documentation, so that we do not discriminate based on the direction people write in). Instead, we specify precedence based on the column number the operator is in. For this, we think of a line of code as an array in writing direction, whichever direction that is for the programmer.

As the first index of an array is -1, an operator in the 1st column would have precedence -1, an operator in the 2nd column -2, and an operator in the last position (which is probably illegal) -Columns.

Now, the idea above was to leave precedence to the programmer. How would that work out? How can a later operator have a higher precedence than an earlier one?

Well, math to the rescue. Mathematics has the modulo operator, so we define the precedence of an operator as the result of the mathematical operation

Operator_Column_Number MOD count_of_spaces_seen_so_far

where count_of_spaces_seen_so_far is the number of SPC characters in the line preceding the position of the operator (TAB is not considered a space). Preceding means all array values from -1 up to the current evaluation position in the line.

This does not differ for the EVALUATE operation, of course.
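A minimal Python sketch of the column rule, under a few loud assumptions: the operator set `+-*/` is hypothetical, MOD is taken as a truncated remainder (like C's `%`) so results keep the sign of the negative column index, and an operator with no preceding spaces (MOD 0) is rejected because the rule leaves that case undefined. `operator_precedences` is an illustrative name:

```python
OPERATORS = "+-*/"  # hypothetical operator set; BS's real operators may differ


def operator_precedences(line):
    """Return (column_index, operator, precedence) for each operator in `line`.

    Column indexes run -1, -2, -3, ... in writing direction. Precedence is
    column_index MOD spaces_so_far, taken as a truncated remainder so the
    result keeps the sign of the column index. Only SPC counts as a space.
    """
    result = []
    spaces = 0
    for position, ch in enumerate(line):
        column_index = -(position + 1)
        if ch == " ":
            spaces += 1
        elif ch in OPERATORS:
            if spaces == 0:
                # Assumption: MOD 0 is undefined, so refuse rather than guess.
                raise ZeroDivisionError(f"{ch!r} at column {column_index} has no preceding spaces")
            precedence = -(abs(column_index) % spaces)
            result.append((column_index, ch, precedence))
    return result


print(operator_precedences("1 + 2 * 3"))  # [(-3, '+', 0), (-7, '*', -1)]
```

Under these assumptions, + gets precedence 0 and * gets -1 in a plain 1 + 2 * 3, so it would group as ( 1 + 2 ) * 3.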

To decide which operator takes precedence, we calculate the above formula for each pending operator (based on its position) at the current parsing position, and then evaluate first the operator with the higher precedence (where -1 is higher than -2, of course).
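The evaluation step can be sketched in Python as follows, again only as an illustration under stated assumptions: integers and `+ - * /` only, each operator's precedence computed once from its original position (a simplification of "at the current parsing position"), and ties going to the leftmost operator as a placeholder, since the tie-breaker calculation itself is not reproduced here. `tokenize` and `evaluate` are hypothetical names:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize(line):
    """Split `line` into ints and (symbol, precedence) pairs.

    Precedence uses the column rule: column index (-1, -2, ...) MOD the number
    of SPC characters before the operator, as a truncated remainder. Other
    characters (including TAB) are skipped.
    """
    tokens, spaces, digits = [], 0, ""
    for position, ch in enumerate(line):
        if ch in "0123456789":
            digits += ch
            continue
        if digits:
            tokens.append(int(digits))
            digits = ""
        if ch == " ":
            spaces += 1
        elif ch in OPS:
            if spaces == 0:
                raise ZeroDivisionError(f"no spaces before {ch!r}: precedence undefined")
            tokens.append((ch, -((position + 1) % spaces)))
    if digits:
        tokens.append(int(digits))
    return tokens


def evaluate(line):
    """Repeatedly apply the pending operator with the highest precedence (0 > -1 > -2).

    max() returns the first maximum, so ties go to the leftmost operator.
    """
    tokens = tokenize(line)
    while len(tokens) > 1:
        i = max(range(1, len(tokens), 2), key=lambda k: tokens[k][1])
        symbol, _ = tokens[i]
        tokens[i - 1:i + 2] = [OPS[symbol](tokens[i - 1], tokens[i + 1])]
    return tokens[0]


print(evaluate("1 + 2 * 3"))   # 9, i.e. ( 1 + 2 ) * 3
print(evaluate("1   + 2* 3"))  # 7, i.e. 1 + ( 2 * 3 )
```

The second call shows the point of the proposal: by moving spaces around, the programmer alone decides that * goes first.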

Some important details here:

Tie-Breaker calculation:

As we do not want to fall into the same trap as rehashing, which might not resolve a tie in the first iteration, we use a calculation that always gives us a winner:

To sum it up:

To me this sounds like a clever way to give the programmer full control in a fully deterministic way, without introducing additional complexity into the language such as precedence tagging or superfluous parentheses.

Disclaimer: This is just an idea to remove the burden of defining operator precedences, and it makes it easy to introduce a lot more operators in the future without hassle.

Also consider the fact that this way we can possibly drop some traditional redundancy inherited from other languages, namely the parentheses used in calculations. You already eliminated braces for command grouping and did it the Python way, using indentation! Also remember what parentheses did to LISP! Modern languages should overcome all those old design fashions!

Thank you very much, -Tino

Edit: Corrected. For reference, I left the old (wrong) parts struck out. This just shows how easy it is to get things wrong here! Please keep that in mind.

Edit2: There was some bullshit about operator sorting. It is sorted by column index, so operators in column 10 or higher are sorted before those in column 2. This is not a problem, as column 1 usually holds no ambiguous operator, and most operators start at column 10 anyway; just look at the examples.
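The ordering in Edit2 (column 10 before column 2) is what you get when column numbers are compared as text rather than as numbers. A tiny Python sketch, assuming exactly that; `sort_columns_as_text` is a hypothetical name:

```python
def sort_columns_as_text(columns):
    """Sort column numbers by their decimal text, which puts 10 before 2."""
    return sorted(columns, key=str)


print(sort_columns_as_text([2, 10, 1, 25]))  # [1, 10, 2, 25]
```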

Argavyon commented 11 months ago

Wouldn't it be much easier to assign precedence based on operand value? I.e., 1 * 2 + 3 should be equivalent to 1 * (2 + 3), but 1 + 2 * 3 should be grouped as 1 + (2 * 3). Rationale: larger values are more important and should therefore be given priority of operation. Numeric values ought to be assigned preference based on their absolute value, strings based on their lexicographical ordering (because A goes before Z, duh), and non-comparable types should throw a Doesn'tMath error when attempting to assign operation precedence. If two operand values are equal (and precedence therefore cannot be categorically determined), a DOESN'TCATEGORICALLYMATH warning should be raised instead, with operator precedence determined at random.
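A minimal Python sketch of the operand-value rule, under these assumptions: an operator's weight is the larger absolute value of its two operands, only numbers are handled (the string rule is left out, so anything else counts as non-comparable), and a tie between operators triggers the warning plus a random pick. `DoesntMath` is a hypothetical stand-in for Doesn'tMath, since an apostrophe is not allowed in a Python identifier:

```python
import operator
import random
import warnings

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


class DoesntMath(TypeError):
    """Hypothetical stand-in for the proposed Doesn'tMath error."""


def evaluate_by_value(tokens):
    """Reduce a flat list such as [1, "*", 2, "+", 3], applying first the
    operator whose larger operand (by absolute value) is biggest."""
    tokens = list(tokens)
    while len(tokens) > 1:
        op_indices = list(range(1, len(tokens), 2))
        weights = {}
        for i in op_indices:
            left, right = tokens[i - 1], tokens[i + 1]
            if not all(isinstance(v, (int, float)) for v in (left, right)):
                raise DoesntMath(f"cannot rank operands {left!r} and {right!r}")
            weights[i] = max(abs(left), abs(right))
        best = max(weights.values())
        candidates = [i for i in op_indices if weights[i] == best]
        if len(candidates) > 1:
            warnings.warn("DOESN'TCATEGORICALLYMATH: tie, picking an operator at random")
        i = random.choice(candidates)
        tokens[i - 1:i + 2] = [OPS[tokens[i]](tokens[i - 1], tokens[i + 1])]
    return tokens[0]


print(evaluate_by_value([1, "*", 2, "+", 3]))  # 5, i.e. 1 * (2 + 3)
print(evaluate_by_value([1, "+", 2, "*", 3]))  # 7, i.e. 1 + (2 * 3)
```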