Operator precedence? - Githubissues

I saw your talk on YT about BS. But what I was missing is how operator precedence is handled in BS!

Consider the famous ternary a ? b : c ? d : e. All sane languages evaluate that as a ? b : (c ? d : e). But not PHP. PHP evaluates it as (a ? b : c) ? d : e.
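To make the difference concrete, here is a small sketch in Python (not BS, which has no defined syntax for this yet). The values and the two helper names are made up for illustration; the helpers just spell out the two groupings with explicit parentheses.

```python
# Hypothetical sample values, chosen so the two groupings disagree.
a, b, c, d, e = True, "b", False, "d", "e"

def right_grouped(a, b, c, d, e):
    # a ? b : (c ? d : e)
    return b if a else (d if c else e)

def left_grouped(a, b, c, d, e):
    # (a ? b : c) ? d : e
    return d if (b if a else c) else e

print(right_grouped(a, b, c, d, e))  # b
print(left_grouped(a, b, c, d, e))   # d
```

Same expression, same inputs, two different answers, depending only on how the grouping is chosen.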

This shows there is a great potential to fail on operator precedence! Please, let BS not fall in this traditional trap again!

Also consider 1 + 2 * 3 which, in Europe, is calculated as 1 + ( 2 * 3 ). But in US calculus (rumours, is this really the case?), this should be calculated as ( 1 + 2 ) * 3.

So if we cannot even agree on this outside programming languages, how shall we solve this riddle? Well, here is my proposal:

As there apparently is no global consensus on which operator precedence is right, it must be considered something that we might get wrong in the language design of BS, too. Hence the best thing to do is to avoid this trap altogether and leave operator precedence out of the language design. Read: Leave Operator Precedence to the programmer.

Can we agree to this or am I completely wrong here?

But how do we define the operator precedence as a programmer? Well, we could add some special notation to BS like 1 + 2 *€ 3, which means the * is the expensive operation (hence the €) and thus should come ~~last~~ first, which renders this into ~~(1 + 2) * 3~~ 1 + ( 2 * 3 ). But this still leaves us puzzled about the outcome of a naked 1 + 2 * 3. And no, we definitely do not want the compiler telling us that we have to use parentheses to resolve the ambiguity. As BS is a modern language, there must be some very clear built-in rules for how this ambiguity is resolved.

So we must have a way to decide that. Again, look at the world to see what is in the wild. Some part of this planet writes left to right and some other part writes right to left (some write from top to bottom, but hey, this adds complexity, like adding 3D or 4D into this, so let's concentrate on lines only here). We do not want to say something like "standard operator precedence goes from right to left", which might confuse people writing programs in the opposite direction. (Disclaimer: I do not say which writing direction is the preferred one. This is about ambiguity in the documentation, so that we do not discriminate based on the direction in which people write.) So we specify it based on the column number the operator is in. For this we think of a line of code as an array in writing direction, whichever this is for the programmer.

As the first index of an array is -1, an operator on the 1st column would have precedence -1, an operator on 2nd column -2 and an operator in the last position (which probably is illegal) would be in position -Columns.

Now, the above idea was to leave precedence to the programmer. How would that work out? How can a later operator have some higher precedence than an operator before?

Well, Math to the rescue. In Mathematics there is the Modulo operator. So we define the precedence of an operator as the outcome of the mathematical operation of

Operator_Column_Number MOD count_of_spaces_seen_so_far

where count_of_spaces_seen_so_far counts all SPC characters in the line preceding the position of the operator (TAB is not considered a space). Preceding means all array values from -1 up to the current evaluation position in the line.
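A rough sketch of this formula in Python, just to see what it produces. Several things here are assumptions and not part of the proposal: the operator set, single-character operators, character-based columns, the -(-A MOD B) reading of MOD for negative numbers, and returning None when no space has been seen yet (MOD by zero is not defined above).

```python
OPERATORS = set("+-*/")  # hypothetical operator set, single characters only

def precedences(line):
    """Return (operator, column, precedence) for every operator in the line."""
    result = []
    spaces = 0
    for i, ch in enumerate(line):
        column = -(i + 1)  # first column is -1, second is -2, ...
        if ch == " ":
            spaces += 1    # only SPC counts, TAB falls through untouched
        elif ch in OPERATORS:
            # Assumption: -(-A MOD B); None marks the undefined MOD-by-zero case.
            prec = -((-column) % spaces) if spaces else None
            result.append((ch, column, prec))
    return result

print(precedences("1 + 2 * 3"))  # [('+', -3, 0), ('*', -7, -1)]
```

With this reading, the + in 1 + 2 * 3 gets precedence 0 and the * gets -1, so the + wins. Removing the spaces around the * (1 + 2*3) moves it to column -6 with two spaces seen, which gives 0 as well, and we have a tie.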

This does not differ for the EVALUATE operation, of course.

To decide which operator takes precedence, we calculate the above formula for each pending operator (based on its position) at the current parsing position, and then evaluate first the operator with the highest precedence (where -1 is higher than -2, of course).
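The selection step could look like the following Python sketch. The function name and the (operator, precedence) pair layout are made up for illustration; "higher" is taken numerically, so -1 beats -2.

```python
def highest_precedence(pending):
    """pending: list of (operator, precedence) pairs.

    Returns every operator that shares the top precedence value,
    in line order. More than one entry means there is a tie.
    """
    top = max(prec for _, prec in pending)
    return [op for op, prec in pending if prec == top]

print(highest_precedence([("+", -2), ("*", -1)]))  # ['*']
print(highest_precedence([("+", -1), ("*", -1)]))  # ['+', '*']
```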

Some important details here:

- I am currently undecided how to calculate a MOD of a negative number. We can calculate it as -(-A MOD B), or we could promote the negative integer to some unsigned integer first and take the MOD of that. Perhaps we should first do some research on which is best. I think that the unsigned variant will prove most practical, but YMMV. Also we should define a maximum line length, such that it fits into the chosen integer range (which is 17 bits, I suppose).
- In case there is a tie, so two or more operators have the same precedence at that position, BS should do some tie-breaker calculation. See below.
- Also I am a bit undecided how to calculate the column. We could calculate this on a token basis, where one token can be made of multiple characters, or on a character basis, where it depends on the current character encoding (remember: the € character takes 3 bytes in UTF-8). This has to be defined here, like "first convert line into UTF-256 and then calculate the start of the operator as the byte offset in the line" or something like that.
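To show that these choices really change the outcome, here is a Python sketch of the two MOD variants and of character-based versus byte-based columns. The function names are made up, and the 17-bit width is the supposed one from above; the unsigned promotion is assumed to mean two's complement wrap-around within that width.

```python
BITS = 17  # supposed integer width from the text above

def mod_negated(a, b):
    # Variant 1: -(-A MOD B), for a negative column index a
    return -((-a) % b)

def mod_unsigned(a, b, bits=BITS):
    # Variant 2 (assumed meaning): wrap a into an unsigned `bits`-wide
    # integer first, then take the MOD of that.
    return (a % (1 << bits)) % b

def char_offset(line, pos):
    # Column base 1: count characters before position pos
    return len(line[:pos])

def byte_offset(line, pos):
    # Column base 2: count UTF-8 bytes before character position pos
    return len(line[:pos].encode("utf-8"))

print(mod_negated(-7, 3), mod_unsigned(-7, 3))  # -1 1

line = "1 + 2 *€ 3"
pos = line.index("3")
print(char_offset(line, pos), byte_offset(line, pos))  # 9 11
```

So column -7 with three spaces seen yields -1 in one variant and 1 in the other, and everything after a € shifts by two columns as soon as we count bytes instead of characters.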

Tie-Breaker calculation:

As we do not want to fall into the same trap as rehashing, which might not solve a tie in the first iteration, we do some calculation which always gives us a winner:

1. We count the operators of same (highest) precedence. This number gives us the value COUNT_OPERATORS. This value is two or higher in that case (as there is some tie).
2. Then we order the operators according to their position in the line. We do this using ASCII sort, so indexes go -1, -10, -11, -2, -3 .. -9. ~~This is not a big problem, as, usually, we will not have 10 or more operators. But always be prepared for special cases.~~ (See Edit2 below)
3. Then we calculate NumberOfLinesProcessedSoFar MOD COUNT_OPERATORS.
4. Call this number reS, it is from 0 up to COUNT_OPERATORS-1.
5. But as we deal with arrays here, we need negative indexes. Hence we use -reS as the index into the operator's array.
6. This leaves one gap, as -0 is not a valid index into arrays. This then addresses the index -COUNT_OPERATORS.

Rationale here is that this favors the 2nd operator over the 1st one. This is usually what we expect, right?
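The tie-breaker steps can be sketched in Python as follows. The function name and arguments are made up for illustration. Note that Python lists start at 0, so the proposal's index -k (with -1 being the first element) is translated to Python position k - 1; this translation is my reading, not something spelled out above.

```python
def break_tie(tied_columns, lines_processed):
    """Pick the winning column among operators of equal precedence.

    tied_columns: negative column indexes of the tied operators
    lines_processed: NumberOfLinesProcessedSoFar
    """
    # ASCII sort of the indexes as text: "-1" < "-10" < "-11" < "-2" ...
    ordered = sorted(str(col) for col in tied_columns)
    count = len(ordered)            # COUNT_OPERATORS, two or higher
    res = lines_processed % count   # reS, from 0 up to count - 1
    index = -res if res != 0 else -count  # -0 is invalid, use -COUNT_OPERATORS
    # BS-style index -k is the k-th element, i.e. Python position k - 1.
    return int(ordered[-index - 1])

print(break_tie([-3, -6], 0))       # -6
print(break_tie([-3, -6], 1))       # -3
print(break_tie([-2, -10, -1], 2))  # -10
```

On line 0 with two tied operators, reS is 0, so the index becomes -2 and the 2nd operator wins, which matches the rationale. On the next line the 1st operator wins instead.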

To sum it up:

To me this sounds like a clever way to leave full control to the programmer in a fully deterministic way, without introducing additional complexity into the language like precedence tagging or superfluous parentheses.

Disclaimer: This is just an idea to lift the burden of defining operator precedences. It also makes it easy to introduce a lot more operators without hassle in the future.

Also consider the fact that, this way, we can possibly drop some traditional redundancy inherited from other languages, namely those parentheses for calculations. You already eliminated the braces for command grouping and did it the Python way of using indentation! Also remember what parentheses did to LISP! Modern languages should overcome all those old design fashions!

Thank you very much, -Tino

Edit: Corrected. For reference I left the old (wrong) part ~~struck out~~. This just shows how easy it is to get things wrong here! Please consider that.

Edit2: There was some bullshit on operator sorting. Operators are sorted by column index, so operators on column 10 or higher are sorted before column 2. This is not a problem as ~~column 1 usually has no ambiguous operator, and~~ most operators start at column 10 anyway, just look at the examples.

BSLang / BS

Operator precedence? #15