Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype
26 stars 1 forks source link

Order of evaluation in assignment statements #89

Closed danabr closed 2 months ago

danabr commented 4 months ago

Both the Ada 2022 and the Ada 2012 standard say:

For the execution of an assignment_statement, the variable_name and the expression are first evaluated in an arbitrary order.

I am wondering how far reaching that statement is. If both the right hand side and the left hand side of an assignment contains subprogram calls, can those calls be made in arbitrary order, or will all the left hand side be evaluated before/after all the calls of the right hand side?

E.g. if I have Arr(L1(X), L2(X)) := Arr(R1(Y), R2(Y)), can I be sure that the call order will be one of:

L1, L2, R1, R2
L1, L2, R2, R1
L2, L1, R1, R2
L2, L1, R2, R1
R1, R2, L1, L2
R1, R2, L2, L1
R2, R1, L1, L2
R2, R1, L2, L1

Or could the calls on the left hand side and the right hand side be mixed, such as: L1, R2, L2, R2?

As a further example, what about the situation where the left hand side contains a pointer dereference, and the pointer is manipulated as as a side effect of a subprogram call on the right hand side? e.g. Ptr.all(Foo(I)) := Bar(I), where both "Foo" and "Bar" manipulate Ptr. Is it valid to have an execution order of 1. Foo, 2. Bar, 3. dereference (i.e., in a sense you get a mixed order of evaluation between the LHS and the RHS)? Is that what the word "first" in the the sentence quoted above is about?

This is naturally nothing that a sane programmer would write, but I am interesting in analyzing arbitrary Ada code and hence I am interested in understanding the intended semantics also of these kind of obscure constructions.

CKWG commented 4 months ago

AARM 1.1.4(18, 18.a-d) neither has a definite answer to your question.

Richard-Wai commented 4 months ago

The paragraph you reference - 5.1-7, is exactly as it sounds, and reaches as far as assignment statement evaluation. As you have guessed, it does in fact mean that you cannot rely on any particular order of calls to L1, L2, R1, or R2 for the entirety of that assignment statement. This is an intentional design decision. Allow me to explain.

Recall that one of the design principles of Ada is efficiency. Ada is/was often used in real-time and embedded contexts where efficiency of execution is critical. To this day, Ada remains comparably performant to other compiled languages like C, C++, and Rust. Modern compilers can be extremely sophisticated in the assignment of registers, and using secondary side-effects of various instructions to achieve immense efficiency gains across evaluation generally, and subprogram calls specifically. By allowing arbitrary order of evaluation, we can give the compiler a wide birth to do its magic.

While Ada definitely stands-out for its other design focus on program correctness, there is often balance to be struck between those two goals. Generally we try to be careful not to spend too much time trying to prevent so-called "pathological" programmer errors, with a preference instead of common (unintentional) programmer errors. Pathological errors are generally errors that a well-meaning and reasonably competent programmer is highly unlikely to make by mistake. Put another way, most pathological errors are created by people deliberately trying to find a hole in rule, or trying to get around a rule. Sometimes pathological-style errors can appear in actual code written by extremely inexperienced or incompetent programmers, but those programmer's work will stand out in myriad other ways as well. We can only do so much. Professional work needs to be done by professionals at the end of the day.

I think the examples you have given here are good examples of pathological errors. Here we have these curious subprograms that all have nasty global side-effects - that already is a giant red-flag - and the output of these subprograms are being contorted to produce this example. In effect, the given examples are not likely to occur "in real life", and if you have a developer that does something like that anyways, all bets are off.

Specifically I think your example WRT Foo vs Bar is highly pathological. To imagine that a well-meaning, even minimally competent programmer would do something like that in real life, seems to me to be highly improbable. Is it worth the massive performance hit just to make sure we prevent that particular error? There in-lies a much deeper discussion, and is often the kind of discussion the ARG focuses on. It is not a simple matter at the end of the day. Personally I think we made the right decision in this case, but I'm sure valid arguments against it remain.

danabr commented 3 months ago

Thank you very much for your replies. As mentioned, I am writing an analyzer for Ada code, and I have to choose and describe a semantics for implementation specific code like this. I wanted to choose one that does not violate the Standard, and from your answers it seems I am free to pick anyone I want (although picking the same one as the one used by the compiler that will compile the code in the end would be wise of course).

I was particularly interested in knowing whether the "variable_name" and "expression" should be interpreted as atomic entities, so that one needed to be evaluated fully before the other. From your replies, I gather that is not the case.

Thank you for the clarification.

Richard-Wai commented 3 months ago

Thank you very much for your replies. As mentioned, I am writing an analyzer for Ada code, and I have to choose and describe a semantics for implementation specific code like this. I wanted to choose one that does not violate the Standard, and from your answers it seems I am free to pick anyone I want (although picking the same one as the one used by the compiler that will compile the code in the end would be wise of course).

Note that the order that the compiler selects is typically very dependent on the compiler, it's version, and especially the target architecture, as well as configuration such as instruction subsets and optimization level.

I was particularly interested in knowing whether the "variable_name" and "expression" should be interpreted as atomic entities, so that one needed to be evaluated fully before the other. From your replies, I gather that is not the case.

I think there might have been some confusion. Expressions themselves are not necessarily (or even typically) evaluated in an arbitrary order, but parts of them might be, depending on the class of "relation" within an expression. Your examples have a variable_name and an expression. What is arbitrary according to 5.2-7, is whether the variable_name is evaluated before the expression or vice-versa. The evaluation of either of those things (name or expression) has its own set of rules that are governed in completely separate parts of the standard. Expressions are the most complicated in that sense, and include a lot of rules that DO force ordering of evaluation in some cases. Refer to subclauses 4.4 - Expressions, and 4.5 - Operators and Expression Evaluation

Suffice it to say, however, the crux of your examples has to do with the evaluation of names (subclause 4.1), which is ultimately what constitutes both the variable_name and the expression in both your examples above, since both are names. Lets focus on your first example.

In both cases you have the evaluation of an "indexed component" of some two-dimensional array. See subclause 4.1.1 - Indexed Components: An indexed component consists of a prefix (also a name), and then a pair of parenthesis that enclose one or more expressions that are the indexes, one for each dimension. For the dynamic semantics of 4.1.1 we see in paragraph 7:

For the evaluation of an indexed_component, the prefix and the expressions are evaluated in an arbitrary order.

It is that part, 4.1.1-7, that actually determines why the order of calls to L1 vis-a-vis L2 and R1 vis-a-vis R2 are arbitrary.

I note you correctly understood that (for example) "L1, R2, L2, R1" was NOT a possible order of calling. Ergo you are correct that either the variable_name or the expression will be completed evaluated before the other is.

Thank you for the clarification.

Please note, as per the README, that issues should only be closed by ARG management, which is typically the ARG editor.

danabr commented 3 months ago

Please note, as per the README, that issues should only be closed by ARG management, which is typically the ARG editor.

I apologize, I had not read the README properly.

ARG-Editor commented 2 months ago

I believe that we fairly recently had some discussion about whether it could be depended upon that the two sides of an assignment cannot be overlapped, and I think we concluded that each side had to be evaluated in one piece (not interleaved). But I forget what that discussion was in relation to, so I don't know how to easily find it.

In any case, I take it that the questioner finds the question has been adequately answered. So I've tagged this as a confirmation and closed the topic.