MeteoSwiss-APN / dawn

Compiler toolchain to enable generation of high-level DSLs for geophysical fluid dynamics models
MIT License

Remove compound assignments and increment/decrement ops from IIR #721

Open Stagno opened 4 years ago

Stagno commented 4 years ago

They are syntactic sugar, only useful at DSL level. Currently they are treated as special cases in IIR. GTClang/gt4py can do the translation.

Example:

i *= q;    => i = i * q;
++i;       => i = i + 1;
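
To make the proposed translation concrete, here is a minimal sketch of such a lowering on a toy expression tree. It does not use Dawn's actual SIR/IIR classes: `Expr`, `make`, `lower` and `print` are hypothetical names introduced only for illustration, and the rewrite assumes the assignment target is a plain variable or field access without side effects.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy AST node; hypothetical, NOT Dawn's SIR/IIR node types.
struct Expr {
  enum Kind { Var, Lit, Binary, Assign, CompoundAssign, PreIncrement } kind;
  std::string text;  // variable name, literal, or operator ("*" for "*=")
  std::shared_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr make(Expr::Kind k, std::string text, ExprPtr lhs = nullptr, ExprPtr rhs = nullptr) {
  return std::make_shared<Expr>(Expr{k, std::move(text), std::move(lhs), std::move(rhs)});
}

// Rewrites sugar into plain assignments:
//   x op= e  =>  x = x op e     and     ++x  =>  x = x + 1
// Assumes the target x has no side effects.
ExprPtr lower(const ExprPtr& e) {
  if (!e) return e;
  switch (e->kind) {
    case Expr::CompoundAssign:
      return make(Expr::Assign, "=", e->lhs,
                  make(Expr::Binary, e->text, e->lhs, lower(e->rhs)));
    case Expr::PreIncrement:
      return make(Expr::Assign, "=", e->lhs,
                  make(Expr::Binary, "+", e->lhs, make(Expr::Lit, "1")));
    case Expr::Binary:
    case Expr::Assign:
      return make(e->kind, e->text, lower(e->lhs), lower(e->rhs));
    default:
      return e;  // variables and literals are left untouched
  }
}

// Prints binary expressions fully parenthesized so the grouping stays visible.
std::string print(const ExprPtr& e) {
  switch (e->kind) {
    case Expr::Var:
    case Expr::Lit: return e->text;
    case Expr::Binary: return "(" + print(e->lhs) + " " + e->text + " " + print(e->rhs) + ")";
    case Expr::Assign: return print(e->lhs) + " = " + print(e->rhs);
    case Expr::CompoundAssign: return print(e->lhs) + " " + e->text + "= " + print(e->rhs);
    case Expr::PreIncrement: return "++" + print(e->lhs);
  }
  return "";
}

// Usage: lowers the two statements from the example above.
void demo_lowering() {
  ExprPtr i = make(Expr::Var, "i");
  ExprPtr q = make(Expr::Var, "q");
  assert(print(lower(make(Expr::CompoundAssign, "*", i, q))) == "i = (i * q)");
  assert(print(lower(make(Expr::PreIncrement, "", i))) == "i = (i + 1)");
}
```

The original right-hand side is kept as its own subtree of the new binary node, so its grouping is preserved by construction.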

This would allow PassRemoveScalars to run also for Cartesian (see https://github.com/MeteoSwiss-APN/dawn/pull/698/files ).

twicki commented 4 years ago

Well, these operations are common among languages. Restricting the SIR seems to restrict SIR users (i.e., front-end developers). If it causes problems internally, I would strongly advise removing it only in the IIR and onward.

Stagno commented 4 years ago

I think the purpose of an intermediate representation is to be convenient for semantic analysis; having two ways to represent the same operation is not good. That means that syntactic sugar (which is certainly handy for a developer who writes in the DSL) should go away in the DSL->HIR lowering (otherwise why keep an HIR at all?). I don't think that front-end developers are restricted: it's just a matter of applying an extremely simple transformation to get to a common representation.

havogt commented 4 years ago

"these operations are common among languages"

This is not a strong argument in my opinion. We should consider whether there is a semantic difference between a += b and a = a + b at the SIR level. If yes, we should keep it; if no, we should remove it.

In C++:

"The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once."

Is "evaluated only once" relevant for us?
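
As a small illustration of what "evaluated only once" means in C++, the sketch below counts how often the left-hand expression is evaluated. `Counted` and `cell()` are hypothetical names made up for this example; they stand in for any lvalue expression whose evaluation is observable.

```cpp
#include <cassert>

// Hypothetical stand-in for an lvalue expression with an observable side effect.
struct Counted {
  int evaluations = 0;
  int value = 0;
  // Returns the storage location and records that the expression was evaluated.
  int& cell() {
    ++evaluations;
    return value;
  }
};

void demo_evaluated_once() {
  Counted compound;
  compound.cell() += 1;  // E1 is evaluated once
  assert(compound.evaluations == 1);

  Counted expanded;
  expanded.cell() = expanded.cell() + 1;  // E1 is evaluated twice
  assert(expanded.evaluations == 2);

  // The stored value is the same here, because cell() always names the same location.
  assert(compound.value == 1 && expanded.value == 1);
}
```

The two forms only differ observably when evaluating the left-hand side has a side effect or a cost; for a plain variable or field access they store the same value.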

In Python:

"An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead."

This adds the point of reusing the existing object instead of creating a new one (which is implicit in C++).

Personally, I would go for removing them as a) I don't see that we will exploit the semantic difference of the two ways of expressing assignments; b) it simplifies our internals. If we need the semantic difference later, adding them back seems easier, as our current implementation doesn't take the difference into account and will most likely be wrong anyway.

BenWeber42 commented 4 years ago

Some aspects we need to consider (tried to order them by importance):

  1. Should SIR not support compound assignments and increment/decrement ops at all? Or should it support them, but remove them at some point as a pass?

For front-ends that want to expose a language that has these features, it could make a big difference. On the other hand, it might not necessarily be so easy to remove them in our compiler.

  2. Does it make a difference in hardware/assembly because a) the floating point operation isn't equivalent anymore? or b) because it affects performance (e.g., a low level compiler or the hardware could optimize one case further)?

Floating point operations often do not behave like their mathematical counterparts. E.g., addition in floating point isn't associative: if I add all elements of a vector, I could get different results depending on the order of iteration. Could this also happen when converting += to = ... +?
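
One case where the conversion can change a result is sketched below, assuming IEEE 754 double arithmetic and no fast-math-style flags. With a single operand, `a += b` and `a = a + b` perform the same one addition. When the right-hand side is itself a sum, however, `a += b + c` groups as `a + (b + c)`, while a purely textual expansion to `a = a + b + c` groups as `(a + b) + c`. The function names are made up for this illustration.

```cpp
#include <cassert>

// a += b + c groups as a + (b + c).
double compound(double a, double b, double c) {
  a += b + c;
  return a;
}

// Naive textual expansion loses the grouping: (a + b) + c.
double naive_expansion(double a, double b, double c) {
  a = a + b + c;
  return a;
}

// Expansion that keeps the original right-hand side as one operand.
double grouped_expansion(double a, double b, double c) {
  a = a + (b + c);
  return a;
}

void demo_grouping() {
  const double a = 1e20, b = -1e20, c = 1.0;
  assert(compound(a, b, c) == 0.0);         // c is absorbed into b before a is added
  assert(naive_expansion(a, b, c) == 1.0);  // a and b cancel first, then c is added
  assert(grouped_expansion(a, b, c) == compound(a, b, c));
}
```

A lowering that works on the expression tree rather than on text keeps the right-hand side as a single operand, so it would not introduce this difference.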

So reordering floating point operations can technically change the semantics of the program. This can prevent possibilities for optimization. Flags like gcc's -ffast-math allow the compiler to reorder operations even when that could change the result ( https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do ). This raises the bigger question: what's our approach here? Do we assume fast-math by default or do we want to address this issue in some other way?

  3. Does it make a difference for storage implementations?

These statements:

field int a;
a = a + 1;
a += 1;

could be translated to C++ code like:

my_storage_implementation.fetch_field(a, i, j, k) =
  my_storage_implementation.fetch_field(a, i, j, k) + 1;
my_storage_implementation.fetch_field(a, i, j, k) += 1;

Could this have a performance impact? In C++ it might be possible that the fetch_field method is implemented once returning a reference and once a const reference. Would the compiler be able to realize that it's enough to only get the reference once, read and then assign to it? Could we do this in our compiler if necessary?

What about generated C++ code like:

my_storage_implementation.fetch_field(a, i, j, k) =
  my_storage_implementation.fetch_field(a, i, j, k) +
  __some_other_generated_function(...);

If the compiler can't prove that __some_other_generated_function doesn't affect the result of the fetch_field function, then it has to execute fetch_field twice?
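
One way the code generator could sidestep this question is to bind the reference once and reuse it, rather than relying on the C++ compiler to merge the two calls. The sketch below is only an illustration: `Storage`, the `fetch_field` signature and the `fetches` counter are hypothetical and do not reflect any real storage implementation. It is only valid if evaluating the right-hand side cannot invalidate the reference or change which element it should name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical storage; the fetch_field signature is illustrative only.
struct Storage {
  int nx, ny, nz;
  std::vector<double> data;
  int fetches = 0;  // counts address computations so the calls are observable

  Storage(int nx_, int ny_, int nz_)
      : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, 0.0) {}

  double& fetch_field(int i, int j, int k) {
    ++fetches;
    return data[(k * ny + j) * nx + i];
  }
};

// Direct translation of `a = a + rhs`: the address is computed twice.
void assign_expanded(Storage& a, int i, int j, int k, double rhs) {
  a.fetch_field(i, j, k) = a.fetch_field(i, j, k) + rhs;
}

// What a code generator could emit instead: compute the address once.
void assign_bound_once(Storage& a, int i, int j, int k, double rhs) {
  double& ref = a.fetch_field(i, j, k);
  ref = ref + rhs;
}

void demo_fetch_once() {
  Storage x(4, 4, 4), y(4, 4, 4);
  assign_expanded(x, 1, 2, 3, 1.0);
  assign_bound_once(y, 1, 2, 3, 1.0);
  assert(x.fetches == 2 && y.fetches == 1);
  assert(x.data == y.data);  // both variants store the same values
}
```

Here the counter forces both calls to happen in the expanded form. Whether a real optimizer merges them depends on how much of fetch_field and the right-hand side it can see.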

  4. If the lhs can be a very expensive operation, then it would be very important that our compiler is able to convert back to +=. Can lhs expressions be expensive?

Probably unrealistic example:

function some_boolean(..) { ... return some_boolean_value; } // expensive function
(some_boolean(...) ? field_A : field_B) += 1;

  5. If lhs expressions can have side effects, then <expr> += 1 won't be equivalent to <expr> = <expr> + 1 in general. Do we want to force lhs expressions or expressions in general to be side-effect free?

This is probably a separate discussion, but could heavily affect this discussion (I don't know if there is an issue/document open for this?). In the workshop, we agreed that we shouldn't allow assignments as expressions. But if functions can have side effects and be used as expressions, then expressions can still have side effects.

Stagno commented 4 years ago

Decision taken during workshop:

For now remove when lowering, then reassess once we have more information (we find / don't find use cases to distinguish the semantics).