eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0

Properly define volatility in ceylon #3379

Open CeylonMigrationBot opened 12 years ago

CeylonMigrationBot commented 12 years ago

[@simonthum] This is a proposal to specify volatility of runtime state such that various current issues can be improved.

Goal

Ceylon aims to support immutable objects alongside mutable ones. I think the full benefit of that can only be had with a proper volatility concept, which is currently only sketched in Ceylon. Puzzled by #3347, I concluded that in a multi-core world, state isn't just variable or not. So here's my take on that:

The idea is to know at compile-time how volatile any given expression is. For all evaluatable program elements, there must be reasonable volatility assumptions and/or checks in the compiler which allow the compiler and human reader to make solid assumptions about program behavior.

This creates headroom for meta-programming (as the spec already recognizes) and for safe optimizations such as common subexpression elimination (CSE), and it generally eases reasoning about Ceylon programs.

Current situation

I have compiled the current situation as I perceive it:

volatility | value defined in/at                | current ceylon
-----------+------------------------------------+---------------------------
(constant) | source code/compilation time       | literals (not specified)
immutable  | initialization of defining element | non-variable state 
local      | block scope                        | locals
volatile   | runtime                            | all variable state 

The problem is that the volatility I call "local" is only recognized for locals, not even for immutable attributes of locals. For method-level optimization, though, this volatility class is quite interesting. (It might be that I didn't understand all of 4.6.1 and the class must be made even smaller for the argument to hold.) The other issue is that without an explicit specification of volatility, case-by-case decisions are made in the language where an overall framework would be better. I view #3347 as such an example. Also, the class which has to be assumed volatile is larger than it needs to be. Lastly, the compiler has nothing but literals to evaluate at compile time, a limitation that e.g. C++ got rid of.

Proposal

Volatility could be defined like:

volatility | annotation     | determinism   | side effects | comment
-----------+----------------+---------------+--------------+-------------------------------------------
constant   | const          | compile-time  | not allowed  | safe use of attributes for metaprogramming
immutable  | immutable      | whole runtime | not allowed  | new default for attributes
local      | (thread-)local | method-level  | not allowed  | thread-local attributes & locals
volatile   | variable       | none          | allowed      | non-local state & methods

Every expression, then, could be considered to have the highest volatility of any of its sub-expressions. This can be made possible by limiting the freedom of attribute getters to evaluate nothing more volatile than declared (using annotations) for the respective attribute.

For example, an expression evaluating an immutable attribute of a thread-local variable would be considered to exhibit thread-local volatility, thus its value might be reused as long as no intervening access can have happened.
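The "highest volatility of any sub-expression" rule can be sketched as a small model. This is only an illustration in Python, not part of the proposal or of any Ceylon compiler: the names `Volatility` and `expression_volatility` are made up here, and the ordering of the four classes is taken from the proposal table above.

```python
from enum import IntEnum


class Volatility(IntEnum):
    """Volatility classes from the proposal table, least to most volatile."""
    CONSTANT = 0   # fixed at compile time
    IMMUTABLE = 1  # fixed once the defining element is initialized
    LOCAL = 2      # stable at method level / within one thread
    VOLATILE = 3   # may change at any time


def expression_volatility(*sub_expressions: Volatility) -> Volatility:
    """An expression is as volatile as its most volatile sub-expression."""
    # Assumption: an expression without sub-expressions (a literal) is constant.
    return max(sub_expressions, default=Volatility.CONSTANT)


# An immutable attribute read through a thread-local variable:
access = expression_volatility(Volatility.LOCAL, Volatility.IMMUTABLE)
print(access.name)  # LOCAL
```

Modelling the classes as a total order makes the rule a plain `max`; if more classes turn out to be needed, that assumption may no longer hold.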

Some random points:

In summary, this would clarify some TODOs in the spec and #3347. I'm not entirely sure there aren't more classes to discern, e.g. the special rule for local variables suggests the model isn't perfect yet. Also, the spec should clarify which locals are thread-local (i.e. live on a stack or are declared thread-local).

Potential developments:

Anyway, I think this is something to get right before 1.0 because retro-fitting a new default volatility for attributes is likely a pain that can be avoided. I'm willing to flesh out the spec and assist implementation if it's considered worthwhile.

As always, comments are welcome!

[Migrated from ceylon/ceylon-spec#273]

CeylonMigrationBot commented 12 years ago

[@ikasiuk] This reminds me of discussions about immutable and const in the D programming language (definitely worth looking at in this context). A consistently defined volatility scheme has a certain beauty (and surely its advantages), but it also has a big impact on a language and on the programs you write in that language: once you start defining volatility somewhere, it tends to cascade through the whole program structure. And that makes me doubt whether it is possible to introduce this into the Ceylon language without adding more complexity than we are ready to accept.

One particularly interesting point is the influence on Ceylon's generalized concept of attributes. It seems to me that we would have to differentiate again between simple fields that cannot be defined as getters, and true attributes that correspond to getters/setters. But not having this distinction is one of the really nice concepts of Ceylon IMO.

CeylonMigrationBot commented 12 years ago

[@simonthum] Thanks for getting through my brain-dump ;)

I agree it tends to permeate the programs you write, but since there is already variable or not (which gets you 90% of the possible benefit), it may come at quite low cost. It's mostly a question of what you want to achieve: Do you stop with the sole possibility of referential transparency and some concurrency sugar, or do you want to declare "immutable" variables which inhibit mutating operations etc? I'd go for the former.

I already went through the spec and can say that sure, it adds complexity, but it also clarifies things which otherwise come with their own complexity. Think of if (exists ...) and how simple this could be with known-immutable state. So how much complexity it adds will likely be a function of how elegantly it can be implemented. Right now, that is before looking into anything seriously, I think the cost is bearable.

I, too, am unsure about the attributes - the simple attributes seem overloaded and may need clarification. But as mentioned, volatility is only defined per getter, so that probably will not be an issue. All non-immutable instance state will land in the volatile bin, and getters that evaluate it will need to be volatile too.

Though my main argument would be that "multi-core is here to stay" and languages should not ignore concurrency.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] Maybe you are right. But I find it hard to imagine what a simple yet useful approach could look like. Can you illustrate what you have in mind with some example code?

CeylonMigrationBot commented 12 years ago

[@simonthum] Sure, an example is in order. Actually after posting I asked myself why I did not provide one.

interface A {
  shared Integer? i; // non-variable, thus implicitly immutable
  shared variable Integer? j; // variable, assumed volatile
}

class B() satisfies A {
  ...
  shared Integer? i {
    // no side effects allowed here!
    return j; // error: more volatile expression
    return /*immutable or constant expression*/
  }
  shared variable Integer? j {
    return i; // fine, as anything else you could do here
  }

  void m() {
    if (exists i) {
      // fine, i is known immutable
    }
    if (exists j) {
      // OK, but re-evaluation of j will fire up errors
    }
    if (exists ej = j) {
      // all j will be reduced to ej,
      // or be errors
    }
    concurrent if (exists j) {
      // OK, and re-evaluation of j will actually take place
      // (at the discretion of the programmer)
    }
  }
}

As shown, the intent is to have volatility known at the member level and, by implication, at the expression level. I would not want to see this at the type level (as in C++), as I am not convinced that the compiler is the right tool to establish higher-level concurrency/immutability concepts, such as deeply immutable object graphs or proper locking.

On top of member and known expression volatility, some useful optimizations and state safety features can be implemented. Also, bug checkers could be much more reliable, and future versions of Ceylon will have much less pain when taking a stab at the issue.

Some things are odd, though:

interface E satisfies A {
  shared immutable variable Integer? i; // allowed (equally volatile), but can never be sensibly implemented
  ...
}

But I'm confident the corner cases can be worked out. I have started to look at the spec, so I'll probably be able to provide more detail quite fast.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] A couple of questions:

CeylonMigrationBot commented 12 years ago

[@simonthum]

> What does "no side effects allowed" mean? Is the only allowed statement a return?

Essentially a license to cache. More precisely, this has a caller side and a callee side, and I'd say the caller side is what counts. Still, the callee should be checked for conformance (e.g. no returned expression may be more volatile than declared, no assignments, ...), but that check could be made overridable to allow for lazy initialization, freezable state or other well-behaved code.

In summary, "no side effects" means "callee must not assume any side effects to become externally visible", and "caller may assume no side effects to happen".
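The callee-side conformance check described above could look roughly like the following. This is a hedged Python model, not code from the prototype: `check_getter` and its input format (a mapping from returned expression to its volatility) are assumptions made for illustration only.

```python
from enum import IntEnum
from typing import List, Mapping


class Volatility(IntEnum):
    CONSTANT = 0
    IMMUTABLE = 1
    LOCAL = 2
    VOLATILE = 3


def check_getter(declared: Volatility, returned: Mapping[str, Volatility]) -> List[str]:
    """Report every returned expression that is more volatile than declared."""
    return [
        f"error: '{expr}' is {vol.name.lower()}, but the getter is declared {declared.name.lower()}"
        for expr, vol in returned.items()
        if vol > declared
    ]


# A getter for an immutable attribute that tries to return a variable attribute j.
errors = check_getter(
    Volatility.IMMUTABLE,
    {"j": Volatility.VOLATILE, "42": Volatility.CONSTANT},
)
print(errors)
```

Only the return of `j` is reported. A getter declared volatile passes with any returned expression, which matches the idea that such a getter may do anything.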

> What exactly do you mean by "re-evaluation of j will fire up errors" and "re-evaluation of j will actually take place (at the discretion of the programmer)"?

By "re-evaluation of j will fire up errors" I mean that if j is not actually in the if body, it may be acceptable that it is volatile. If you invoke some method that needs j but does not get it as a parameter, there isn't much you can do anyway. But as I think about it, perhaps this is worth a warning or error that can be silenced using the concurrent annotation.

By "re-evaluation of j will actually take place (at the discretion of the programmer)" I mean the concurrent annotation will tell ceylon not to shadow, and not to check for volatility. Instead it trusts the programmer to know his stuff. The concurrent annotation serves as a reminder that the condition isn't as reliable as one might want it to be. (For risk-savvy programmers, this annotation could even have an optional doc string)
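To make the difference between re-evaluating a volatile attribute and binding it once more concrete, here is a deterministic Python simulation. It is only a sketch: `Holder` is a hypothetical class whose attribute changes between reads, standing in for a write by another thread.

```python
class Holder:
    """Hypothetical stand-in for an object whose attribute j is volatile."""

    def __init__(self, values):
        self._values = iter(values)

    @property
    def j(self):
        # Each read yields the next value, simulating writes by another thread.
        return next(self._values)


# Re-evaluation: the check and the later use see different values.
h = Holder([5, None])
reread = None
if h.j is not None:
    reread = h.j  # second read; the earlier check says nothing about it
print(reread)  # None

# Bind once, like `if (exists ej = j)`: the checked value is the used value.
h = Holder([5, None])
ej = h.j
if ej is not None:
    print(ej + 1)  # 6
```

The first variant corresponds to what the proposed concurrent annotation would permit at the programmer's own risk; the second corresponds to the shadowing behavior.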

> How do you have a non-variable attribute if you want the old, currently implemented behavior (no setter, but arbitrary getters allowed)?

interface A {
  shared volatile Integer? j; // not variable, but may vary at any time
}

Here, an implementing getter may do anything and is expected to do so. Subclasses may specify setters, of course.

CeylonMigrationBot commented 12 years ago

[@simonthum] To follow up on this, I made a prototype implementation:

https://github.com/simonthum/ceylon-spec/tree/immutable2

The results are a bit mixed. The good news is that there were no major roadblocks. It took me some time but it's mostly due to my spare time being very limited.

I think it fits well into the current language design; you can see the necessary adaptations in the tests. I guess a proper implementation would take about double the LOC, as tracking locals is missing so far.

Features:

The drawback is that I think I did not strike a good balance in terms of complexity and gain. I guess there are two more realistic options:

  1. Make it simpler: More or less drop 'local' volatility, and let people override the checks on getters when they feel like it.
  2. Blow it up: More checks for side-effects, draw in constructors and methods, and thoroughly check the state stability but do not allow for overriding volatility.

The current design's main drawback is that e.g. the Object.string getter cannot be made immutable if it has to call methods, even if these are free of externally visible side-effects (think StringBuilder). It's quite hard to write immutable getters if they're not trivial, and harder to check them properly ;(

Anyway, even if you don't like the idea this is perhaps worth looking at as it cleans up the spec more than I had anticipated.

If someone wants to give it a try, be informed that it's missing some trivial additions/alterations to the language module that I'm too lazy to push unless beaten into it.

CeylonMigrationBot commented 12 years ago

[@ikasiuk] Perhaps @gavinking wants to have a look at this?

CeylonMigrationBot commented 12 years ago

[@gavinking] > Perhaps @gavinking wants to have a look at this?

FTR, the only reason I have not commented on this so far is it's the sort of thing that takes time to digest and do justice. Certainly not something I have a bottled response for. (I'm also waiting to see other people's reactions.)

CeylonMigrationBot commented 12 years ago

[@quintesse] And my reason has been that I'm not very interested in having language-level support for volatility and concurrency. I'm betting on frameworks (actors, whatever) to handle concurrency (in an admittedly less flexible way). Personally I don't mind if such a framework uses Java low-level concurrency features while not exposing them to Ceylon. But I won't stand in the way of people who think that Ceylon really needs this. :)

CeylonMigrationBot commented 12 years ago

[@simonthum] Let me be clear: It's an experiment, driven more by curiosity than by expectations. I don't intend to push this if there is no community support; I have neither the resources nor the masochism for that. Still, some feedback would be nice.

Then, it's not really thought of as concurrency support, from my side at least. It's more about getting basics right and making code easier to check. And yes, concurrency libraries or more advanced language features might use such a feature to their advantage.