eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0
398 stars 62 forks source link

Proposal for automated thread safety #4459

Open CeylonMigrationBot opened 9 years ago

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] This is a more formal version of discussion here: https://groups.google.com/forum/#!topic/ceylon-users/rbs2srcIp_Y

Implementing automated thread safety in Ceylon can be done if we add a few features to the compiler.

The main thing that is needed is to make a pass over the AST in ‘bottom up’ dependency order. So that the compiler can add annotations to classes and observe the annotations it added when it is processing classes that depend on them.

The following compiler generated annotations are needed. They can be inferred by the compiler for all but native methods.

These above situations would be detected and the appropriate annotations added. Then thread safety could be enforced via the following rules:

Finally at codegen time code can be injected to ensure safety by adding synchronization to shared methods on mutable objects.

This is guaranteed to be thread safe because:

It’s easy to imagine lots of enhancements and optimizations to this:

I believe I can write the analysis to add the annotations fairly easily, if there were support for adding annotations to the code during the PhasedUnit processing. Is adding support for that easy? After that the code generation for Java should be straight forward, but I haven't really looked into it. I don't know enough about JavaScript to make an intelligent guess about how its code generation works.

Does this sound like a good idea? Should there be more annotations? As opposed to the original proposal I have tried to make this as easy to understand for the user as possible. Hence none are required, however it could be made more flexible, if more information were provided by the developer.

Alternately, does anyone feel strongly that we should avoid synchronization all together? IE: Strictly sending messages / sharing immutable data between threads.

[Migrated from ceylon/ceylon-spec#1353]

CeylonMigrationBot commented 9 years ago

[@gavinking] @tkaitchuck what I would like to see here is some canonical examples (patterns) of the sorts of concurrent code that people would like to write, and how they can be written in Ceylon using this approach.

Also interesting: examples of things that people would like to be able to write, but that would not be considered well-typed according to your rules (false positives).

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] There are two ways of doing things concurrently, shared state or no shared state. If you are doing data parallel work with no shared state, great, you're done. The language doesn’t need to provide anything for you. If you want execute on a cluster or something, you’ll want something like hadoop, but the language itself doesn’t need to do much.

In terms of shared state there are basically two patterns

  1. Handoff/transfer - There are a lot of higher level patterns here. We need to build things like queues/channels, Futures, etc. This may make creating those easier, but there still needs to be a way block a thread, which this does not provide.
  2. Have multiple threads execute concurrently that use this data but not break one another’s logic. This is where this proposal is targeted.

I have been developing following practices similar to these for years professionally, and I can say it works well in practice. Because it isolates the safety issues so you never need to consider more than one class at a time. This makes it easy for a human to review, as they do not need to see the broader context to verify things are correct. (It’s also what makes it automatable)

The basic idea is that every class assumes that everything it calls is thread safe, but may have been modified in between calls, unless it is held locally in which case that does not happen. Then by synchronizing over each method modifying state, you can regard your functions as atomic without having to think about it.

To give an idea of where this causes you to have to do things differently, I’m going to go over the following sample of files from Herd and explain what would need to be done to get them to work in this scheme. (I hope this is a good and balanced sample. However if there are more you want me to look at please say so.) https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/collection/HashMap.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/collection/LinkedList.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/io/impl/SocketImpl.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/io/base64/base64.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/json/Printer.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/math/whole/WholeImpl.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/net/http/client/Parser.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/net/http/Header.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/net/http/server/MatcherRule.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/logging/Logger.ceylon https://github.com/ceylon/ceylon-sdk/blob/master/source/ceylon/promise/Deferred.ceylon

Starting with the most complex:

LinkedList and HashMap

LinkedList is mutable and dealing with passed in data, which may or not be mutable. On the surface this is fine, because the value being passed in is simply being stored. However methods like firstOccurrence, contains, replace, and remove take in an argument which they are going to call equals on. Why this is a problem is not as obvious for LinkedList but is more obvious for HashMap, where the method removeEntry takes a key and a value. In HashMaps it is generally assumed that the key is immutable. After all what does it even mean to be get the hash of an item if it is going to change? Changing a key after insertion would break hashMap even in a single threaded context. It is desirable for the values to be mutable. (A map of lists for instance.) Java’s HashMap works under these assumptions. One can simply wrap it in a Collections.synchronizedMap and you are good to go. Ceylon’s removeEntry method breaks this. The method is much more useful in a threaded context so I assume that is its intended purpose. However because it is calling the equals method on the passed in item, if it is mutable thread safety cannot be guaranteed any more than than it could be if the key were mutable and being changed in another thread. This scheme would disallow this method, and prevent this bug.

The fix to this is simple. The method signature would need to be changed from: Boolean removeEntry(Key key, Item&Object item) to Boolean removeEntry(Key key, Item&Immutable item). Which is to say you simply may not call this method if your Item type is mutable. Similarly the methods mentioned in LinkedList that call equals would need to take Element&Immutable. If it were fixed by just making that one change it may annoy anyone who is operating in a single threaded context. But it does show that perhaps the interface should be considered more carefully.

The real problem here is that Java has confounded === and ==. Where it defines a equals and hashCode on Object, and has them Identity based, which would cause them to work fine for mutable items, even as keys in HashMaps. But then it encourages everyone to override them with an equality based comparison, which only makes sense for immutable data. You might still want to have a deep equals, but there is no way to do that besides overriding equals, which requires you define a HashCode function in a way that should never be used.

If it were up to me equals and hashCode would not be on object at all. They should be on some other interface, say Immutable. In this case they could even be a mixin and you wouldn’t have to redefine them on over again on every single object.

If this were done, you would need to indicate to methods like removeEntry which comparison to use, but it would be up to the code creating or using collection. As opposed to hard-coding knowledge of how it is going to be used into the item.

Finally in LinkedList’s find method or HashMap’s each method, a function is being passed in. This would be flagged with two ways to resolve it. Either by making the function pure, which is fine for find but doesn’t make sense for each. So each would have to be implemented as a ‘wrapping’ method which gets an iterator and invokes the provided function on each element. This might be a bit verbose, but it would be possible to simplify with additional annotations if we cared enough.

SocketImpl

The only change is Java imports need to be annotated as blocking. Same goes for OpenFileImpl.ceylon

Base64

No changes at all.

Json Printer

You would want to create an object which contains level, enter, leave, and indent. (Adds one line of code and a closing brace.)

This is actually a common pattern that will come up a lot. It is helpful to encapsulate state into an object to track it, and have the methods associated with it there, rather than mixed in with the rest of the logic of the class. IMHO this makes code a lot cleaner.

WholeImpl

The imported Java class BigInteger would need to be annotated as being immutable.

Parser

Similar to Json printer this class would benefit from having its state moved into a nested object.

Alternately the state could be eliminated altogether. The member variables are merely used by private methods calling each other. The class itself is not actually stateful. The methods could just as easily pass the variables as arguments.

Header

No changes.

MatcherRule

No changes.

Logger

No changes.

Deferred

This is the one I consider most impressive. This requires no changes. I looked at several of the files in ceylon/promise and they are all the same.

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] To more directly answer your second question, it is going to flag things that you write, and think are ok because you know how they are going to be used, but would not be OK if the user of the class was particularly perverse. IE: You know you aren't going to pass in mutable objects in as the key to a HashMap, and you know you aren't going to call equals on two lists that contain each other. But the compiler can't say you aren't going to do those things, so you have to write differently.

As sort of an informal proof that this doesn't fundamentally break anything: Imagine you have two classes A and B. You want to have them reference each other and/or call methods in some way this disallows. It is always legal to create some class C that holds onto both A and B and calls whatever methods it likes. Because C in not locally mutable it is completely unrestricted. It is always possible to make things work by re-arranging things a bit. In practice this manifests as shrinking A and B a lot but having classes C and D which wrap them respectively and interact. Ceylon's ability to declare an object directly really would work nicely with this.

CeylonMigrationBot commented 9 years ago

[@sadmac7000] I wonder how this acts with link boundaries. I'm not sure our module-resolving semantics protect us from, say, linking a later version of a library where immutable objects have become mutable. That's where I forsee an issue: this looks like total program analysis, which is something we've historically said we can't do.

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] @sadmac7000 Let's say you are using a class from another module as a key in a TreeMap. How is the possibility of it being replaced at runtime with a different jar which makes the object mutable any more problem then replacing it with a version that does not implement comparable.

CeylonMigrationBot commented 9 years ago

[@sadmac7000] @tkaitchuck because the annotation is implicit, so the module writer doesn't notice that he just broke his ABI.

CeylonMigrationBot commented 9 years ago

[@sadmac7000] @gavinking consider how your reasons for us not having a 'pure' annotation apply to this implicit 'immutable' annotation.

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] @sadmac7000 The author might not notice they broke you either way. Ignoring this scheme for a moment. If you make an object mutable and have one of the methods that used to not change it change it in a way that changes the result of its compareTo of hashCode, it will break people. Even even now and in a single thread. Changing the semantics of an object like that IS breaking. What this offers is if you build the same code you deploy, you catch it at compile time. You don't get that now.

CeylonMigrationBot commented 9 years ago

[@sadmac7000] This is against the design of ceylon. Link-significant properties of the code are captured in explicit code. Always.

CeylonMigrationBot commented 9 years ago

[@sadmac7000] The obvious fix would be to have explicit annotations and let the compiler pass simply verify their correctness.

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] If you want to force things not to change, then you need to have explicit annotations for Immutable and Blocking. The other two can still be inferred as they pertain only to the class itself. This wouldn't be bad at all. The only problem is, as @tombentley pointed out here: #4457 is that practically every shared method becomes: shared blocking ...

CeylonMigrationBot commented 9 years ago

[@sadmac7000] You could have shared imply blocking, and let nonblocking be the alternative.

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] I'm not sure that is less typing. Plus it's weirder. Would you really want to see that added to every method in collections? It's less odd to see blocking on all the methods in IO.

CeylonMigrationBot commented 9 years ago

[@sadmac7000] You could allow blocking annotations on classes, with a meaning of "methods in here default to blocking unless annotated nonblocking"

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] I started down the path of implementing Immutable as an explicit annotation. My plan is to have the annotation imply that it satisfies the interface Immutable, and validate the class is locally immutable and that all of its members are assignable to an Immutable interface.

This looks straightforward, except I am not sure how to handle classes with a generic parameter that would be immutable if their parameter is immutable and not if they aren't. AFAIK there is no way to conditionally satisfy an interface.

I could just say, these aren't annotated and can't be assumed to be immutable. However this would be really unfortunate for sequences. It seems wrong to not be able to declare a class as Immutable if it has a member that is a {String*}, a [String, String], or a String->Integer. I could special case theses builtin types, but that seams like a hack.

Is there a better way?

CeylonMigrationBot commented 9 years ago

[@jvasileff] @tkaitchuck, a minor point: {String*} means Iterable<String> which is often mutable, whereas [String*] is Sequential<String>, which is immutable.

Conditional inheritance is discussed in #3922.

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] For now, I am doing this: shared interface Immutable {} // All the BasicTypes satisfy this. shared alias BasicImmutableTypes => Immutable|Nothing|Null|[]; shared alias ImmutableMask => BasicImmutableTypes | BasicImmutableTypes[] | Entry<BasicImmutableTypes,BasicImmutableTypes>; Where I check against ImmutableMask after the type checking is all done.

This should be enough to get a prototype working, and at least validate the concept.

CeylonMigrationBot commented 9 years ago

[@tkaitchuck] I have a forked set of repositories where the code analysis portion is 'working'. https://github.com/tkaitchuck?tab=repositories

It does not handle everything yet. Certain ways to pass functions and inheritance in some instances etc. And it does not inject any code, it just determines if it would compile. Blocking is done via an annotation. Classes with mutable variables need to be annotated. (Hopefully this can be removed) And you can declare you intend a class to be deeply immutable by having it satisfy the 'Immutable' interface.
Obviously this is very early, but it is enough to play around with. I was able to compile ceylon-walkthrough with just adding the mutable annotation to the example mutable object.