Azure / azure-relay-java

Azure Relay Java SDK
MIT License
8 stars 9 forks source link

Suspected visibility issues in the highly mt code #65

Open consultantleon opened 4 years ago

consultantleon commented 4 years ago

While analysing some issue we currently face (cause yet unknown), I'd like to make some notes about suspected 'visibility' issues accross this API library. When one thread updates a field it cannot be guaranteed that another thread sees the same value unless the field is declared volatile or the writing thread and the reading thread both lock on the same lock/monitor. With this thought in mind I looked at fields which are often updated while locking on thisLock but thisLock is not locked while reading: there's no guarantee the reader sees the latest update of the field. Please consider this general comment and review the code + run it through standard code analysis tools. My examples of suspicion are: HybridConnectionListener line 521 reading: CompletableFuture controlConnectionTask = this.controlConnection.connectAsyncTask;

[ this method seems only to exisit for the purpose of unit testing?? ]

Maybe more serious issue: HybridConnectionListener ->ControlConnection line 545: private boolean closeCalled; (not volatile declared?!)

And on line 809 it's accessed outside of thisLock: if (!webSocket.isOpen()) { this.closeOrAbortWebSocketAsync(connectTask, webSocket.getCloseReason()); if (this.closeCalled) { keepGoing = false; }

At this point assuming this code is executed by another thread than the one setting this.closeCalled, it's possible that this thread does not see this.closeCalled set to true and will continue retrying to reconnect (keepGoing true). This is currently a suspicion that we have why the listener continues to reconnect when we try to close it!

A much better practise would be to use AtomicBoolean, AtomicReference or other java.concurrent objects for any 'state' that is shared between threads (looking at the code any field is a suspect with the extensive use of CompletableFuture's / multi threading!). The concurrent objects would also allow more efficient get/set and increment operations.

One last example I suspect: HybridConnectionListener -> ControlConnection Line 543: private int connectDelayIndex; (not volatile, consider AtomicInteger!!)

On line 699 this int is accessed outside of a lock and its value could be non consistent , i.e. assuming the wrong connection delay: private CompletableFuture connectAsync(Duration timeout) { try { this.listener.throwIfDisposed();

            CompletableFuture<Void> delayTask = CompletableFutureUtil.delayAsync(RelayConstants.CONNECTION_DELAY_INTERVALS[this.connectDelayIndex], EXECUTOR);

(note that it's always updated under control of thisLock! However when reading the lock should also be taken to ensure consistent read and prevent visibility issues!)

jfggdl commented 4 years ago

@consultantleon, Thank you for raising this issue. We will have an internal discussion about your suggestions and problem and we will get back to you. If you already have some improvements/code, feel free to create a PR and we can work collaboratively on this.

consultantleon commented 4 years ago

Thanks Javier, let me try to create one next week. I made a number of small changes and fixed a bug regarding error handling on disconnect. I suggest you scan the project with sonar, there's a number of issues raised in Intellij, I'll try to get a sonar run over our fork. Regards, Leon

On 28 Feb 2020, 20:56, at 20:56, Javier Fernandez notifications@github.com wrote:

@consultantleon, Thank you for raising this issue. We will have an internal discussion about your suggestions and problem and we will get back to you. If you already have some improvements/code, feel free to create a PR and we can work collaboratively on this.

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/Azure/azure-relay-java/issues/65#issuecomment-592701814