beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0
1.19k stars 114 forks source link

TornadoExecutor deadlock. #490

Open andrii0lomakin opened 4 months ago

andrii0lomakin commented 4 months ago

Good day. After some non-intensive load, my tests were deadlocked. I have made a thread dump and debugged a bit and have found the following: Thread dump https://gist.github.com/andrii0lomakin/9c84a7b629e7017ac2d84b76b1428ea0 .

So the situation is the following: TornadoExecutorThread - 0 (0 for clarity) -> (waits for in lookup in future.get) -> 1 (this thread is blocked by synchronized by 3) -> 3 (blocks 1 and 2 using synchronized) .

Thread 3 waits for the execution of a new future, which can not be executed because all threads are blocked.

So, it is not classic deadlock per se, but nonetheless, everything is deadlocked. Usage of inter-thread gets I personally perceive as not a good practice.

What do you think about:

  1. Usage of CompletableFutures instead?
  2. Or at least use a cached executor with a fixed amount of core threads instead? Though the last item is not a solution per se IMHO because in case of intensive workloads, it will allow to avoid such concrete situation for the cost of throughput, but:
    • does not guarantee the absence of deadlocks with a similar pattern.
    • slow throughput is just a bit better than complete deadlock.

P.S. Just increasing the number of threads is not a solution IMHO, though it worked in my case.

stratika commented 4 months ago

hi @andrii0lomakin, are your tests part of the TornadoVM unit-tests? If not, can you please provide a test-case to understand better if the way you execute a TaskGraph is valid? It may be a case that we have not encountered/provisioned so far. In this case, it will be very helpful to have a small example in order to design the functionality as a new feature.

Also, please report the system information (OS, devices) that you are running your tests.

andrii0lomakin commented 3 months ago

@stratika I will try to reproduce it again. It was on Windows 11/PTX.