ginkgo-project / ginkgo

Numerical linear algebra software package
https://ginkgo-project.github.io/
BSD 3-Clause "New" or "Revised" License

Thread-safety of Ginkgo #996

Open upsj opened 2 years ago

upsj commented 2 years ago

In the discussion on #993, as well as some previous issues like #740 and #565 and future plans like #805, there is an underlying question we have not yet addressed: what thread-safety guarantees the different parts of Ginkgo can give.

I could imagine several different levels of thread-safety guarantees we could give:

  1. We don't give any guarantees, i.e. each thread needs to use its own executor and independent objects. This means we would only need to make some guarantees on things like cross-executor copies, but otherwise should have no issues. While easiest, I think this might be a bit overly restrictive?
  2. We promise that Executor itself is thread-safe (which it should already be), but require different threads to not use the same object at the same time, e.g. two threads can't call apply on a solver at the same time. This would be relevant when we want to decide how to deal with DenseCache types that avoid reallocations by storing a mutable Dense object internally.
  3. Everything inside Ginkgo can be used in a thread-safe fashion, except for concurrent modification of data. This would mean that we need to make sure that a DenseCache provides a different thread-local object for each thread in something like a mutable std::unordered_map<std::thread::id, std::unique_ptr<Dense<...>>>. We would then need to make sure write accesses to the cache are mutually exclusive, but could otherwise provide pretty strong guarantees.

The whole area of temporary storage looks to me like the most important/difficult to handle part, but can you imagine any other areas that could have issues with thread-safety?

Slaedr commented 2 years ago

@tcojean How does StarPU manage different tasks? Does it spawn threads?

I feel like the internal complication arising from option 2, and especially option 3, needs good justification. In general, it may be reasonable to expect each host thread to have its own independent executor and objects, or otherwise leave it to the application to synchronize at a very coarse level.

tcojean commented 2 years ago

There are long-lived (p)threads on every core. Tasks are just structures that have code and data tied to them. The tasks are moved to the cores/pthreads by the scheduler according to its policy (work stealing or whatever else the user sets).

Slaedr commented 2 years ago

@tcojean I see. Is there a concept of ownership of data among threads that StarPU itself manages? Is that the case in general for tasking systems? If that is the case, perhaps there's no pressing need for us to manage the task-level coarse-grained parallelism. It's good you brought up StarPU, because I think this topic (of host thread safety) needs to be discussed in the context of any possible task runtime integration.

tcojean commented 2 years ago

I'm not sure what you mean, but the data can move between cores/threads depending on the scheduler. Whether there will be concurrent access depends on the code itself as well.

Slaedr commented 2 years ago

In the second case you mentioned, perhaps cached data for the matrix, preconditioner etc. needs to be considered while setting task dependencies.

upsj commented 2 years ago

I guess that is the crux of 2 vs. 3: Does solver->apply(...) have a read/write dependency on itself? With option 3 we could avoid that; with option 2 we need to consider it.

Just for clarification, I think we already fulfill option 2 right now; we just don't codify or check it in any way (neither manually nor in unit tests).