Closed milianw closed 3 years ago
The macos test failure looks unrelated to me (some python breakage?)
Yeah, the macOS test fails on master as well, since I added some tests for the new NPY file reading and writing. It looks to me like some sort of configuration issue, the tests work fine on my own Mac.
Regarding dip::SetNumberOfThreads
: I'm not sure I understand this. Because apparently you cannot call DIPlib functions that use OpenMP from within OpenMP threads, it doesn't make sense to call dip::SetNumberOfThreads
from a thread either. You should call it before your #pragma omp
.
Or is this related to calling DIPlib from multiple threads that are not created by OpenMP? Does that work OK?
Yeah, the macOS test fails on master as well, since I added some tests for the new NPY file reading and writing. It looks to me like some sort of configuration issue, the tests work fine on my own Mac.
OK, I'll ignore that then.
Regarding
dip::SetNumberOfThreads
: I'm not sure I understand this. Because apparently you cannot call DIPlib functions that use OpenMP from within OpenMP threads, it doesn't make sense to calldip::SetNumberOfThreads
from a thread either. You should call it before your#pragma omp
.Or is this related to calling DIPlib from multiple threads that are not created by OpenMP? Does that work OK?
Yes, the way our application is setup is the following:
Now the patch I'm proposing is required for the following situation:
Without making the thread number a thread-local, this would be a data race at the very least. Clearly we don't want thread B to influence thread A here.
And to re-iterate: Yes, it is clear to us that we are hopelessly oversubscribing here. The problem is simply that we don't have any alternative, as all long-running tasks needs to be run in a background thread.
I've included your three commits with some changes: 9ddbf97b6deae20419ccd9fc6b1c2e4e7ab304c1, e0df9e18da3be020edbcc355d403344c14809b18, 1c3a86e056b731e3529b249e55d9068b89980ccb
Many thanks!
One of the changes I made was related to the documentation for the dip::SetNumberOfThreads
function. The thread-local variable is initialized to the default value for every thread created, so setting the value and then creating a new thread causes the new thread to have the default value, not the value picked earlier. I think this is important to note, as it was not obvious to me that it would behave this way (I had no experience with thread_local
before!).
This means that, in your OpenMP-parallelized function, you should do:
#pragma omp parallel
{
dip::SetNumberOfThreads(1);
#pragma omp for
for (...) {
...
}
}
Or something similar. Can't set the number of threads to 1 before the parallel section starts, as this spawns threads that will have the default value for the thread-local variable.
One of the changes I made was related to the documentation for the
dip::SetNumberOfThreads
function. The thread-local variable is initialized to the default value for every thread created, so setting the value and then creating a new thread causes the new thread to have the default value, not the value picked earlier.
Yes, that is correct and is imo the desired behavior in most cases. Unless there is nested OpenMP parallelization within diplib itself? But from what I've seen, that is not the case, or?
I think this is important to note, as it was not obvious to me that it would behave this way (I had no experience with
thread_local
before!).This means that, in your OpenMP-parallelized function, you should do:
#pragma omp parallel { dip::SetNumberOfThreads(1); #pragma omp for for (...) { ... } }
I opted for this snippet, which works the way I expect it to:
#pragma omp parallel for
for (...) {
auto previousNumberOfThreads = dip::GetNumberOfThreads();
dip::SetNumberOfThreads(1);
...
dip::SetNumberOfThreads(previousNumberOfThreads);
}
But obviously, it would be better to use a RAII handler for that purpose, to make it exception safe. I can upstream that utility too, if you want.
Or something similar. Can't set the number of threads to 1 before the parallel section starts, as this spawns threads that will have the default value for the thread-local variable.
Yes, correct.
Hey Cris,
here are some more little fixes I would like to upstream.
Many thanks with the hint about the call to
dip::SetNumberOfThreads
- that indeed fixes my problem! I just realize that it would negatively influence other threads (and introduce a datarace) if I'd call it from our code. This patch here should fix that and sounds like a generally good thing to do I believe, as the OpenMP thread pool is thread-local too.Cheers