DIPlib / diplib

Quantitative Image Analysis in C++, MATLAB and Python
https://diplib.org
Apache License 2.0

Wip/upstreaming #70

Closed milianw closed 3 years ago

milianw commented 3 years ago

Hey Cris,

here are some more little fixes I would like to upstream.

Many thanks for the hint about the call to dip::SetNumberOfThreads - that indeed fixes my problem! I just realized that it would negatively influence other threads (and introduce a data race) if I called it from our code. The patch here should fix that, and I believe it is a generally good thing to do, as the OpenMP thread pool is thread-local too.
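
To sketch the idea (illustrative only, with hypothetical names; this is not DIPlib's actual source):

#include <cstddef>

// Hypothetical illustration: keeping the configured thread count in a
// thread_local variable gives every thread its own copy, so concurrent
// calls from different threads cannot race or influence one another.
namespace demo {

constexpr std::size_t defaultNumberOfThreads = 4; // placeholder default

thread_local std::size_t maxNumberOfThreads = defaultNumberOfThreads;

void SetNumberOfThreads( std::size_t nThreads ) {
   maxNumberOfThreads = nThreads > 0 ? nThreads : 1;
}

std::size_t GetNumberOfThreads() {
   return maxNumberOfThreads;
}

} // namespace demo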

Cheers

milianw commented 3 years ago

The macOS test failure looks unrelated to me (some Python breakage?)

crisluengo commented 3 years ago

Yeah, the macOS test fails on master as well, since I added some tests for the new NPY file reading and writing. It looks to me like some sort of configuration issue; the tests work fine on my own Mac.

Regarding dip::SetNumberOfThreads: I'm not sure I understand this. Because apparently you cannot call DIPlib functions that use OpenMP from within OpenMP threads, it doesn't make sense to call dip::SetNumberOfThreads from a thread either. You should call it before your #pragma omp.

Or is this related to calling DIPlib from multiple threads that are not created by OpenMP? Does that work OK?

milianw commented 3 years ago

Yeah, the macOS test fails on master as well, since I added some tests for the new NPY file reading and writing. It looks to me like some sort of configuration issue; the tests work fine on my own Mac.

OK, I'll ignore that then.

Regarding dip::SetNumberOfThreads: I'm not sure I understand this. Because apparently you cannot call DIPlib functions that use OpenMP from within OpenMP threads, it doesn't make sense to call dip::SetNumberOfThreads from a thread either. You should call it before your #pragma omp.

Or is this related to calling DIPlib from multiple threads that are not created by OpenMP? Does that work OK?

Yes, the way our application is set up is the following:

Now the patch I'm proposing is required for the following situation:

Without making the thread number a thread-local, this would be a data race at the very least. Clearly we don't want thread B to influence thread A here.
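
A minimal sketch of that situation (the worker functions and the header path are placeholder assumptions; dip::SetNumberOfThreads is the call discussed above):

#include <thread>

#include "diplib/multithreading.h" // assumed header for dip::SetNumberOfThreads

// Thread A wants full DIPlib parallelism and never touches the setting.
void workerA() {
   // ... calls into DIPlib ...
}

// Thread B limits DIPlib to a single thread for its own work. With a
// thread-local setting, this cannot influence thread A.
void workerB() {
   dip::SetNumberOfThreads( 1 );
   // ... calls into DIPlib ...
}

int main() {
   std::thread a( workerA );
   std::thread b( workerB );
   a.join();
   b.join();
}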

And to reiterate: yes, it is clear to us that we are hopelessly oversubscribing here. The problem is simply that we don't have any alternative, as all long-running tasks need to be run in a background thread.

crisluengo commented 3 years ago

I've included your three commits with some changes: 9ddbf97b6deae20419ccd9fc6b1c2e4e7ab304c1, e0df9e18da3be020edbcc355d403344c14809b18, 1c3a86e056b731e3529b249e55d9068b89980ccb

Many thanks!

crisluengo commented 3 years ago

One of the changes I made was related to the documentation for the dip::SetNumberOfThreads function. The thread-local variable is initialized to the default value for every thread created, so setting the value and then creating a new thread causes the new thread to have the default value, not the value picked earlier. I think this is important to note, as it was not obvious to me that it would behave this way (I had no experience with thread_local before!).

This means that, in your OpenMP-parallelized function, you should do:

#pragma omp parallel
{
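   // each worker thread starts with the default value of the thread-local
   // setting, so it must be overridden here, inside the parallel region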
   dip::SetNumberOfThreads(1);
   #pragma omp for
   for (...) {
      ...
   }
}

Or something similar. You can't set the number of threads to 1 before the parallel section starts, as this spawns threads that will have the default value for the thread-local variable.

milianw commented 3 years ago

One of the changes I made was related to the documentation for the dip::SetNumberOfThreads function. The thread-local variable is initialized to the default value for every thread created, so setting the value and then creating a new thread causes the new thread to have the default value, not the value picked earlier.

Yes, that is correct and is, in my opinion, the desired behavior in most cases. Unless there is nested OpenMP parallelization within DIPlib itself? But from what I've seen, that is not the case, is it?

I think this is important to note, as it was not obvious to me that it would behave this way (I had no experience with thread_local before!).

This means that, in your OpenMP-parallelized function, you should do:

#pragma omp parallel
{
   dip::SetNumberOfThreads(1);
   #pragma omp for
   for (...) {
      ...
   }
}

I opted for this snippet, which works the way I expect it to:

#pragma omp parallel for
for (...) {
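    // save this thread's current setting, limit DIPlib to one thread for the
    // loop body, and restore the previous value afterwards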
    auto previousNumberOfThreads = dip::GetNumberOfThreads();
    dip::SetNumberOfThreads(1);
    ...
    dip::SetNumberOfThreads(previousNumberOfThreads);
}

But obviously, it would be better to use a RAII handler for that purpose, to make it exception-safe. I can upstream that utility too, if you want.
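
For reference, a minimal sketch of what such a guard could look like (hypothetical class name, not part of DIPlib; it only uses dip::GetNumberOfThreads and dip::SetNumberOfThreads as in the snippet above):

// Hypothetical RAII guard: saves the calling thread's DIPlib thread count,
// overrides it for the lifetime of the object, and restores the previous
// value on scope exit, even when an exception is thrown.
class ScopedNumberOfThreads {
   public:
      explicit ScopedNumberOfThreads( dip::uint nThreads )
            : previous_( dip::GetNumberOfThreads() ) {
         dip::SetNumberOfThreads( nThreads );
      }
      ~ScopedNumberOfThreads() {
         dip::SetNumberOfThreads( previous_ );
      }
      ScopedNumberOfThreads( ScopedNumberOfThreads const& ) = delete;
      ScopedNumberOfThreads& operator=( ScopedNumberOfThreads const& ) = delete;
   private:
      dip::uint previous_;
};

The loop body would then reduce to constructing ScopedNumberOfThreads guard( 1 ); followed by the actual work, with the restore happening automatically.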

Or something similar. You can't set the number of threads to 1 before the parallel section starts, as this spawns threads that will have the default value for the thread-local variable.

Yes, correct.