GalSim-developers / GalSim

The modular galaxy image simulation toolkit. Documentation:
http://galsim-developers.github.io/GalSim/
Other
224 stars 105 forks source link

Thread safe C++ library #545

Open dkirkby opened 10 years ago

dkirkby commented 10 years ago

I am rendering images from C++ code that calls the GalSim library (without using python) as part of a shear estimator pipeline. The GalSim rendering step is currently the bottleneck but could be easily parallelized across threads if GalSim were thread safe.

I have added some mutex locks in FFT.cpp and Table.cpp that seem to make GalSim thread safe in the limited way in which I am using it, but it would be great to do this more comprehensively and carefully for future applications like this.

rmjarvis commented 10 years ago

I knew about the non-thread-safeness of FFT.cpp. Specifically the fftw calls were known to not be thread safe, so the mutex locks there are probably right. I'm curious though what you found you needed to do for Table.cpp. I would have thought that would already be ok.

dkirkby commented 10 years ago

I added a mutex guard in Table::setup() to avoid this crash:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00000002288080e0
[Switching to process 15425 thread 0x2703]
0x00007fff9271dba9 in memmove$VARIANT$sse42 ()
(gdb) where
#0  0x00007fff9271dba9 in memmove$VARIANT$sse42 ()
#1  0x00000001018b588a in std::vector<double, std::allocator<double> >::_M_fill_insert ()
#2  0x00000001018c3918 in galsim::Table<double, double>::setupSpline ()
#3  0x00000001018c3fb3 in galsim::Table<double, double>::setup ()
#4  0x00000001018c47ad in galsim::Table<double, double>::operator() ()
#5  0x000000010185c580 in galsim::SersicInfo::kValue ()
#6  0x000000010185648b in galsim::SBSersic::SBSersicImpl::fillKValue ()
#7  0x000000010189e0d8 in galsim::SBTransform::SBTransformImpl::fillKValue ()
#8  0x0000000101897681 in galsim::SBConvolve::SBConvolveImpl::fillKValue ()
#9  0x000000010189dc7b in galsim::SBTransform::SBTransformImpl::fillKValue ()
#10 0x00000001018258e6 in galsim::SBProfile::SBProfileImpl::fillKGrid ()
#11 0x00000001018277fc in galsim::SBProfile::fourierDraw<float> ()
#12 0x000000010182809f in galsim::SBProfile::draw<float> ()

Although my code runs multithreaded now, I'm still not sure if it is correct since GalSim is not always giving identical results, although the differences seem quite small. This might be due to differences in how fftw accumulates its "wisdom" when plans are executed in different orders, but let me know if you can think of any other pieces of GalSim that might explain this or need mutexing.

rmjarvis commented 10 years ago

I see Yes, each thread would be using the same SericInfo object, so that setup could be racing. In fact, you probably want to move that mutex lock up a couple layers to the SersicInfo::buildFT() function. I think you probably only want to let one thread do all that. It looks like letting multiple threads do that simultaneously would cause problems. Maybe not a crash, but it might explain the differences you were seeing.

I don't think the fftw wisdom stuff is supposed to change the answer (al least not more than at the level of epsilon kinds of difference). I think it is just learning which of several correct algorithms would be fastest for a given set of data. So I don't think that should be a problem.

It would be worth looking through the other functions in detail to look for more setup stuff like this where we might wan to add some locks.

dkirkby commented 10 years ago

I moved the mutex lock up to SersicInfo::buildFT() and the pixels are still changing between runs with multiple threads.

rmjarvis commented 10 years ago

Hm. There are probably similar things in SersicInfo that also need locks. e.g. stepK() looks like it wouldn't race, but it would be possible for one thread to overwrite R after another one modified it, so you could get different values of _stepk in different runs. Similar issue with calculateHLR().

Some others classes have this kind of thing too. (Are you using any others besides Sersic?) Exponential, Kolmogorov, and Airy all use similar Info classes to calculate things that will be of common use.

Even more if you let multiple threads use the same SBProfile object. e.g. SBMoffat has some setup stuff that would be a problem if multiple threads had the same object and it tried to do the setup in two threads simultaneously.

I'd probably need to go through each file with an eye towards thread safety to see what else needs locks. There might be more that aren't occurring to me off hand.

dkirkby commented 10 years ago

I modified my code so that each thread build its own SBProfile objects, but things just got worse. Now SersicInfo::kValue() is throwing a Table has repeated arguments. exception at this location:

#4  0x00000001018c3b23 in galsim::Table<double, double>::setup ()
#5  0x00000001018c403d in galsim::Table<double, double>::operator() ()
#6  0x000000010185c544 in galsim::SersicInfo::kValue ()
#7  0x000000010185630b in galsim::SBSersic::SBSersicImpl::fillKValue ()
#8  0x000000010189d978 in galsim::SBTransform::SBTransformImpl::fillKValue ()
#9  0x0000000101896f21 in galsim::SBConvolve::SBConvolveImpl::fillKValue ()
#10 0x000000010189d51b in galsim::SBTransform::SBTransformImpl::fillKValue ()
#11 0x0000000101825156 in galsim::SBProfile::SBProfileImpl::fillKGrid ()
#12 0x000000010182706c in galsim::SBProfile::fourierDraw<float> ()
#13 0x000000010182790f in galsim::SBProfile::draw<float> ()
dkirkby commented 10 years ago

I modified LRUCache::get to always return a new value, thus completely disabling the SersicInfo cache, which finally gives the same results with different numbers of threads. This is not a good long-term solution but the performance impact seems minimal for my application. The alternative of letting Sersic profiles in different threads share the same cached SersicInfo looks like it would need fairly extensive changes for thread safety, at least in the current design.