GalSim-developers / GalSim

The modular galaxy image simulation toolkit. Documentation:
http://galsim-developers.github.io/GalSim/

Implement OpenMP for random deviate generate functions #1177

Closed rmjarvis closed 2 years ago

rmjarvis commented 2 years ago

This is mostly an efficiency PR to speed up random number generation on large arrays.

When working on this imSim PR involving raining lots of photons through a sensor object, I was finding that with the OpenMP stuff for the sensor, the pure random number generation was taking almost half the total clock time. The generate calls were running on a single thread, and then the accumulate calls were being well parallelized. My laptop can run 10 threads, so it made a big difference.

This PR enables OpenMP parallelization for the various kinds of RNG generate functions that we have. In practice, the plain generate call is the one that matters most, but I did all of the various flavors.

Some RNG types (most notably Poisson) don't reliably use 1 rng per generated value. This causes problems for the way I am doing the parallelism, so I disable it for these types. We don't want the random number sequences to be dependent on how many threads are being used, and I don't see a different way that would be safe for them. There's also some trickiness regarding GaussianDeviates, since they use up 2 rngs per 2 values, so odd/even array lengths matter for them.
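To illustrate why one rng draw per value makes thread-independent results possible, here is a minimal sketch in pure Python. It is not the GalSim implementation: `ToyLCG`, `generate_serial`, and `generate_chunked` are hypothetical names, and the toy generator stands in for the real one. The idea is that each worker starts from the same seed, discards the draws that belong to earlier chunks, and then fills only its own slice.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyLCG:
    """Hypothetical 64-bit LCG that uses exactly one state step per value."""

    A = 6364136223846793005
    C = 1442695040888963407
    M = 2 ** 64

    def __init__(self, seed):
        self.state = seed % self.M

    def _step(self):
        self.state = (self.A * self.state + self.C) % self.M

    def next(self):
        # One state step yields one uniform value in [0, 1).
        self._step()
        return self.state / self.M

    def discard(self, n):
        # Skip n values without returning them.
        for _ in range(n):
            self._step()


def generate_serial(seed, n):
    """Reference result: fill all n values from a single generator."""
    rng = ToyLCG(seed)
    return [rng.next() for _ in range(n)]


def generate_chunked(seed, n, nchunks):
    """Fill n values in nchunks independent pieces (assumes nchunks >= 1)."""
    bounds = [n * i // nchunks for i in range(nchunks + 1)]

    def fill(i):
        # Each worker gets its own generator, skipped ahead to the start
        # of its chunk. This only works because every value costs
        # exactly one draw, so the offset is known in advance.
        rng = ToyLCG(seed)
        rng.discard(bounds[i])
        return [rng.next() for _ in range(bounds[i + 1] - bounds[i])]

    with ThreadPoolExecutor(max_workers=nchunks) as pool:
        parts = list(pool.map(fill, range(nchunks)))
    return [x for part in parts for x in part]
```

With this layout, `generate_chunked(seed, n, k)` matches `generate_serial(seed, n)` for any `k`, which is the property we want to keep. A type like Poisson, where the number of draws per value varies, gives no way to know the chunk offsets ahead of time. The sketch also does not handle generators that produce values in pairs; presumably those would need chunk boundaries that never split a pair.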

The upshot is that the photon-based flat field generation went from taking 11 minutes per section down to 9.6 minutes, so about 15% faster overall (11 / 9.6 ≈ 1.15). But more importantly, if you have a big beefy machine with lots of cores, it will scale better with the number of cores, rather than letting most of them sit idle half the time. On my laptop, it was maintaining a load of about 7, so still not 100% parallelism, but a lot better than previously. (I forgot to take note of it, but I think it was more like 4 previously.)

I also moved the single_threaded context that was in config/util.py into utilities.py to make it a bit more accessible. We should highlight this helper class in the release notes for people who don't want any OpenMP threading because they are already fully parallelizing with processes or something similar. But for most use cases, I think this change should be an improvement.
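For anyone unfamiliar with the pattern, a context manager of this kind can be sketched as follows. This is a stand-alone illustration, not the code in utilities.py: `get_num_threads`, `set_num_threads`, and `force_single_thread` are hypothetical stand-ins, and the module-level variable only imitates a global OpenMP thread setting.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a library-wide OpenMP thread count.
_num_threads = 8


def get_num_threads():
    return _num_threads


def set_num_threads(n):
    global _num_threads
    _num_threads = n


@contextmanager
def force_single_thread():
    """Temporarily force the thread count to 1, then restore it."""
    saved = get_num_threads()
    set_num_threads(1)
    try:
        yield
    finally:
        # Restore the previous setting even if the body raises.
        set_num_threads(saved)
```

Code inside `with force_single_thread():` sees a thread count of 1, and the earlier value comes back on exit. That is the behavior you would want when each worker process should stay on one core.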

rmjarvis commented 2 years ago

Thanks James! I'll leave this open for a few days in case anyone else wants to take a look.

cwwalter commented 2 years ago

Will this give us a reproducible sequence even if different cores are used in a different order etc.?

Currently, we are always able to generate the exact same images using the rng sequences in imSim.

rmjarvis commented 2 years ago

> Will this give us a reproducible sequence even if different cores are used in a different order etc.?

Yes. There are unit tests testing this, which didn't break with this change.