cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.07k stars 4.27k forks source link

Modernize (use of) Random Number Generators #40636

Open VinInn opened 1 year ago

VinInn commented 1 year ago

CMS currently uses CLHEP: The CLHEP project was proposed by Leif Lönnblad at CHEP 92 It has same age of CMS...

The number of issues are pretty large and the adaptation of the library to new "user cases" has made the situation just worse...

Let's take the Poisson generator heavily used in the Digitization of HCAL: https://cmssdt.cern.ch/dxr/CMSSW/source/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc#160 it calls RandPoissonQ

This is a recent igprof http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_hcal4_CMSSW_13_0_X_2023-01-26-2300_slc7_amd64_gcc11/55

one can use the "old" RandPoisson that would most probably be faster if thread local was NOT used http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_oldPoisson_CMSSW_13_0_X_2023-01-26-2300_slc7_amd64_gcc11/53

one can try to use std::poisson_distribution [1] http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_stdPois_CMSSW_13_0_X_2023-01-28-1100_slc7_amd64_gcc11/116 BUT CLHEP engine returns only double, float or unsigned int while std engine returns uint32 or uint64 and generate_canonical will require two invocation to clhep unsigned int to generate a double so is twice too slow

I have then made my own implementation (from old clhep RandPoisson small average case) [2] http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_trivialPoisson_CMSSW_13_0_X_2023-01-28-1100_slc7_amd64_gcc11/226 and it is as fast as we expect... (no wasted call to exp, no thread local, full use of all bits of MixMax) 4x speedup!

So either 1) CLHEP implementation is changed 2) CLHEP is made consistent with std::random 3) CMS implements her own random library picking the best of all

[1]

[diff --git a/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc b/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
index 492821da6d4..0603922bf80 100644
--- a/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
+++ b/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
@@ -12,6 +12,26 @@
 #include "CalibCalorimetry/HcalAlgos/interface/HcalPulseShapes.h"

 #include "CLHEP/Random/RandPoissonQ.h"
+#include <random>
+#include <limits>
+
+namespace {
+
+  class RandomGeneratorWrapper {
+  public:
+    using result_type = unsigned int;
+
+    RandomGeneratorWrapper(CLHEP::HepRandomEngine* engine) : m_engine(engine) {}
+
+    result_type operator()() { return (unsigned int)(*m_engine); }
+
+    static result_type min() { return std::numeric_limits<result_type>::min(); }
+    static result_type max() { return std::numeric_limits<result_type>::max(); }
+
+    CLHEP::HepRandomEngine* m_engine;
+  };
+
+}  // namespace

 #include <cmath>
 #include <list>
@@ -156,8 +176,13 @@ void HcalSiPMHitResponse::addPEnoise(CLHEP::HepRandomEngine* engine) {
     int nPreciseBins = nbins * getReadoutFrameSize(id);

     unsigned int sumnoisePE(0);
+
+    RandomGeneratorWrapper eng(engine);
+    std::poisson_distribution<> poisson(dc_pe_avg);
+
     for (int tprecise(0); tprecise < nPreciseBins; ++tprecise) {
-      int noisepe = CLHEP::RandPoissonQ::shoot(engine, dc_pe_avg);  // add dark current noise
+      int noisepe = poisson(eng);  // add dark current noise
+      //      int noisepe = CLHEP::RandPoissonQ::shoot(engine, dc_pe_avg);  // add dark current noise

       if (noisepe > 0) {
         if (precisionTimedPhotons.find(id) == precisionTimedPhotons.end()) {

[2]

diff --git a/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc b/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
index 492821da6d4..b815a19ab81 100644
--- a/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
+++ b/SimCalorimetry/HcalSimAlgos/src/HcalSiPMHitResponse.cc
@@ -11,7 +11,37 @@
 #include "FWCore/Utilities/interface/isFinite.h"
 #include "CalibCalorimetry/HcalAlgos/interface/HcalPulseShapes.h"

-#include "CLHEP/Random/RandPoissonQ.h"
+#include "CLHEP/Random/RandPoisson.h"
+
+
+namespace {
+
+class TrivialPoisson {
+public:
+
+  TrivialPoisson(CLHEP::HepRandomEngine* engine, double av) : m_engine(engine), m_expAvm(std::exp(-av)){}
+
+  int operator()() {
+    int em = -1;
+    double t = 1.0;
+    do {
+      em += 1;
+      t *= m_engine->flat();
+    } while( t > m_expAvm);
+    return em;
+  }
+
+
+  private:
+  CLHEP::HepRandomEngine* m_engine;
+  const double m_expAvm;
+
+};
+
+
+}
+
+

 #include <cmath>
 #include <list>
@@ -155,9 +185,11 @@ void HcalSiPMHitResponse::addPEnoise(CLHEP::HepRandomEngine* engine) {

     int nPreciseBins = nbins * getReadoutFrameSize(id);

+    TrivialPoisson poisson(engine, dc_pe_avg);
     unsigned int sumnoisePE(0);
     for (int tprecise(0); tprecise < nPreciseBins; ++tprecise) {
-      int noisepe = CLHEP::RandPoissonQ::shoot(engine, dc_pe_avg);  // add dark current noise
+      //int noisepe = CLHEP::RandPoissonQ::shoot(engine, dc_pe_avg);  // add dark current noise
+      int noisepe = poisson();  // add dark current noise

       if (noisepe > 0) {
         if (precisionTimedPhotons.find(id) == precisionTimedPhotons.end()) {
[innocent@patatrack02 src]$
cmsbuild commented 1 year ago

A new Issue was created by @VinInn Vincenzo Innocente.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

Dr15Jones commented 1 year ago

assign simulation, generators

cmsbuild commented 1 year ago

New categories assigned: generators,simulation

@mdhildreth,@mkirsano,@menglu21,@alberto-sanchez,@SiewYan,@GurpreetSinghChahal,@Saptaparna,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

civanch commented 1 year ago

@VinInn , I am not sure if G. Cosmo and J. Apostolakis can access this issue in the web. It somehow should be sent to them. CMSSW may migrate to the new Poisson with new CLHEP version. If the new implementation is really fast and thread safe, then I do not think there will be any opposition. It is the best solution, because CMSSW will migrate to it only changing external package. If you agree, I will send them your proposal or you send yourself.

Saying that CLHEP is 30 years old is not fully true: the interface is old, the code for most parts was updated many times to be modernized, random generators are very new and are nearly advanced, which was confirmed by random number benchmarks - there is a long story behind. Why not simply to update CLHEP Poisson?

Of course, it is possible implementing CMS util for Poisson, which will use internally CLHEP generator. This may be applied in several sensitive DIGI classes until RandPoissonQ will be modernized but this will require change CMSSW classes introducing a bit of the mess.

Concerning proposal 3) - I would avoid creation of a new CMS library for random generators - there are so many configuration and other issues, for example, significant work will be needed to be sure that the new generator is really random and all different CMS job start from correct random seed.

VinInn commented 1 year ago

@civanch : the real problem IS the CLHEP Interface. 1) the engine does not export the type, min,max, and a method that returns the real bytes generated 1a) the virtual interface adds some complication as well but can be solved if one stick to a fixed number of generated bytes.... (of course not the minimal common denominator!) 2) The cdf generator (gauss, poisson etc) interface is "static" while in many cases the implementation requires a state (possibly computed in advance) 2a) this led to suboptimal use and internally use of thread_local variables.

btw I do not intend to make a new "engine" just reuse those available (std, clhep) and provide a more modern interface (and implementation)

For what concern Poisson specifically: in case mu is known in advance (essentially at configuration time) a precomputed table approach is most probably the fastest. I will work on this for HCAL (the average is 5.5e-05 : the poisson result is essentially always 0)

civanch commented 1 year ago

@VinInn , I agree that concret Poisson generator will be always faster than general. If it would be possible to use clhep for flat random inside this CMS class it would be a clean solution.

Concerning interface change in CLHEP: at least, this can be proposed.

Poisson is also used in SIM step for hit production, as well as Gauss. The MixMax generator dominates and it takes ~1.3% of ttbar simulation. Gauss takes ~0.3% , Poisson is not seen in profiler dump. The relative role of these generators should be more significant at DIGI but for today we do not have a profile. We have to do.

VinInn commented 1 year ago

@civanch

The relative role of these generators should be more significant at DIGI but for today we do not have a profile. We have to do.

This is what I'm doing at the moment! Have already submitted 3 PRs (one merged) and the one fixing poisson will be the fourth...

makortel commented 1 year ago

I am not sure if G. Cosmo and J. Apostolakis can access this issue in the web. It somehow should be sent to them.

Our GitHub is public so anyone can read and comment.

civanch commented 1 year ago

I had a chat with colleagues. Comments from them:

in CLHEP there're three Poisson implementations:

In old measurements RandPoissonQ was faster than RandPoisson.

Current Geant4 default (not CLHEP default but following its interface) is G4Poisson, which is less accurate then RandPoissonQ.

The recommendation is to try G4Poisson or implement the most effective algorithm within CMSSW.

VinInn commented 1 year ago

IMHO in digitization "good enough" suffice. I do not think that a detailed simulation of tails is of the outmost relevance (Poisson and gaussian tails are actually unphysical do to time and space limits)

VinInn commented 1 year ago

RandPoissonT is just a wrapper around RandPoissonQ The problem is that even if one would instantiate an object and use the fire method the class caches the mean not exp(-x). A lost opportunity.... ditto for G4Poisson that is just an inlined function (stateless)

VinInn commented 1 year ago

let me share here another opportunity to speedup Normal (aka Gaussian) Random number generation. in [1] a prototype

I use the classical Box-Muller algorithm in single precision float. vdt is used as math library. The speed up w/r/t CLHEP comes from two components: 1) I use one 53bit random number to generate two floats 2) the Box-Muller transformation is fully vectorized

igprof (on a AMD zen3) results current release (sse) http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_ori_CMSSW_13_0_SKYLAKEAVX512_X_2023-01-29-2300_el8_amd64_gcc11/61 new code sse: speed up 2.9x http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_fast_CMSSW_13_0_SKYLAKEAVX512_X_2023-01-29-2300_el8_amd64_gcc11/161 new code avx2: speed up 3.6 http://innocent.web.cern.ch/innocent/perfResults/igprof-navigator/digiDebug1T_fastAVX2_CMSSW_13_0_SKYLAKEAVX512_X_2023-01-29-2300_el8_amd64_gcc11/215

Of course one has to accept that the number of different generated numbers is "limited" (few 10^7) and the max sigma is about 6.6. Given that the starting point is 53bits (actually more if one uses the original uint64 generated) one can extend both a bit using some well established "tricks".

It should be noted that the routine in question SiGaussianTailNoiseAdder::addNoiseVR computes in float so in any case it does not exploit the full dynamic range of double precision....

[1] https://github.com/VinInn/cmssw/blob/a14f9ed707d6b93a79ee67f11c71fcdf9e81925e/SimTracker/SiStripDigitizer/src/SiGaussianTailNoiseAdder.cc