summarycalc performance improvements

OasisLMF / ktools

In-memory simulation kernel for loss modelling.

BSD 3-Clause "New" or "Revised" License

Issue Description

The gprof output below comes from an unoptimised summarycalc binary run on a 24 GB gulcalc output file with 100 samples over 10,000 events. Three functions account for roughly 79% of the self time:

$ gprof summarycalc gmon.out | head
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 38.32     62.01    62.01 3029707501     0.00     0.00  summarycalc::processsummarysets(int, int, float)
 28.64    108.35    46.34 262312986     0.00     0.00  std::vector<int, std::allocator<int> >::size() const
 12.36    128.35    20.00 3029707501     0.00     0.00  summarycalc::processsummaryset(int, int, int, float)
  5.86    137.83     9.48        1     9.48   155.70  summarycalc::dosummaryprocessing(int)
  5.40    146.56     8.73 2999707501     0.00     0.00  summarycalc::dosummary(int, int, int, int, float, float)

Closer inspection of summarycalc::processsummarysets() and summarycalc::processsummaryset() indicates that an improvement in performance should be possible:

The size of std::vector<int> item_to_coverage_ is repeatedly checked in loops within the aforementioned functions. Should the size be greater than 0, i.e. the items file exists and has been read in successfully, the coverage ID is obtained from the vector. Otherwise, the item/output ID is used. The coverage ID only needs to be obtained when the event ID changes. Therefore, this check and the determination of coverage ID can be moved to a parent loop to save unnecessary processing.
Up to 10 summary level outputs can be generated by summarycalc. All these outputs are looped over and then a check is performed to determine whether they have been requested by the user. If the check passes, output is generated. It would be better to restrict the loop to those outputs requested by the user, which are known at run time. A single summary level output was requested during the aforementioned test run.
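As a rough illustration of the two changes proposed above, the following is a minimal sketch and not the ktools implementation. `Record`, `Summariser`, `coverage_or_item`, `requested_sets` and `sums` are hypothetical names invented for this example; only `item_to_coverage` mirrors the vector named in the issue, and the per-record layout (one event/item header followed by its sample losses) is an assumption. The coverage lookup happens once per record rather than once per sample, and the inner loop visits only the summary sets that were requested instead of all 10.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

constexpr int kMaxSummarySets = 10;  // "up to 10 summary level outputs"

// Hypothetical input shape: one event/item header followed by its samples.
struct Record {
    int event_id;
    int item_id;
    std::vector<float> losses;  // one loss per sample
};

struct Summariser {
    std::vector<int> item_to_coverage;  // empty when no items file was read
    std::vector<int> requested_sets;    // built once from the user's options
    std::map<int, double> sums[kMaxSummarySets];  // per set: coverage ID -> loss

    // Fall back to the item ID when no item-to-coverage mapping is available.
    int coverage_or_item(int item_id) const {
        if (item_to_coverage.empty()) return item_id;
        assert(static_cast<std::size_t>(item_id) < item_to_coverage.size());
        return item_to_coverage[static_cast<std::size_t>(item_id)];
    }

    void process(const std::vector<Record>& records) {
        for (const Record& rec : records) {
            // Hoisted: resolved once per record, not once per sample.
            const int coverage_id = coverage_or_item(rec.item_id);
            for (float loss : rec.losses) {
                // Restricted: only the requested sets are visited.
                for (int set_id : requested_sets) {
                    sums[set_id][coverage_id] += loss;
                }
            }
        }
    }
};
```

With a single requested summary set, as in the test run described above, the innermost loop body executes once per sample instead of performing ten checks, and `item_to_coverage` is consulted once per record.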

Version / Environment information

Tested on ktools v3.9.2.

$ gprof summarycalc gmon.out | head
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 44.12     34.67    34.67 3029707501     0.00     0.00  summarycalc::processsummarysets(int, int, float)
 23.71     53.30    18.63 3029707501     0.00     0.00  summarycalc::processsummaryset(int, int, int, float)
 10.99     61.93     8.63        1     8.63    77.59  summarycalc::dosummaryprocessing(int)
  9.63     69.50     7.57 2999707501     0.00     0.00  summarycalc::dosummary(int, int, int, int, int, float, float)
  5.05     73.47     3.97 3059710501     0.00     0.00  std::vector<int, std::allocator<int> >::operator[](unsigned long)

$ gprof summarycalc gmon.out | head
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 14.11     15.52    15.52 3029707501     0.00     0.00  summarycalc::processsummaryset(int, int, int, float)
 12.55     29.32    13.80 3029707501     0.00     0.00  summarycalc::processsummarysets(int, int, float)
 10.58     40.96    11.64 4884161212     0.00     0.00  __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&)
  9.13     51.00    10.04 1824453697     0.00     0.00  std::vector<int, std::allocator<int> >::end()
  7.98     59.77     8.77 1824447707     0.00     0.00  bool __gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&)

summarycalc performance improvements #316

Issue Description

Version / Environment information