OasisLMF / ktools

In-memory simulation kernel for loss modelling.
BSD 3-Clause "New" or "Revised" License

summarycalc performance improvements #316

Closed: hchagani-oasislmf closed this issue 2 years ago

hchagani-oasislmf commented 2 years ago

Issue Description

Profiling an unoptimised summarycalc binary with gprof, on a 24 GB gulcalc output file with 100 samples over 10,000 events, shows that about 79% of the run time is spent in three functions:

$ gprof summarycalc gmon.out | head
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 38.32     62.01    62.01 3029707501     0.00     0.00  summarycalc::processsummarysets(int, int, float)
 28.64    108.35    46.34 262312986     0.00     0.00  std::vector<int, std::allocator<int> >::size() const
 12.36    128.35    20.00 3029707501     0.00     0.00  summarycalc::processsummaryset(int, int, int, float)
  5.86    137.83     9.48        1     9.48   155.70  summarycalc::dosummaryprocessing(int)
  5.40    146.56     8.73 2999707501     0.00     0.00  summarycalc::dosummary(int, int, int, int, float, float)

Closer inspection of summarycalc::processsummarysets() and summarycalc::processsummaryset() suggests two ways to improve performance:

  1. The size of std::vector<int> item_to_coverage_ is repeatedly checked in loops within the aforementioned functions. Should the size be greater than 0, i.e. the items file exists and has been read in successfully, the coverage ID is obtained from the vector. Otherwise, the item/output ID is used. The coverage ID only needs to be obtained when the event ID changes. Therefore, this check and the determination of coverage ID can be moved to a parent loop to save unnecessary processing.
  2. Up to 10 summary level outputs can be generated by summarycalc. All these outputs are looped over and then a check is performed to determine whether they have been requested by the user. If the check passes, output is generated. It would be better to restrict the loop to those outputs requested by the user, which are known at run time. A single summary level output was requested during the aforementioned test run.
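The two changes proposed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the summarycalc source: `Loss`, `Totals`, `summarise_naive`, `summarise_hoisted`, `kMaxSummarySets` and the 0-based IDs are hypothetical names and conventions introduced here. Only the idea is taken from the issue: evaluate the `item_to_coverage` size check and the coverage lookup outside the innermost loop, and iterate only over the summary sets that were requested.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not the ktools data structures.
constexpr std::size_t kMaxSummarySets = 10;  // "up to 10 summary level outputs"

struct Loss {
    int item_id;   // assumed 0-based in this sketch
    float value;
};

// totals[set][key], where key is a coverage ID or, failing that, an item ID.
using Totals = std::array<std::vector<float>, kMaxSummarySets>;

// Baseline shape: both checks are repeated for every record and every set.
Totals summarise_naive(const std::vector<Loss>& losses,
                       const std::vector<int>& item_to_coverage,
                       const std::array<bool, kMaxSummarySets>& requested,
                       std::size_t num_keys) {
    Totals totals;
    for (auto& t : totals) t.assign(num_keys, 0.0f);
    for (const Loss& loss : losses) {
        for (std::size_t set = 0; set < kMaxSummarySets; ++set) {
            if (!requested[set]) continue;            // re-checked per record
            std::size_t key = static_cast<std::size_t>(loss.item_id);
            if (item_to_coverage.size() > 0)          // re-checked per record
                key = static_cast<std::size_t>(item_to_coverage[key]);
            assert(key < num_keys);
            totals[set][key] += loss.value;
        }
    }
    return totals;
}

// Hoisted shape: the checks run once, and only requested sets are visited.
Totals summarise_hoisted(const std::vector<Loss>& losses,
                         const std::vector<int>& item_to_coverage,
                         const std::array<bool, kMaxSummarySets>& requested,
                         std::size_t num_keys) {
    Totals totals;
    for (auto& t : totals) t.assign(num_keys, 0.0f);

    // Built once: indices of the summary sets the user asked for.
    std::vector<std::size_t> active;
    for (std::size_t set = 0; set < kMaxSummarySets; ++set)
        if (requested[set]) active.push_back(set);

    // Evaluated once: whether a coverage mapping was loaded at all.
    const bool has_coverage = !item_to_coverage.empty();

    for (const Loss& loss : losses) {
        // Key resolved once per record, outside the per-set loop.
        std::size_t key = static_cast<std::size_t>(loss.item_id);
        if (has_coverage) key = static_cast<std::size_t>(item_to_coverage[key]);
        assert(key < num_keys);
        for (std::size_t set : active) totals[set][key] += loss.value;
    }
    return totals;
}
```

Both functions return the same totals for the same inputs; the second simply does less work per record. In an unoptimised build, small calls such as `size()` are typically not inlined, which is consistent with `std::vector<int>::size()` appearing as a separate entry in the flat profile above.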

Version / Environment information

Tested on ktools v3.9.2.

hchagani-oasislmf commented 2 years ago

gprof output after the determination of coverage ID was moved to a parent loop:

$ gprof summarycalc gmon.out | head
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 44.12     34.67    34.67 3029707501     0.00     0.00  summarycalc::processsummarysets(int, int, float)
 23.71     53.30    18.63 3029707501     0.00     0.00  summarycalc::processsummaryset(int, int, int, float)
 10.99     61.93     8.63        1     8.63    77.59  summarycalc::dosummaryprocessing(int)
  9.63     69.50     7.57 2999707501     0.00     0.00  summarycalc::dosummary(int, int, int, int, int, float, float)
  5.05     73.47     3.97 3059710501     0.00     0.00  std::vector<int, std::allocator<int> >::operator[](unsigned long)

gprof output after restricting the loop to the outputs requested by the user (in this case, one output):

$ gprof summarycalc gmon.out | head
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 14.11     15.52    15.52 3029707501     0.00     0.00  summarycalc::processsummaryset(int, int, int, float)
 12.55     29.32    13.80 3029707501     0.00     0.00  summarycalc::processsummarysets(int, int, float)
 10.58     40.96    11.64 4884161212     0.00     0.00  __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&)
  9.13     51.00    10.04 1824453697     0.00     0.00  std::vector<int, std::allocator<int> >::end()
  7.98     59.77     8.77 1824447707     0.00     0.00  bool __gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&)

An overall improvement of 20% was observed, most of it coming after the determination of coverage ID was moved to a parent loop.