Closed hchagani-oasislmf closed 2 years ago
gprof output after the determination of coverage ID was moved to a parent loop:
$ gprof summarycalc gmon.out | head
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
44.12 34.67 34.67 3029707501 0.00 0.00 summarycalc::processsummarysets(int, int, float)
23.71 53.30 18.63 3029707501 0.00 0.00 summarycalc::processsummaryset(int, int, int, float)
10.99 61.93 8.63 1 8.63 77.59 summarycalc::dosummaryprocessing(int)
9.63 69.50 7.57 2999707501 0.00 0.00 summarycalc::dosummary(int, int, int, int, int, float, float)
5.05 73.47 3.97 3059710501 0.00 0.00 std::vector<int, std::allocator<int> >::operator[](unsigned long)
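The change profiled above can be illustrated with a minimal sketch. It is not the ktools code: `Record`, `resolve_coverage_id`, `accumulate_inner_lookup` and `accumulate_hoisted_lookup` are hypothetical names, and the layout of the mapping vector (indexed directly by item ID) is an assumption. The sketch only shows the general idea of resolving the coverage ID once in a parent loop instead of once per sample, and counts the lookups each variant performs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: one item with its per-sample losses.
struct Record {
    int item_id;
    std::vector<float> sample_losses;
};

// Assumed rule from the issue description: if the item-to-coverage mapping
// was loaded (non-empty), use the coverage ID; otherwise use the item ID.
int resolve_coverage_id(const std::vector<int>& item_to_coverage, int item_id) {
    if (!item_to_coverage.empty()) {
        assert(item_id >= 0 &&
               static_cast<std::size_t>(item_id) < item_to_coverage.size());
        return item_to_coverage[item_id];
    }
    return item_id;
}

// Before: the check and lookup are repeated for every sample.
// Returns the number of lookups performed.
std::size_t accumulate_inner_lookup(const std::vector<int>& item_to_coverage,
                                    const std::vector<Record>& records,
                                    std::vector<double>& loss_by_coverage) {
    std::size_t lookups = 0;
    for (const Record& rec : records) {
        for (float loss : rec.sample_losses) {
            const int coverage_id =
                resolve_coverage_id(item_to_coverage, rec.item_id);
            ++lookups;
            loss_by_coverage[coverage_id] += loss;
        }
    }
    return lookups;
}

// After: the lookup is hoisted to the parent loop, once per record.
// Returns the number of lookups performed.
std::size_t accumulate_hoisted_lookup(const std::vector<int>& item_to_coverage,
                                      const std::vector<Record>& records,
                                      std::vector<double>& loss_by_coverage) {
    std::size_t lookups = 0;
    for (const Record& rec : records) {
        const int coverage_id =
            resolve_coverage_id(item_to_coverage, rec.item_id);
        ++lookups;
        for (float loss : rec.sample_losses) {
            loss_by_coverage[coverage_id] += loss;
        }
    }
    return lookups;
}
```

Both variants produce the same totals. The hoisted variant performs one lookup per record rather than one per sample, so the saving grows with the number of samples.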
gprof output after restricting the loop to the outputs requested by the user (in this case, one output):
$ gprof summarycalc gmon.out | head
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
14.11 15.52 15.52 3029707501 0.00 0.00 summarycalc::processsummaryset(int, int, int, float)
12.55 29.32 13.80 3029707501 0.00 0.00 summarycalc::processsummarysets(int, int, float)
10.58 40.96 11.64 4884161212 0.00 0.00 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&)
9.13 51.00 10.04 1824453697 0.00 0.00 std::vector<int, std::allocator<int> >::end()
7.98 59.77 8.77 1824447707 0.00 0.00 bool __gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > const&)
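The second change can be sketched in the same spirit. Again, this is not the ktools implementation: `write_with_flag_check`, `collect_requested` and `write_requested_only` are hypothetical names, and representing the user's request as a vector of flags is an assumption. The sketch builds the list of requested outputs once and then loops over that list only, instead of visiting every possible output and testing a flag each time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: visit every possible output and test whether it was requested.
// Returns the number of loop iterations performed.
std::size_t write_with_flag_check(const std::vector<bool>& requested,
                                  std::vector<int>& writes_per_output) {
    std::size_t iterations = 0;
    for (std::size_t id = 0; id < requested.size(); ++id) {
        ++iterations;
        if (requested[id]) {
            ++writes_per_output[id];  // stand-in for generating output
        }
    }
    return iterations;
}

// Built once, after the user's options are known at run time.
std::vector<std::size_t> collect_requested(const std::vector<bool>& requested) {
    std::vector<std::size_t> ids;
    for (std::size_t id = 0; id < requested.size(); ++id) {
        if (requested[id]) {
            ids.push_back(id);
        }
    }
    return ids;
}

// After: loop only over the outputs that were requested.
// Returns the number of loop iterations performed.
std::size_t write_requested_only(const std::vector<std::size_t>& requested_ids,
                                 std::vector<int>& writes_per_output) {
    std::size_t iterations = 0;
    for (std::size_t id : requested_ids) {
        assert(id < writes_per_output.size());
        ++iterations;
        ++writes_per_output[id];  // stand-in for generating output
    }
    return iterations;
}
```

With a single requested output, as in the test run described in the issue, the restricted loop does one iteration per call regardless of how many outputs are possible.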
An improvement of 20% was observed, most of which came after the determination of coverage ID was moved to a parent loop.
Issue Description
From the output of gprof on an unoptimised summarycalc binary run on a 24 GB gulcalc output file with 100 samples over 10,000 events, the code spends the majority of its time executing code from three functions.
Closer inspection of summarycalc::processsummarysets() and summarycalc::processsummaryset() indicates that an improvement in performance should be possible:
std::vector<int> item_to_coverage_ is repeatedly checked in loops within the aforementioned functions. Should its size be greater than 0, i.e. the items file exists and has been read in successfully, the coverage ID is obtained from the vector. Otherwise, the item/output ID is used. The coverage ID only needs to be obtained when the event ID changes. Therefore, this check and the determination of coverage ID can be moved to a parent loop to save unnecessary processing.
Multiple outputs are available from summarycalc. All these outputs are looped over and then a check is performed to determine whether they have been requested by the user. If the check passes, output is generated. It would be better to restrict the loop to those outputs requested by the user, which are known at run time. A single summary level output was requested during the aforementioned test run.
Version / Environment information
Tested on ktools v3.9.2.