KristofferC / TimerOutputs.jl

Formatted output of timed sections in Julia
Other
653 stars 54 forks source link

print_timer: show minima instead of avg, and switch off totals and/or %tot ? #160

Open ahbarnett opened 1 year ago

ahbarnett commented 1 year ago

I am enjoying your package, since it is simple to use (and one does not spend a long time waiting as with @btime which takes >1sec even if a short total is requested...). The usage to(myfunc)(my,args) makes the timing code particularly lightweight :)

An obvious and simple feature request would be an option in print_timer to replace the "avg" column with "min" (this is like @btime which reports the min). This is less sensitive to compilation on the first sample, or to interrupts.

I imagine people would also like to switch off the "time" and "%tot" columsn, and just focus on the "min" or "avg". Maybe I didn't notice if there are options already? Thanks! Alex

KristofferC commented 1 year ago

More customization for the table makes sense. Perhaps one should be able to explicitly give all the columns one wants to see which would allow a larger spread of choices.

mmesiti commented 10 months ago

In my own fork I have hacked the code to output this:

────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                    Time                                Allocations      
                               ───────────────────────────────────────────────   ────────────────────────
       Tot / % measured:                        12.4s /  97.3%                        441MiB /  80.8%    

 Section               ncalls     time    %tot     avg     min     max     std     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────────────────────────
 total solver               1    12.0s  100.0%   12.0s   12.0s   12.0s    NaNh    357MiB  100.0%   357MiB
   getDeriv!               40    8.64s   71.7%   216ms   204ms   244ms  8.63ms   42.1MiB   11.8%  1.05MiB
     getXBubble!           40    8.36s   69.4%   209ms   197ms   237ms  8.59ms   29.4MiB    8.2%   752KiB
       partition           40    7.91s   65.6%   198ms   187ms   225ms  7.05ms   27.6MiB    7.8%   708KiB
       communication       40    452ms    3.8%  11.3ms  9.41ms  44.7ms  6.05ms   1.26MiB    0.4%  32.3KiB
       get_ranges          40   2.63ms    0.0%  65.7μs  58.1μs  74.3μs  3.10μs    479KiB    0.1%  12.0KiB
     workspace             40    114ms    0.9%  2.85ms  2.59ms  3.12ms   143μs   2.95KiB    0.0%    75.6B
       setZero - X         40   79.5ms    0.7%  1.99ms  1.73ms  2.18ms  95.4μs      672B    0.0%    16.8B
       setZero - Deriv     40   33.7ms    0.3%   844μs   730μs   947μs  71.8μs     0.00B    0.0%    0.00B
       basic constr...     40   30.2μs    0.0%   755ns   594ns  1.03μs   131ns     0.00B    0.0%    0.00B
     symmetrizeBubble!     40   62.9ms    0.5%  1.57ms  1.03ms  3.19ms   607μs   3.69MiB    1.0%  94.4KiB
     addToVertexFro...     40   53.3ms    0.4%  1.33ms  1.22ms  1.76ms   116μs   1.86MiB    0.5%  47.6KiB
     get_Self_Energy!      40   22.5ms    0.2%   563μs   356μs  1.01ms   198μs   4.89MiB    1.4%   125KiB
     symmetrizeVertex!     40   17.1ms    0.1%   428μs   273μs   503μs  67.7μs   2.32MiB    0.7%  59.4KiB
     getDFint!             40    120μs    0.0%  3.00μs  2.48μs  3.44μs   274ns     0.00B    0.0%    0.00B
     setup                 40   61.2μs    0.0%  1.53μs   967ns  2.32μs   309ns     0.00B    0.0%    0.00B
   get_observables        178    481ms    4.0%  2.70ms  2.24ms  3.85ms   408μs   9.34MiB    2.6%  53.7KiB
   bareOutput               9    322ms    2.7%  35.8ms  2.69μs   118ms  53.8ms   6.00MiB    1.7%   682KiB
 ────────────────────────────────────────────────────────────────────────────────────────────────────────

And honestly it looks pretty easy to do. I am not making a pull request because there are some things that can went wrong during development and the test suite did not catch them. The problem I had with computing std requires to take squares of time measurements, but times are in ns and represented as Int64, so if something lasts for like ~10s you definitely have overflows). I would have added a test for this but honestly I don't know where to add it in runtest.jl, it looks pretty disorganized to me, there is no obvious place where to add a test for this for example.

JoshuaLampert commented 10 months ago

Similar here. In my fork, I also computed the median by using OnlineStats.jl.