clojure-goes-fast / clj-async-profiler

Embedded high-precision Clojure profiler
http://clojure-goes-fast.com/kb/profiling/clj-async-profiler/
402 stars 17 forks source link

hot spots #15

Closed xificurC closed 4 years ago

xificurC commented 4 years ago

Imagine a flame graph like

|-X-|----Y----|---X---|--X--|-X-|
|-----A1------|---C1--|--D1-|-E1|
|-----A-------|---C---|--D--|-E-|

We see most of the time was spent in A and there in Y, however after a closer look one might see X is the method where most of the CPU time is being spent. With an arbitrarily complex call tree it gets extremely complex to find such culprits in the flame graph.

What I'm trying to say is that there's several ways to look at the results:

Is there a way to obtain the latter results? In the underlying async-profiler I see something similar being printed to the JVM's console output, maybe giving access to that would be a start?

alexander-yakushev commented 4 years ago

Hello Peter! I totally understand what you are talking about. It is quite often that the culprit hides in multiple flames, and digging them out requires a bit of effort. But there are not one or two, but even three ways to do it.

  1. If you know X precisely, you can use flamegraph's Search feature. Searching for X will not just highlight X stacks in the graph, but will also display their total width percentage in the lower right corner.

  2. You can pass :reverse? true option when creating a flamegraph to make the "upside down" graph. The flames will grow from the lowest frames of the stack towards the stack entrance. This is useful to uncover a common low-level operation (e.g., a hashmap access) that is evenly spread across many different code paths.

  3. The most flexible solution, but requiring the most effort, is to use :transform option. It allows you to change each stack however you like using regular string-processing functions (e.g., regexes) before the flamegraph is generated. You can use it to trim and shave graphs, splice recursive stacks, merge separate code paths into one, etc. You can find some examples here: http://clojure-goes-fast.com/blog/clj-async-profiler-tips/#stack-transforming. In your case, this is ideal if you know and can enumerate all base stacks that precede X to make them disappear, and all different Xs to be rendered as a whole.

Hope this helps!

xificurC commented 4 years ago

Hi Alex,

thanks for your tips. I'm familiar with the :transform option. I was mainly talking about an exploratory situation where you don't know what you're looking for, i.e. X is not known. There are e.g. times when a helper function or a library sits in between your business logic and low-level operations, causing slowdowns. These can become hard to spot in a flamegraph and it's hard to write a generic transformer that would uncover these cases, especially if they sit in the middle of the callstack.

I realized though that the flamegraphs are generated from a simple txt file that contains the stacktrace and # of samples. It's not hard to generate a rudimentary table from that. Here's a sample of where I got in 30 minutes:

(defn st&nm [ln] (let [ix (str/last-index-of ln " ")] [(subs ln 0 ix) (Long/valueOf (subs ln (inc ix)))]))
(defn st->calls [s] (set (str/split (clj-async-profiler.post-processing/demunge-java-clojure-frames s) #";")))
(defn profile->data [sts]
  (loop [mp {} tc 0 sts sts]
    (if-some [[ln & mr] sts]
      (let [[st nm] (st&nm ln)
            cls (st->calls st)]
        (recur (reduce (fn [ac cl] (update ac cl #(if % (+ % nm) nm))) mp cls) (+ tc nm) mr))
      [mp tc])))

(let [[mp tc] (profile->data (line-seq (io/reader "/tmp/clj-async-profiler/results/18-cpu-2020-03-03-14-04-30.txt")))]
  (println "Total calls:" tc)
  (pp/print-table (reduce (fn [ac [mt cn]]
                            (if (re-matches #"(clojure|nrepl)\..*" mt)
                              ac
                              (conj ac {:method mt :count cn :usage-% (-> cn (/ tc) (* 100) long)})))
                          [] (sort-by (comp - second) mp))))

and the output is something like

Total calls: 2060

|                                                                  :method | :count | :usage-% |
|--------------------------------------------------------------------------+--------+----------|
|                                                     java.lang.Thread.run |   1943 |       94 |
|                                              bus.program-impls/eval80541 |   1928 |       93 |
|                                    bus.program-impls/eval80541/fn--80542 |   1928 |       93 |
|          refactor-nrepl.ns.slam.hound.regrow/wrap-clojure-repl/fn--61172 |   1928 |       93 |
|                          bus.program-impls/eval80541/fn--80542/fn--80543 |   1917 |       93 |
|                                                          bus.program/run |   1776 |       86 |
|                                                         bus.program/-run |   1717 |       83 |
|                                               bus.program/-run/fn--79575 |    864 |       41 |
|                                                    bus.iterators/collect |    753 |       36 |
|                        bus.program_impls$_QMARK_map$reify__79865.hasNext |    461 |       22 |
|                                bus.program-impls/-QMARK-map/-next--79859 |    348 |       16 |
|                      bus.program-impls/-QMARK-map/-next--79859/fn--79860 |    327 |       15 |
|                bus.program-impls/eval80541/fn--80542/fn--80543/fn--80544 |    323 |       15 |
|                             bus.iterators$transform$reify__67544.hasNext |    302 |       14 |
|                                bus.iterators$transform$reify__67544.next |    298 |       14 |
|                                                       bus.iterators/take |    295 |       14 |
|                                       bus.program-impls/buffer/fn--79786 |    287 |       13 |
|                                                   bus.program-impls/beat |    252 |       12 |
|                                                            bus.util/get! |    176 |        8 |

This output makes me happy that flamegraphs are the default :) Nevertheless the table output can be helpful in certain scenarios, in addition to the flamegraphs.

Thanks for your time and let me know your thoughts on this!

xificurC commented 4 years ago

Since this isn't really an issue I'm closing. I'm open to further discussions, if there's any interest.