stevejgordon opened this issue 2 months ago
It looks like the diagnosers hook into the `HostSignal.BeforeActualRun` and `HostSignal.AfterActualRun` events. `MemoryDiagnoser`, `ThreadingDiagnoser`, and `ExceptionDiagnoser` are currently hard-coded into `Engine.GetExtraStats()`, which runs immediately after `HostSignal.AfterActualRun`. Perhaps we could add another hook point that lets diagnosers hook into the `GetExtraStats` method (and give them a way to get `totalOperationsCount` so they can divide their results by it), and at the same time decouple those built-in diagnosers.
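A minimal sketch of the proposed hook, written in Python for brevity rather than C#. Every name here (`ExtraStatsDiagnoser`, `get_extra_stats`, `run_engine`, `FakeAllocationDiagnoser`) is hypothetical and is not BenchmarkDotNet's API; it only illustrates the engine handing `totalOperationsCount` to each registered diagnoser after the actual run, instead of hard-coding the built-in diagnosers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ExtraStats:
    """Per-operation result reported by one diagnoser (hypothetical type)."""
    name: str
    value_per_operation: float


class ExtraStatsDiagnoser(ABC):
    """Hypothetical diagnoser contract with an extra-stats hook point."""

    @abstractmethod
    def before_actual_run(self) -> None: ...

    @abstractmethod
    def after_actual_run(self) -> None: ...

    @abstractmethod
    def get_extra_stats(self, total_operations_count: int) -> ExtraStats: ...


class FakeAllocationDiagnoser(ExtraStatsDiagnoser):
    """Stand-in for a memory diagnoser: counts bytes reported via a callback."""

    def __init__(self) -> None:
        self.total_bytes = 0
        self._recording = False

    def record_allocation(self, size: int) -> None:
        # Only allocations between before/after the actual run are counted.
        if self._recording:
            self.total_bytes += size

    def before_actual_run(self) -> None:
        self._recording = True

    def after_actual_run(self) -> None:
        self._recording = False

    def get_extra_stats(self, total_operations_count: int) -> ExtraStats:
        # Normalize the aggregate over the whole run to a single operation.
        return ExtraStats("allocated_bytes", self.total_bytes / total_operations_count)


def run_engine(workload, diagnosers, total_operations_count):
    """Runs the workload, then asks every registered diagnoser for its stats."""
    for d in diagnosers:
        d.before_actual_run()
    for _ in range(total_operations_count):
        workload()
    for d in diagnosers:
        d.after_actual_run()
    # The extra hook point: no diagnoser is hard-coded in the engine.
    return [d.get_extra_stats(total_operations_count) for d in diagnosers]


# Usage: each operation "allocates" 24 bytes, over 1,000 operations.
diag = FakeAllocationDiagnoser()
stats = run_engine(lambda: diag.record_allocation(24), [diag], 1_000)
diag.record_allocation(100)  # after the run: ignored
print(stats[0])  # ExtraStats(name='allocated_bytes', value_per_operation=24.0)
```

The design choice here is that the engine owns the operation count and passes it in, so each diagnoser can report per-operation numbers without knowing how the engine chose its iteration and invocation counts.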
I agree it would be nice to support both scenarios 👍 There is sampling to consider, I guess. Depending on the workload under benchmark, a CPU tracer might not yield very interesting results for individual operations. Does dotMemory sample as well?
I also tried the new dotMemory diagnoser in my latest benchmarking session and also found the current implementation not ideal when there is an `IterationSetup`, which is then also included in the dotMemory runs. It's not easy to select the time range of a single iteration precisely, so I would also like an option that tracks only one iteration. Creating a snapshot for each iteration might also be an option.
My second wish for future improvements would be an option that enables full allocation tracking (instead of sampling). In some cases I would prefer to run a precise analysis.
I recently experimented with the new JetBrains diagnosers. I love the concept. However, I was surprised by how they are implemented. Right now, they attach before the `WorkloadActual` and detach after it. This means they record all operations, which may number in the millions. That makes their information useful but hard to use in my typical case. Most often, I benchmark first, and then, if I need to figure out where I can potentially save allocations, I run dotMemory over the same code. This inner loop is a little slow. I was expecting the diagnosers to perform a single invocation of the benchmark method so that the results show only the allocations from that invocation. With the current behaviour, I have to scale the results down by the number of operations. It also produces larger dotTrace and dotMemory files. Is it possible to limit the number of operations that these diagnosers analyse?
cc @AndreyAkinshin and @martinothamar