dendibakh / perf-book

The book "Performance Analysis and Tuning on Modern CPU"
https://book.easyperf.net/perf_book
Creative Commons Zero v1.0 Universal

First draft of section on Tracy #19

Closed theWatchmen closed 12 months ago

theWatchmen commented 1 year ago

@dendibakh I haven't built the book yet to check the layout. Just wanted to make sure you were happy with the content first :) Let me know if it needs more details.

dendibakh commented 1 year ago

@theWatchmen, @wolfpld please correct me if I'm wrong. (I'm not a game developer, and I have never used Tracy.) As I see it, the value of Tracy is that it is specialized for a certain class of applications, while tools like VTune, uProf, and Apple Instruments are general-purpose profilers. Tracy uses the same sampling technique but can reveal additional data (like anomalies in frame rendering). If so, we should focus this section on the benefits of Tracy over VTune, i.e., why someone should pick Tracy over a "go-to" tool like VTune.

@theWatchmen, should we rename this section as something like "Specialized profilers"?

wolfpld commented 1 year ago

Tracy is used for many applications other than making games. Examples include software for cloud streaming, medical microscopy, CAD, military simulation, and so on. Well-known companies outside the gaming industry that use Tracy include NVIDIA, Adobe, Amazon, Mozilla, TomTom, Netflix, Boston Dynamics, and CERN.

I do not agree with the statement that Tracy is specialized and VTune is for general use or a "go-to" solution. This seems to me to be heavily biased by what you are already familiar with. I see VTune as a highly specialized profiler for the following reasons:

In terms of sampling profiling, Tracy is both worse (because you can't attach to or launch an external process, or because the sampling is hard to find due to UI evolution) and better (for many reasons, for instance, see https://youtu.be/_hU7vw00MZ4?t=699 for a direct comparison to VTune) than the competition.

The main difference is that Tracy is also an instrumentation-based profiler. The main thing about working with this type of data, as opposed to sampling, is that you can clearly see when your otherwise fast function is occasionally slow. You can then drill down into what is happening in those rare cases to identify the cause.
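To make the point about occasionally slow calls concrete, here is a minimal sketch of the idea behind instrumentation zones. It is not Tracy's API or implementation: `ScopedZone`, `ZoneRecord`, `g_records`, `update`, and `slowest_index` are hypothetical names invented for this illustration, and the 20 ms sleep on frame 42 is an artificial slow path. The sketch is single-threaded and only shows why recording every entry/exit makes a rare outlier easy to pinpoint.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical record of one zone execution (not a Tracy type).
struct ZoneRecord {
    std::string name;
    long long micros;  // wall-clock duration of this one execution
};

// One entry is appended for EVERY zone exit, so nothing is missed.
std::vector<ZoneRecord> g_records;

// RAII marker: the constructor notes the entry time, the destructor
// records the elapsed time when the enclosing scope ends.
class ScopedZone {
public:
    explicit ScopedZone(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}
    ~ScopedZone() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        g_records.push_back(
            {name_, std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()});
    }
private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Normally fast; frame 42 takes an artificial slow path.
void update(int frame) {
    ScopedZone zone("update");
    if (frame == 42)
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
}

// Index of the slowest recorded execution.
std::size_t slowest_index() {
    assert(!g_records.empty());
    std::size_t worst = 0;
    for (std::size_t i = 1; i < g_records.size(); ++i)
        if (g_records[i].micros > g_records[worst].micros) worst = i;
    return worst;
}
```

Calling `update(frame)` for frames 0 through 99 and then `slowest_index()` points directly at the one slow execution. A sampling profiler would mostly attribute those 20 ms to the sleep, without telling you which call of `update` was affected.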

dendibakh commented 1 year ago

OK, maybe I'm wrong and Tracy is a "go-to" tool for profiling. I don't want to debate this, as I haven't seen data on what people usually use. This is a book about tuning SW for modern CPUs, so at the center is CPU performance, not system performance. We need insights into CPU microarchitecture. But let me ask you @wolfpld, do you position Tracy as a competitor and a superior product to VTune? This is an honest question: I know VTune has flaws, and in general I'm a big proponent of open-source software. Don't get me wrong, I'm just genuinely trying to understand this, and I wish only the best for Tracy. BTW, on AMD systems you would use uProf, not VTune; on Apple platforms you would use Instruments. None of these products is perfect, but each has a specific purpose.

wolfpld commented 1 year ago

> This is a book about tuning SW for modern CPUs, so at the center is CPU performance, not system performance. We need insights into CPU microarchitecture.

You cannot look at the performance of your application without taking into account everything else that is going on in the system. See https://youtu.be/uJkrFgriuOo?t=72

> But let me ask you @wolfpld, do you position Tracy as a competitor and a superior product to VTune?

I had no need to use VTune for a long time, because to me Tracy is superior. I don't follow how they develop the product, so it's hard for me to say anything about it. It certainly has its uses, and if you want to take a deep dive into various specialized perf counters, VTune is the only way to go.

> BTW, on AMD systems you would use uProf

I could not use uProf at all. Any profiling session longer than a few seconds would lock the UI.

dendibakh commented 1 year ago

In general, I agree: system performance is important even if you're developing a single-threaded, compute-bound application with no I/O, because in the real world the OS can schedule another compute-bound app on a sibling HW thread, and that will ruin the performance of your app. But again, this book is not about SW dynamics. There is already a good book on that by Dick Sites. It's good to understand other impacts, but this book is about addressing low-level performance. It covers memory layout optimizations, vectorization, branch mispredictions, etc. When you're working on those, you need an isolated environment where the issue is clearly visible.

theWatchmen commented 1 year ago

> @theWatchmen, should we rename this section as something like "Specialized profilers"?

I was thinking something like "Manual Instrumentation Profilers" or simply "Instrumentation Profilers" might be a better description.

dendibakh commented 1 year ago

> @theWatchmen, should we rename this section as something like "Specialized profilers"?

> I was thinking something like "Manual Instrumentation Profilers" or simply "Instrumentation Profilers" might be a better description.

Just want to make sure I understand it correctly: zone statistics (call counts, time, histogram) are exact because Tracy traces EVERY zone entry/exit, but system-level data and source-code-level data are sampled. If we use the title "Instrumentation profilers", will it confuse people into thinking that Tracy instruments all the code (recursively) inside a zone? In reality, it's just a marker API that Tracy recognizes.

theWatchmen commented 1 year ago

> I was thinking something like "Manual Instrumentation Profilers" or simply "Instrumentation Profilers" might be a better description.

> Just want to make sure I understand it correctly: zone statistics (call counts, time, histogram) are exact because Tracy traces EVERY zone entry/exit, but system-level data and source-code-level data are sampled. If we use the title "Instrumentation profilers", will it confuse people into thinking that Tracy instruments all the code (recursively) inside a zone? In reality, it's just a marker API that Tracy recognizes.

That's correct, zone statistics are reported exactly while non-instrumented and system code are sampled. Not sure what describes it best: Manual Code Instrumentation? Manual Code Markers?
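As an illustration of why these zone statistics are exact rather than estimated: when every zone execution is recorded, the call count, total time, extremes, and histogram are plain aggregates over the complete set of durations. The sketch below assumes the durations have already been collected in nanoseconds. `ZoneStats` and `summarize` are hypothetical names, and the equal-width bucketing is an assumption made for this example, not a description of how Tracy builds its histogram.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical summary of one zone, computed from ALL its executions.
struct ZoneStats {
    std::size_t calls;
    std::int64_t total_ns;
    std::int64_t min_ns;
    std::int64_t max_ns;
    std::vector<std::size_t> histogram;  // equal-width buckets over [min, max]
};

// Aggregate every recorded duration; no extrapolation is involved,
// so the numbers are exact for the instrumented zone.
ZoneStats summarize(const std::vector<std::int64_t>& durations_ns,
                    std::size_t buckets) {
    assert(!durations_ns.empty() && buckets > 0);
    ZoneStats s{};
    s.calls = durations_ns.size();
    s.total_ns = std::accumulate(durations_ns.begin(), durations_ns.end(),
                                 std::int64_t{0});
    const auto mm = std::minmax_element(durations_ns.begin(), durations_ns.end());
    s.min_ns = *mm.first;
    s.max_ns = *mm.second;
    s.histogram.assign(buckets, 0);
    const std::int64_t span = s.max_ns - s.min_ns;
    for (std::int64_t v : durations_ns) {
        // Dividing by (span + 1) keeps the maximum value in the last bucket.
        const std::size_t b = static_cast<std::size_t>(
            (v - s.min_ns) * static_cast<std::int64_t>(buckets) / (span + 1));
        ++s.histogram[b];
    }
    return s;
}
```

For durations `{100, 110, 120, 105, 5000}` and 4 buckets, the four fast calls land in the first bucket and the single 5000 ns outlier sits alone in the last one, which is the kind of rare slow execution a histogram makes visible.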

dendibakh commented 1 year ago

Thanks Marco, I will review your updates in a couple of days. BTW, I plan a new section on marker APIs (in progress) that showcases libpfm. It is not as fancy as Tracy; it's a wrapper around Linux perf_events. https://github.com/dendibakh/perf-book/pull/20 Since the concept of marker APIs is already covered, maybe we should position Tracy as a specialized/hybrid profiler and focus on its features for game developers for example?

theWatchmen commented 1 year ago

> Since the concept of marker APIs is already covered, maybe we should position Tracy as a specialized/hybrid profiler and focus on its features for game developers for example?

Sure, that sounds good.

dendibakh commented 1 year ago

@theWatchmen, I made a few edits, mostly cosmetic. I also left [TODO] comments in the text, please address.

dendibakh commented 1 year ago

@theWatchmen , are you waiting for my review? I thought you were still working on it. :)

theWatchmen commented 1 year ago

Hey Denis, the last commit addresses your comments about artificially adding a slow zone and showing it in the chapter. Sorry, the commit message doesn't make that clear. I don't have further changes for now :)

theWatchmen commented 1 year ago

Hey Denis, apologies for the delay in getting back to you! I should have addressed all the latest comments, and I have updated the images :) Let me know if you have any further notes.

dendibakh commented 1 year ago

Cool! Thanks Marco, it looks great. Leave it to me. I will make some cosmetic changes and show you the final version for review. One question: on "Figure 50: Tracy frame time view", which bar corresponds to frame 101? Maybe draw an arrow... Also, the timeline on that image looks confusing to me... What do those numbers mean? :)

P.S. Marco, can you please provide details of the machine you used? CPU, OS, compiler.

dendibakh commented 1 year ago

I've made some changes. @theWatchmen, please review. The rendered PDF book is here: https://github.com/dendibakh/perf-book/actions/runs/5557247288?pr=19 P.S. Sorry for the force push. I rebased on top of the current main branch.

theWatchmen commented 12 months ago

> One question: on "Figure 50: Tracy frame time view", which bar corresponds to frame 101? Maybe draw an arrow...

It's for the last bar in the graph. I have added a green rectangle to highlight it.

> Also, the timeline on that image looks confusing to me... What do those numbers mean? :)

Those numbers are for the timeline below. I'll remove it.

> P.S. Marco, can you please provide details of the machine you used? CPU, OS, compiler.

Done.

theWatchmen commented 12 months ago

I read the section on the PDF version, looks good to me :)

dendibakh commented 12 months ago

> I read the section on the PDF version, looks good to me :)

Awesome! Thanks, merging it...