NVIDIA / NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
Apache License 2.0
312 stars 48 forks source link

Rename range classes #40

Closed jrhemstad closed 3 years ago

jrhemstad commented 3 years ago

Currently in NVTX++ we have:

The naming leaves something to be desired. thread_range comes from the concept that the object needs to be created and destroyed on the same thread, but doesn't tell you anything else about it's semantics. process_range is named primarily to contrast it to thread_range in that it can be created/destroyed on different threads.

We can do better. I propose the following:

First, the *_in<D> better conveys that the range is in the particular domain as opposed to calling it a domain_*<D>. (credit to @jcohen-nvidia for this idea).

Second, the scoped_range/unique_range naming is inspired by std::scoped_lock and std::unique_lock.

Like scoped_range, scoped_lock is neither copyable nor movable. It conveys that this object is intrinsically tied to a particular scope.

Like unique_range, unique_lock is movable, but not copyable. It represents a single, unique range that is tied to the lifetime of the object and not confined to a particular scope. Furthermore, the canonical name in C++ for "movable but not copyable" things is unique_*.

Furthermore, just as scoped_lock is typically preferred to unique_lock, scoped_range is preferred to unique_range.

jcohen-nvidia commented 3 years ago

I think it's worth clarifying here that "thread" or "scoped" range objects are trivial RAII wrappers for nvtxRangePush/nvtxRangePop in the NVTX C API, and "process" or "unique" range objects are trivial RAII wrappers for nvtxRangeStart/nvtxRangeEnd.

Note that the choice of "process_range" and "thread_range" was originally based on the behavior of developer tools with hierarchical organization placing the former on a row under a Process node, and the latter under a Thread node. However, the NVTX C API docs don't directly make this distinction that RangeStart/End == process, while they do clarify RangePush/Pop is inherently specific to a CPU thread. Tools no longer make this relationship as stark as they used to -- Nsight Systems detects when both the Start/End calls for a range occur on the same thread, and show it under a Thread node, in addition to ranges that only Start or only End on that thread with visual indication of which end is related.

Ideally there would have just been one kind of range... RangeStart gives you a handle, and you pass that handle to RangeEnd. But since passing things from the beginning of a range to the end can be difficult or inefficient (consider a tool getting a "before" callback and an "after" callback around something), the Push/Pop type was added as an optimization for the case where tools could infer the relationship between the calls. There are measurable perf benefits of using Push/Pop instead of Start/End wherever possible.

The expectation for a modern C++ RAII wrapper around a handle-based kind of range would be that it is movable. So it makes sense to name it something that C++ programmers would recognize as a move-only handle class to an object that needs to be destructed when that one reference to it is dropped. So "nvtx3::unique_range" is a fairly obvious choice, given that it is implemented using std::unique_ptr, and has the same semantics as unique_ptr, unique_lock, unique_resource, etc. Also, while C++20 adds a whole API for "Ranges", these are pairs of iterators in a structure, not really confusable with NVTX ranges of code execution in a program. Even https://en.cppreference.com/w/cpp/algorithm/ranges/unique would be hard to confuse with an nvtx3::unique_range, given that it's an algorithm that rearranges a container to have repeats of values at the end.

The wrapper for Push/Pop ranges doesn't have as clear a choice, but scoped_range is the most familiar to anyone who uses scoped lock objects to manage mutexes. Boost had a scoped_lock for years, and now it's in C++17. Our "scoped" is a bit different in that it prevent usage anywhere but in stack variables, and is not movable. But since it's effectively hard-locked to a lexical scope, which must stay on one CPU thread, that does imply it uses the NVTX functions documented to be specific to a CPU thread. And since lexical scopes nest like a stack, it's easy to connect that to the Push/Pop C API names. The only other name I thought could make sense for these was "stack_range" since it's an even more direct implication, but the familiarity of "scoped_range" being like the tried and true "scoped_lock" gives that one a big advantage.

So I agree with Jake that "scoped_range" is sensible name for the thing you'd normally use, and if you get our polite compiler error message for trying to move it or use it in heap objects, "unique_range" is a sensible name for the thing you'd use instead.

jrhemstad commented 3 years ago

closed by https://github.com/NVIDIA/NVTX/pull/42