Closed Sipondo closed 1 year ago
Thanks @Sipondo - this is looking mostly good to me!
Just curious, do you think we could go ahead and register all strings passed to annotate/push/start transparently on the user's behalf? Of course, this will lead to an increase in memory usage, but I wonder if in typical applications that memory use is small or bounded enough to be OK. Doing this would allow us to keep the NVTX Python API smaller/simpler.
(Feel free to let me know if you think this is a bad idea!)
Thank you for your reply, @shwina .
Makes sense to me! As long as the documentation makes this explicit.
From what I've seen, these Python bindings are not used in situations where performance is so critical that such an increase in overhead would be problematic. In that case, one should probably use a C implementation instead. How many strings get registered will likely matter little, if at all, for the applications this API was designed for.
That said, string registration is required for NVTX filtering in Nsys and NCU. Automatically registering all strings will definitely remove some confusion and development time when people want to filter using NVTX.
Thanks! -- that's good to know that you have the same impression. So can we go ahead and make the following changes in this PR?
str objects for the message= argument in annotate/push/start_range, and create RegisteredString objects behind the scenes.
If (2) ever becomes an issue, we can perhaps look into a global config option that controls whether or not to cache message strings.
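For illustration, the idea of registering plain strings behind the scenes, with a global switch for caching, might look roughly like the sketch below. This is not the nvtx implementation: FakeRegisteredString, CACHE_MESSAGES, and resolve_message are hypothetical names invented here, and the sketch assumes the concern being discussed is growth of the message cache.

```python
# Hypothetical sketch; none of these names come from the nvtx package.
CACHE_MESSAGES = True  # assumed global option controlling message caching

_registry = {}  # maps (domain, message) -> registered handle


class FakeRegisteredString:
    """Stand-in for a registered string handle."""

    def __init__(self, domain, message):
        self.domain = domain
        self.message = message


def resolve_message(domain, message):
    """Turn a plain str into a registered handle on the caller's behalf."""
    if not isinstance(message, str):
        raise TypeError("message must be a str")
    if not CACHE_MESSAGES:
        # Caching disabled: build a fresh handle on every call.
        return FakeRegisteredString(domain, message)
    key = (domain, message)
    if key not in _registry:
        _registry[key] = FakeRegisteredString(domain, message)
    return _registry[key]
```

With caching on, memory grows with the number of distinct (domain, message) pairs rather than with the number of calls, which is why it stays bounded when an application reuses a fixed set of range names.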
Those changes sound good.
How would you suggest to cache the RegisteredString objects? Should we keep a private dictionary in nvtx.py?
How would you suggest to cache the RegisteredString objects?
Because RegisteredString has the metaclass CachedInstanceMeta, that should already be taken care of for you:
In [7]: s1 = RegisteredString(domain, "hello")
In [8]: s2 = RegisteredString(domain, "hello")
In [9]: s3 = RegisteredString(domain, "goodbye")
In [10]: s1 is s2
Out[10]: True
In [11]: s1 is s3
Out[11]: False
In [12]: domain2 = nvtx._lib.Domain("domain2")
In [13]: s4 = RegisteredString(domain2, "hello")
In [14]: s1 is s4
Out[14]: False
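For readers unfamiliar with the pattern, a metaclass that caches instances by constructor arguments can be sketched as below. This is a minimal illustration of the general technique, not the source of CachedInstanceMeta; CachingMeta and Handle are hypothetical names, and the sketch assumes all constructor arguments are positional and hashable.

```python
class CachingMeta(type):
    """Metaclass that returns the same instance for equal constructor args."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instances = {}  # per-class cache keyed by the positional args

    def __call__(cls, *args):
        # Only construct a new object the first time these args are seen.
        if args not in cls._instances:
            cls._instances[args] = super().__call__(*args)
        return cls._instances[args]


class Handle(metaclass=CachingMeta):
    """Hypothetical stand-in for a per-domain registered string."""

    def __init__(self, domain, text):
        self.domain = domain
        self.text = text


s1 = Handle("domain", "hello")
s2 = Handle("domain", "hello")
s3 = Handle("domain", "goodbye")
s4 = Handle("domain2", "hello")
```

Here s1 and s2 are the same object, while s3 and s4 are distinct, mirroring the session above. Because the cache lives on the class, no separate dictionary is needed in the calling module.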
Ah, right! I was familiar with the behaviour of the current cached strings but hadn't checked the source. Thanks for confirming.
@shwina I've committed the changes as we discussed. This commit also includes compatibility with the profiler side of things and I fixed a small unrelated issue with categories.
Just a couple of comments, after which this should be good to go! I'll test locally, merge, and cut a new release of NVTX with this feature!
Gentle ping @Sipondo -- do you think you can take this over the finish line? If you're pressed for time I can do so as well and merge it!
Thanks for the help! Looking forward to seeing the new library release.
Thanks! Just uploaded NVTX 0.2.7 to PyPI which should have this feature
Comment moved to its own issue: https://github.com/NVIDIA/NVTX/issues/78
Added support for string registration, which is required for performance-critical tracking and NVTX annotation filtering in Nsight Systems/Compute.
Example: