Closed drewm1980 closed 5 years ago
Hi @drewm1980, I'm glad you're considering using HawkTracer :) I've never written code for tracing Python, and I haven't heard of anybody who has. I also had no plans to do this in the near future. However, since there's a potential use for it, I'm very happy to implement that functionality, or at least assist you with doing it. I'd need some more input first, so we can have a proper design that fits your needs but is also generic enough for other people to benefit from. Could you please tell me what exactly you would like to trace? Are you only interested in measuring time spent in a particular function, or do you want to measure the time of arbitrary code? Do you want to be able to define custom classes in Python, or is having this feature in C++ enough for you for now? How do you integrate your C++ code with Python? Are you just running the Python interpreter inside your C++ application? I'd like to know a bit more about your use case, so I can deliver a solution that works best for you :)
We're writing soft real-time software for robotic pick-and-place machines, with cycle times usually 100-200 ms. We'd like to see visually when operations take longer than we expect, threads are contending, the CPU is idle while waiting for the GPU, etc., in the name of reducing the worst-case latency of analyzing one frame of data. Python is our glue language; it calls into C++ code we wrote, and into TensorFlow.
Some other profilers we're looking at: RAD Game Tools' Telemetry (intrusive, native only): http://www.radgametools.com/telemetry.htm That looks like the best intrusive profiler out there, but it's priced for AAA games, so you need a solid business case to use it at all. I saw it in one of Jonathan Blow's live coding videos, and it's what inspired me to improve our profiling tooling beyond cProfile + SnakeViz. Our ideal would be that, but without the need for manual instrumentation, and with Python and native symbols/call stacks resolved correctly.
Uber's pyflame and Ben Frederickson's py-spy (sampling, mixed Python/native) are very promising, but they're very new (as in, Ben added the native call-stack unwinding parts only last week), and I had issues getting them working in our environment. https://github.com/uber/pyflame/issues/165 https://github.com/benfred/py-spy/issues/86
easy_profiler (intrusive, native only, with a gui) has been around longer, but I couldn't get the client and the code to connect: https://github.com/yse/easy_profiler
We may try perf + cpuprofilify (sampling, native): https://github.com/thlorenz/cpuprofilify
I evaluated VTune; it has some Python integration, but it doesn't seem to have a fully functional timeline view. As a tool it seems more focused on reducing overall CPU time than on minimizing worst-case latency.
There are also big systems like LTTng and SystemTap for system-wide tracing, and big frameworks for HPC, but it's not clear how to even get started with them.
Nvidia's and AMD's profilers are understandably focused on profiling their GPUs, and don't show much, if anything, for the CPU.
Anyway, thanks for the response! We did get hawktracer working on a C++ function in our code; thanks for writing it!
Hi @drewm1980, thanks for the explanation. So it looks like you're basically interested in low-overhead intrusive function-level time profiling for more than just the native stack - that's what HawkTracer was designed for, as our use case was very similar, except that instead of Python we had Lua/JavaScript.
I'm very happy to work with you getting python bindings to work in HawkTracer.
I see three possible solutions for that:
1) We expose Start(label)/Stop() functions to the Python script, so you can do something like:
def foo():
    hawktracer.Start("some_label")
    # your python code here
    hawktracer.Stop()
2) This is just a modification of 1). We keep exposing the Start()/Stop() methods, but in your code, instead of calling those methods directly, you write them as comments:
def foo():
    # TracepointStart some_label
    # your python code here
    # TracepointStop
Then you have a generator that scans the code and turns it into the code shown in 1). The advantage of this solution is that in production code you can very easily disable HawkTracer completely, and just have a special build target where you enable those tracepoints. The disadvantage is complexity, as you need two build targets, a script for scanning and rewriting the code, etc. Another option to reduce overhead (to almost zero) in production code is to just keep the code as it was in 1), and change the C++ implementation of the Start()/Stop() methods to empty functions if you don't want to run the profiler (it could be, e.g., some kind of ifdef).
3) Not sure if that's possible in Python, but in Lua you can set function call/exit hooks (http://pgl.yoyo.org/luai/i/lua_sethook), so you could easily trace all the functions called in Python. As I said, I'm not sure if that's possible in Python. We could have some kind of filtering, in case there are too many tracepoints we're not interested in.
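For what it's worth, Python does have an analogous mechanism: sys.setprofile() installs a hook that fires on function entry and exit (essentially the machinery cProfile builds on). A minimal sketch, with an event list standing in for real tracepoints:

```python
import sys

events = []  # stand-in for emitting real tracepoints

def _profile_hook(frame, event, arg):
    # 'call' fires on entry to a Python function, 'return' on exit;
    # C-function events ('c_call' etc.) are ignored here.
    if event in ('call', 'return'):
        events.append((event, frame.f_code.co_name))

def foo():
    return 42

sys.setprofile(_profile_hook)
foo()
sys.setprofile(None)  # uninstall the hook

print(events)
```

Filtering could then be layered on top by inspecting frame.f_code inside the hook.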
Regarding 1) and 2) - if you want to measure a scope, we could have a class with a deleter (i.e. a context manager), and measure the scope using a with statement:
For 1), it'd be:
def foo():
    with HawkTracerTracepoint("label") as tp:
        # your python code
For 2):
def foo():
    # TraceScope label
    # your python code
and then have a script that turns the code into 1).
We could also mix 1)/2) with 3).
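The with-based idea for 1) can be sketched as a runnable snippet; HawkTracerTracepoint and the events list here are hypothetical stand-ins, not an existing HawkTracer API:

```python
events = []  # stand-in for the real Start/Stop tracepoint calls

class HawkTracerTracepoint:
    """Hypothetical scoped tracepoint: starts a region on entry, stops it on exit."""
    def __init__(self, label):
        self.label = label

    def __enter__(self):
        events.append(('start', self.label))
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even if the body raises, so the region is always closed.
        events.append(('stop', self.label))
        return False  # don't swallow exceptions

def foo():
    with HawkTracerTracepoint("label"):
        events.append(('body', 'label'))

foo()
print(events)
```

The __exit__ guarantee is what makes this safer than manual Start()/Stop() pairs around code that can raise.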
Please let me know your thoughts on that, and if you have any other ideas :)
I'm not sure what the most ergonomic or efficient API for Python is... function decorators are likely the Pythonic way to do it. The interpreter already has a mechanism for instrumenting at the function level, though, through cProfile. If there's a way of setting up HawkTracer hooks for specific functions at module import time, I bet it would be more efficient than having Python decorator code call the C++ functions through an FFI at function call time... But keep in mind I know very little about how the Python interpreter works internally.
cProfile has a reputation for high overhead, though we haven't measured it on our project yet (that's one of our next steps). Maybe restricting it to instrument only the functions you care about (and logging them through HawkTracer so the log is shared with native code) would be fast enough for most uses.
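For reference, the stdlib already supports scoping cProfile to just a region you care about rather than the whole program, which is one way to bound its overhead; busy() here is just a placeholder workload:

```python
import cProfile
import io
import pstats

def busy():
    # placeholder workload
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()   # only code between enable() and disable() is measured
busy()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
print(stream.getvalue())
```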
We've already spent probably too much dev time on tooling; I'm not sure how much more we can spend in the short term. We'll likely focus on improving coverage of the C++ parts of our code for now.
Then how about we implement a decorator, something like:
import os
import HawkTracer

def HawkTracerTrace(func):
    if HawkTracerTrace.tracing_enabled is None:
        HawkTracerTrace.tracing_enabled = 'HAWKTRACER_TRACE' in os.environ
        print(HawkTracerTrace.tracing_enabled)
    def wrapper(*args, **kwargs):
        if HawkTracerTrace.tracing_enabled is None:
            HawkTracerTrace.tracing_enabled = 'HAWKTRACER_TRACE' in os.environ
        if HawkTracerTrace.tracing_enabled:
            HawkTracer.start(func.__name__)
            func(*args, **kwargs)
            HawkTracer.stop()
        else:
            func(*args, **kwargs)
    return wrapper

HawkTracerTrace.tracing_enabled = None
Then whenever you want to instrument your function, you can do so:
@HawkTracerTrace
def func1():
    print("Func1()")

@HawkTracerTrace
def func2():
    print("Func2()")
? You can control whether tracing is enabled or disabled through an environment variable. Also, you can decide which functions you want to trace, since you instrument your code manually.
Hi, I think it's better to do the os.environ check outside of the wrapper, and if it's false, just return the function unmodified (I believe the decorator is evaluated when the interpreter hits the "@"). Otherwise you're adding control flow and environment-variable checks even when instrumentation is supposed to be turned off. How about:
in hawktracer.py:
#!/usr/bin/env python3
import os

#_tracing_enabled = False
_tracing_enabled = True
if 'HAWKTRACER_TRACE' in os.environ:
    _tracing_enabled = True

def enable():
    _tracing_enabled = True

def disable():
    _tracing_enabled = False

def _start(region_name):
    # Actual code that starts a hawktracer region
    pass

def _stop():
    # Actual code that stops a hawktracer region
    pass

def start(region_name):
    if _tracing_enabled:
        print("Started a trace region called", region_name, '...')
        _start(region_name)
    else:
        print("NOT Starting a trace region called", region_name, '...')

def stop():
    if _tracing_enabled:
        print("Stopped a trace region")
        _stop()

def trace(func, _tracing_enabled=_tracing_enabled):
    if _tracing_enabled:
        print("Instrumenting function", func.__name__)
        def wrapper(*args, **kwargs):
            _start(func.__name__)
            func(*args, **kwargs)
            _stop()
        return wrapper
    else:
        print("NOT instrumenting function", func.__name__)
        return func
in user code:
#!/usr/bin/env python3
import hawktracer
from hawktracer import trace

hawktracer.enable()

from time import sleep

@trace
def foo():
    pass

from hawktracer import start, stop
start("bar")
stop()

hawktracer.disable()

@trace
def baz():
    pass
The above code has a bug I haven't worked out yet... enable() and disable() don't have an effect. Maybe you already tried something like that, and that's why you did what you did. There's probably another intrusive Python profiler out there that has worked this issue out...
Hi @drewm1980, thanks a lot for the suggestions. I'm aware the code I wrote is not ideal, but the main thing I wanted to point out is the end-user experience. So I assume we both agree that the API should be:
@trace
def foo():
    # this function is going to be traced when called
    pass

hawktracer.enable()   # tracing enabled
foo()                 # this will generate the tracepoint
hawktracer.disable()  # tracing disabled
foo()                 # this call won't generate tracepoints, because hawktracer is disabled
Not sure about the function names/namespaces yet, but it'll be more or less like the above. If you agree with that design, I'll prototype it over the weekend. If the implementation is not efficient, we can iterate and improve it, but at least the functionality is going to be there and the API will be available for customers (including yourself :) ). Please let me know what you think, so we can start working on it soon.
I'm likewise unsure about what people end up really needing in practice, i.e. if they're more concerned about being able to enable/disable tracing:
Each project will probably have an opinion on how to do 0., depending on their build environment. I was trying to make 1. and 2. work. I think your last example is trying to make 3. work.
Do you have internal python users at Amazon to drive your development priorities? Or do you have aspirations for hawktracer to grow into some sort of tool for AWS/cloud? The ordering above would probably be mine, but you're not working for us :)
Being able to promise "no overhead in instrumented functions when disabled" is important for putting people's minds at ease when they sprinkle instrumentation code all over their codebase. I think the way I wrote the decorator probably achieves that... In C++ you probably want a preprocessor flag that ensures HawkTracer isn't in the compiled code when disabled.
It's the #2 FAQ on Telemetry's web page: http://www.radgametools.com/telemetry/faq.html
Hi @drewm1980, right now there's no use case at Amazon. However, we have a very similar situation for JavaScript and Lua. I don't see much difference between my proposal and yours - they both require instrumenting the function you want to trace, and they both do it through decorators, i.e.:
@trace
def foo():
    pass
The way it's going to be implemented is not that important to me right now - it's easy to change the implementation, but difficult to change the API. I can guarantee we'll do our best to introduce as little overhead as possible (ideally, zero overhead) when the user decides to disable the HawkTracer system. But as I said, that's less important than deciding the API itself. Having said that, it looks like function decorators are the API we both agree on, so if there's no objection, I'll do a prototype over the weekend. :)
I of course have no objection!
I have a point to make RE the implementation affecting the API, but I found it easier to write the code than to explain it clearly. I'll fork and create a pull request.
Here's the pull request:
https://github.com/amzn/hawktracer/pull/47
No offense if you don't want to pull for whatever reason; cutting and pasting here was just getting clunky. I got 2. working, but it required changing how the API is used (see code). 3. is broken.
Hi @drewm1980, your proposal looks good. I have a few improvements, though, that would allow us to do 3. as well. I changed the _trace function:
def _trace(func):
    global _tracing_enabled
    print("Instrumenting function", func.__name__)
    def wrapper(*args, **kwargs):
        if _tracing_enabled:
            _start(func.__name__)
        func(*args, **kwargs)
        if _tracing_enabled:
            _stop()
    return wrapper
so it checks in the wrapper whether tracing is enabled - it's only an overhead (one if statement) if the function was instrumented (i.e. decorators were enabled), and it's just one condition, which is not significant compared to the rest of the HawkTracer work.
I also changed the enable_decorator() method:
def enable_decorator():
    global _tracing_enabled
    global trace
    if _tracing_enabled:
        trace = _trace
so it only enables the decorator when the tracing is enabled. After that, the example from your pull request works as expected.
I also changed the default value of _tracing_enabled (I set it to False by default) and the default value of the trace variable (I set it to _identity).
What do you think about those changes?
Hi @drewm1980, I've done a prototype over the weekend on my personal GitHub account. Please have a look and let me know what you think about it. Here is an example usage: https://github.com/loganek/hawktracer/blob/python-lib-bindings/bindings/python3/instrumentation-example.py And here's the implementation: https://github.com/loganek/hawktracer/blob/python-lib-bindings/bindings/python3/hawktracer_core_python.cpp
I'm very open to any suggestions; feel free to make a pull request to that branch as well. Once we think it's ready, I'll merge it to this repository's master branch.
Hi @loganek Very cool! So it looks like for the decorator disabling code you're doing what my python code was doing, but ported to C:
static PyObject *
ht_python_core_trace(PyObject* Py_UNUSED(self), PyObject* args)
{
    PyObject *traced_function;
    if (!PyArg_ParseTuple(args, "O", &traced_function))
    {
        return NULL;
    }
    if (trace_method)
    {
        return PyCFunction_New(trace_method, traced_function);
    }
    else
    {
        Py_XINCREF(traced_function);
        return traced_function;
    }
}
Does having the state for whether the decorator is enabled in C mean that users can do import HawkTracer.Core.trace as trace and then just do @trace and have decorator disabling still work?
static PyObject*
ht_python_core_trace_function(PyObject* function, PyObject *args)
{
    uintptr_t address = 0;
    if (!PyCallable_Check(function))
    {
        Py_RETURN_NONE;
    }
    if (tracing_enabled)
    {
        PyObject* function_name_attr = PyObject_GetAttrString(function, "__name__");
        if (function_name_attr)
        {
            address = ht_python_core_get_label_address(function_name_attr);
            ht_feature_callstack_start_int(ht_global_timeline_get(), address);
            Py_DECREF(function_name_attr);
        }
    }
    PyEval_CallObject(function, args);
    if (address)
    {
        ht_feature_callstack_stop(ht_global_timeline_get());
    }
    Py_RETURN_NONE;
}
I'm studying how you're caching the function names... I'm wondering if there will be issues with functions that have the same name. In particular, in Python, multiple instances of the same string can end up sharing memory as an optimization, and this even holds for function names, in or out of a class:
In [7]: x = 'foo'
In [8]: y = 'foo'
In [9]: id(x)
Out[9]: 140149505679912
In [10]: id(y)
Out[10]: 140149505679912
In [13]: def f():
...: pass
...:
In [14]: g = f
In [15]: def f():
...: pass
...:
In [16]: g.__name__
Out[16]: 'f'
In [17]: f.__name__
Out[17]: 'f'
In [19]: id(g.__name__)
Out[19]: 140149571706928
In [20]: id(f.__name__)
Out[20]: 140149571706928
In [26]: class Foo():
...: def __init__(self):
...: pass
...: def f(self):
...: pass
In [28]: id(Foo.f.__name__)
Out[28]: 140149571706928
That seems contrived, but in my codebase there are multiple functions with the same name as members of various classes in various modules. If I understand correctly, your tracepoint_map is using function names as UUIDs, and that's not going to work in a lot of Python codebases; Python's not like C, where function names have to be unique or they collide, or like C++, where they're transparently mangled so they don't collide when there's overloading.
Could tracepoint_map be constructed in the decorator function, so that the overhead of setting it up is at decoration-time, rather than the first run through the code?
Dunno if it's significant or not; just wanted to send some quick feedback after skimming the code. My team member who tried out hawktracer on C++ in our code will give your new python wrapper a shot.
Hi @drewm1980
Hi @loganek Very cool! So it looks like for the decorator disabling code you're doing what my python code was doing, but ported to C
Yes, it's exactly like that, with a few little improvements.
Does having the state for whether the decorator is enabled in C mean that users can do import HawkTracer.Core.trace as trace and then just do @trace and have decorator disabling still work?
Yes, I've tested various scenarios, but feel free to double-check.
I'm studying how you're caching the function names... I'm wondering if there will be issues with functions that have the same name... in particular, in python multiple instances of the same string can end up sharing memory as an optimization, and this even holds for function names, in or out of a class: That seems contrived, but in my codebase, there are multiple functions with the same name as members of various classes in various modules. If I understand correctly, your tracepoint_map is using function names as UUID's, and that's not going to work in a lot of python code bases; python's not like C where function names have to be unique or they collide, or like in C++ where they're transparently mangled so they don't collide when there is overloading.
In general, I don't see any problem with using the same map entry for 2 different functions - as long as the name is correct, it's fine. But as far as I understand, you suggest not using the __name__ property, as it might be confusing for the user? If so, what would you suggest instead? Ideally, I'd like to have the full name (including the module, e.g. module.submodule.function), but I'm not sure how to get it in Python (although I'm sure there's a way to do that).
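For reference, the full dotted name is available from a function's __module__ and __qualname__ attributes (the latter, available since Python 3.3, includes the enclosing class path); a quick sketch:

```python
class Foo:
    def f(self):
        pass

def f():
    pass

def full_name(func):
    # __module__ is the defining module's dotted name;
    # __qualname__ includes the enclosing class path, so two
    # same-named functions get distinct full names.
    return func.__module__ + '.' + func.__qualname__

print(full_name(f))      # e.g. '__main__.f'
print(full_name(Foo.f))  # e.g. '__main__.Foo.f'
```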
Also, please note I use a Python dict object instead of a C++ unordered_map. The reason is that initially I was doing it in C, and therefore didn't have access to the STL. I might change it in the future, but haven't decided yet.
Could tracepoint_map be constructed in the decorator function, so that the overhead of setting it up is at decoration-time, rather than the first run through the code?
Yeah, I think that would be a nice improvement, and it could (I hope) be implemented easily. As I said in one of my previous comments, I'm focusing on the interface and user experience first, so I didn't pay much attention to the actual implementation. There might still be quite a few potential improvements in my code :)
Dunno if it's significant or not; just wanted to send some quick feedback after skimming the code. My team member who tried out hawktracer on C++ in our code will give your new python wrapper a shot.
Any feedback is very much appreciated, especially feedback as constructive as yours. Thanks a lot, and looking forward to seeing more of your comments :)
Does having the state for whether the decorator is enabled in C mean that users can do import HawkTracer.Core.trace as trace and then just do @trace and have decorator disabling still work?

Yes, I've tested various scenarios, but feel free to double-check.
Cool! When I showed your example code to another developer, his initial impression was "I have to do all this stuff?". Of course you don't, but it would probably be good for adoption to have a minimal_example.py file that just traces one function, without showing off any other optional features.
In general, I don't see any problem with using the same map entry for 2 different functions - as long as the name is correct, it's fine. But as far as I understand, you suggest not using the __name__ property, as it might be confusing for the user? If so, what would you suggest instead? Ideally, I'd like to have the full name (including the module, e.g. module.submodule.function), but I'm not sure how to get it in Python (although I'm sure there's a way to do that). Also, please note I use a Python dict object instead of a C++ unordered_map. The reason is that initially I was doing it in C, and therefore didn't have access to the STL. I might change it in the future, but haven't decided yet.
You can disregard that comment for now; we'll let you know if it's an issue in practice. If the user annotates their own nested regions, that turns into something like a call stack in the output anyway, so maybe the name shadowing isn't an issue as long as hawktracer still gets the nesting right when regions have the same name.
Any feedback is very much appreciated, especially feedback as constructive as yours. Thanks a lot, and looking forward to seeing more of your comments :)
No need to thank me; you're doing all the hard work!
would probably be good for adoption to have a minimal_example.py file that just traces one function without showing off any other optional features.
Yeah, the minimalistic example would be:
import HawkTracer.Core

@HawkTracer.Core.trace
def foo():
    print("This is foo.")

foo()
and run the program with HT_PYTHON_TRACE_ENABLED set. I'll create another, simplified example.
You can disregard that comment for now; we'll let you know if it's an issue in practice. If the user annotates their own nested regions, that turns into something like a call stack in the output anyway, so maybe the name shadowing isn't an issue as long as hawktracer still gets the nesting right when regions have the same name.
Sure, let me know if you have any real issues, so we can work to fix it.
I think I'll merge my changes soon. I want to re-review it and fix potential problems. After that, I'll close the issue. Feel free to open another one, if you find some bugs or you have any future feature requests :)
This is bikeshedding, but IMHO it's prettier as:
from HawkTracer.Core import trace

@trace
def foo():
    print("This is foo.")

foo()
Doesn't it still need an init function call? Or is that already handled in the import? If so, nice!
Also bikeshedding, but python has a convention for package and module name capitalization:
From PEP 8: Package and Module Names Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
It's not uncommon for users to just start typing stuff in an ipython shell to discover an API. Such users will try the lowercase module name first, curse, maybe check your docs, then discover the capitalized version.
This is bikeshedding, but IMHO it's prettier as:
Sure, agree.
Doesn't it still need an init function call? Or is that already handled in the import? If so, nice!
It's not upstream yet, but I have that change locally.
It's not uncommon for users to just start typing stuff in an ipython shell to discover an API. Such users will try the lowercase module name first, curse, maybe check your docs, then discover the capitalized version.
Good to know; I'll rename it to hawk_tracer.core then.
Hey @drewm1980, I finished the code and opened a PR: https://github.com/amzn/hawktracer/pull/48 . Please review; any suggestion is very much appreciated :)
PR #48 has been merged, closing the issue. Feel free to open a new one in case there's some missing functionality.
Has anyone written an integration for tracing python code? If so, do you have a reference?
Greetings from Belgium, by the way; I'm the American guy who asked you a couple questions at FOSDEM.
Cheers, Andrew