Closed Ikariusrb closed 8 years ago
Thanks!
If you're interested in working on a fix that's always appreciated. Otherwise would you be willing to test out a fix if I stick that on a branch?
I'm not sure enough of what things are supposed to look like to work on a fix myself, but I'm definitely willing to test out a fix from a branch.
Cool, thanks again. I'll update this when there's something available.
Would graphite_dump be fixed simply by teaching get_bounds to ignore results with a "-" separator, or is there more to the fix than that? If that's all there is to a fix, I can probably implement it myself and submit a pull request- otherwise I don't know what other things may need to happen.
I took a quick look and it's actually more complex then that. It used to be the case the profiler emitted a deeply nested tree of metrics, with periods instead of dashes in keys like org-apache-commons-daemon-support-DaemonLoader-load
. Now that's now not the case and graphite_dump.py is making a bad assumption about how to get the dummy metrics that represent the bounds. I'll think a bit more on how best to fix this.
The CPU trace tree still looks deeply nested in my graphite, from the small amount of stats I've collected. Unfortunately, I'm not really certain what it should look like, to spot anything wrong. I'm installing influx to try and get some profiles running and flame graphs in the short term, but I've no plans to tear down the graphite install, so I should be easily able to drop back to that to test your changes.
Okay, sounds good. In any case we've found InfluxDB to work a lot better for visualizing this data than Graphite.
I've pushed a workaround to the graphite_dump_fix branch.
Didn't like https://github.com/etsy/statsd-jvm-profiler/blob/graphite_dump_fix/visualization/graphite_dump.py#L33 - dumps out w/
return (min(bounds), max(bounds))
ValueError: min() arg is an empty sequence
I replaced Line 33 w the non-idiomatic but functional:
for bound in json_results['results']:
x = bound.replace(prefix + '.', '')
if x.isdigit():
bounds.append(int(x))
After that, it got substantially further, but then dumped out @
get_max_metric
return max([point[0] for point in json_results[0]['datapoints']])
IndexError: list index out of range
I'm just starting to troubleshoot that.
Ah yep, your change at line 33 is a better check. I should have been stripping off the prefix before checking isdigit
.
Regarding the next error, my first guess would be that there aren't any datapoints for some metric in the time range specified.
Well, I found a start/end range that eliminated the issues in get_max_metric. I suspect it should probably get a check to see if it successfully fetched any values, and return an indicator which is then checked for before attempting to extend the leaves.
After that, I ended up with None for some of the values in the loop in format_output. I'm not sure why some keys have "None" for the values, but it should be possible to check for None as the value and skip that result.
Thoughts on these suggested adjustments?
If those changes are working out for you, that sounds good to me.
Running curl to see what's been dumped into graphite shows the following: curl "http://127.0.0.1/metrics/expand?query=stats.gauges.statsd-jvm-profiler.cpu.trace.*&leavesOnly=1"
{"results": ["stats.gauges.statsd-jvm-profiler.cpu.trace.1", "stats.gauges.statsd-jvm-profiler.cpu.trace.72", "stats.gauges.statsd-jvm-profiler.cpu.trace.73", "stats.gauges.statsd-jvm-profiler.cpu.trace.74", "stats.gauges.statsd-jvm-profiler.cpu.trace.75", "stats.gauges.statsd-jvm-profiler.cpu.trace.77", "stats.gauges.statsd-jvm-profiler.cpu.trace.org-apache-commons-daemon-support-DaemonLoader-load-200"]}
Running graphite_dump.py against the same server bails out like this: python2.7 ./graphite_dump.py -o 127.0.0.1 -s 12:40_20160316 -e 12:41_20160316 -p stats.gauges.statsd-jvm-profiler.cpu.trace
Traceback (most recent call last): File "./graphite_dump.py", line 83, in
results = get_tree(host, args.prefix, args.start, args.end)
File "./graphite_dump.py", line 49, in get_tree
(min, max) = get_bounds(host, prefix)
File "./graphite_dump.py", line 34, in get_bounds
bounds = [int(bound.replace(prefix + '.', '')) for bound in json_results['results']]
ValueError: invalid literal for int() with base 10: 'org-apache-commons-daemon-support-DaemonLoader-load-200'
Looking at the code, it's expecting the stats to be prefix + '.' and the rest a number, where the value it's choking on has a "-" for the separator.