epickrram / grav

Performance visualisation tools
Apache License 2.0

scheduling-profile seems not to be working on kernel 4.4 #9

Closed stefanodoni closed 6 years ago

stefanodoni commented 6 years ago

Hi,

First of all many thanks for sharing this work!

I'm interested in trying the scheduling-profile tool on Ubuntu 16.04, kernel 4.4. I have installed bcc as per iovisor instructions. However, it seems not to be working:

$ sudo ./scheduling-profile 2918
Recording scheduling information for 15 seconds
/virtual/main.c:41:63: warning: incompatible pointer to integer conversion initializing 'char' with an expression of type 'void *' [-Wint-conversion]
struct proc_counter_t new_counter = {.proc_name = NULL, .count = 0};
                                                  ^~~~
include/linux/stddef.h:7:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
             ^~~~~~~~~~~
1 warning generated.
No samples for pid 2918

Can the tool be made compatible with 4.4 kernels, or does it require some eBPF capability found only in newer kernels?

Thank you!

epickrram commented 6 years ago

Can you see if there are any files generated in /tmp/? Specifically, you should see:

/tmp/jstack-2918.txt /tmp/scheduler-states-2918.json

The C warnings are not an indicator of whether the tracing is actually working. Also, can you try running without sudo, and just enter your password when prompted?
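A minimal sketch of the file check described above. The helper names `expected_outputs` and `missing_outputs` are hypothetical and not part of grav; the file names are taken from the two paths listed above, and the `/tmp` location is assumed to be fixed.

```python
import os


def expected_outputs(pid, tmp_dir="/tmp"):
    """Return the two paths that should exist after profiling the given pid."""
    return [
        os.path.join(tmp_dir, "jstack-{}.txt".format(pid)),
        os.path.join(tmp_dir, "scheduler-states-{}.json".format(pid)),
    ]


def missing_outputs(pid, tmp_dir="/tmp"):
    """Return the expected paths that are not present as regular files."""
    return [p for p in expected_outputs(pid, tmp_dir) if not os.path.isfile(p)]


if __name__ == "__main__":
    # 2918 is the pid from the original report.
    for path in missing_outputs(2918):
        print("missing: " + path)
```

If nothing is printed, both files were written, and the problem is more likely in how the samples are matched to threads than in the tracing itself.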

drandynisbet commented 6 years ago

Howdy, I'm on Ubuntu 16.04 (4.13.0-32-generic) with OpenJDK 9. After starting java (pid 22949) running a DaCapo benchmark for 20 iterations, I get a similar warning and the same issue as above. Is there a need to use HotSpot instead? I added some debugging to grav/src/cpu/scheduler_profile.py to print out sys.argv[1] and the tid_to_thread_name and thread_scheduling variables.

./grav/bin/scheduling-profile 22949

/tmp/jstack-22949 has call stacks in it ... whereas /tmp/scheduler-states-22949 contains

{"0": {"D": 60, "K": 0, "S": 1506889, "R": 1507495, "U": 0, "x": 71, "total": 3005466}, "2": {"D": 15, "K": 0, "S": 0, "R": 0, "U": 0, "x": 0, "total": 15}}

/virtual/main.c:40:63: warning: incompatible pointer to integer conversion initializing 'char' with an expression of type 'void *' [-Wint-conversion]
struct proc_counter_t new_counter = {.proc_name = NULL, .count = 0};
                                                  ^~~~
include/linux/stddef.h:7:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
             ^~~~~~~~~~~
1 warning generated.
('sys.argv[1] {}', '/tmp/jstack-22949.txt')
('tid to thread name {}', "{'22950': 'main', '22957': 'GC Thread#6', '22970': 'G1 Marker#1', '22955': 'GC Thread#4', '22954': 'GC Thread#3', '22997': 'node-3', '22996': 'node-4', '22995': 'node-5', '22994': 'node-2', '22993': 'node-1', '22978': 'C2 CompilerThread2', '22975': 'Signal Dispatcher', '22974': 'Surrogate Locker Thread (Concurrent GC)', '22977': 'C2 CompilerThread1', '22976': 'C2 CompilerThread0', '22971': 'VM Thread', '22956': 'GC Thread#5', '22973': 'Finalizer', '22972': 'Reference Handler', '23058': 'Attach Listener', '22959': 'G1 Refine#7', '22958': 'GC Thread#7', '22979': 'C1 CompilerThread3', '22992': 'node-0', '22980': 'Sweeper thread', '22981': 'Common-Cleaner', '22982': 'Service Thread', '22983': 'VM Periodic Task Thread', '22968': 'G1 Main Marker', '22969': 'G1 Marker#0', '22966': 'G1 Refine#0', '22967': 'G1 Young RemSet Sampling', '22964': 'G1 Refine#2', '22965': 'G1 Refine#1', '22962': 'G1 Refine#4', '22963': 'G1 Refine#3', '22960': 'G1 Refine#6', '22961': 'G1 Refine#5', '22953': 'GC Thread#2', '22952': 'GC Thread#1', '22951': 'GC Thread#0'}")
('thread scheduling {}', "{u'0': {u'D': 60, u'K': 0, u'S': 1506889, u'R': 1507495, u'U': 0, u'x': 71, u'total': 3005466}, u'2': {u'D': 15, u'K': 0, u'S': 0, u'R': 0, u'U': 0, u'x': 0, u'total': 15}}")
No samples for pid 22949

Cheers, Andy

epickrram commented 6 years ago

The data in /tmp/scheduler-states-22949 should be thread state counts keyed by thread_id. For some reason, the two samples in that file have thread_ids 0 and 2, so when the script tries to match that thread_id to a name in the file /tmp/jstack-22949.txt, it fails and determines that there are no samples for the threads in the jstack file.

Testing locally, I get the same results. Pid 0 is swapper, pid 2 is kthreadd, so it looks as though normal processes are not being captured. I'll see if I can figure out the problem.
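The mismatch described above can be reproduced with a minimal sketch. This is not the actual code in scheduler_profile.py: the function name `match_samples` is hypothetical, and the thread-name map is cut down to three entries from the debug output posted earlier in this thread.

```python
import json


def match_samples(scheduler_states, tid_to_thread_name):
    """Keep only the state counts whose key is a thread id known from jstack."""
    return {
        tid_to_thread_name[tid]: counts
        for tid, counts in scheduler_states.items()
        if tid in tid_to_thread_name
    }


# State counts as reported for pid 22949; the keys are 0 (swapper) and 2 (kthreadd).
scheduler_states = json.loads(
    '{"0": {"D": 60, "K": 0, "S": 1506889, "R": 1507495, "U": 0, "x": 71, "total": 3005466},'
    ' "2": {"D": 15, "K": 0, "S": 0, "R": 0, "U": 0, "x": 0, "total": 15}}'
)

# Abbreviated thread-id-to-name map taken from the jstack debug output.
tid_to_thread_name = {"22950": "main", "22971": "VM Thread", "22992": "node-0"}

matched = match_samples(scheduler_states, tid_to_thread_name)
if not matched:
    # No key in the state file belongs to a Java thread, so nothing can be reported.
    print("No samples for pid 22949")
```

Because neither "0" nor "2" appears among the Java thread ids, the intersection is empty, which is consistent with the "No samples" message even though both files were written.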

epickrram commented 6 years ago

I've updated the bcc script to attach to a tracepoint rather than a kprobe. It seems to be doing the job, but I haven't had a close look yet to make sure that the results are accurate.

Use at your own risk ;)