ARM-software / devlib

Library for interaction with and instrumentation of remote devices.
Apache License 2.0
47 stars 78 forks source link

devlib.collector: Add PerfettoCollector #618

Closed mrkajetanp closed 1 year ago

mrkajetanp commented 1 year ago

Add a Collector for accessing Google's Perfetto tracing infrastructure. The Collector takes a path to an on-device config file, starts tracing in the background using the perfetto binary and then stops by killing the tracing process.

Signed-off-by: Kajetan Puchalski kajetan.puchalski@arm.com

mrkajetanp commented 1 year ago

I updated the PR with Douglas' suggestions. I also had to bring back the manual pull timeout bit. I noticed that without it I'd end up getting a pull timeout exception when trying to pull the trace from the device. No idea why but having the manual timeout fixes the problem.

douglas-raillard-arm commented 1 year ago

I noticed that without it I'd end up getting a pull timeout exception when trying to pull the trace from the device.

That's strange, from a cursory look I could not find any non-None default timeout on the pull() path, except in the transfer manager but it's set to 3600

mrkajetanp commented 1 year ago

Now that I'm thinking about it, are we sure switching to the self.target.background method is the better way? Perfetto's 'intended' way to trace in the background is to pass the --background argument which just returns the PID and then later to kill that PID whenever you're done. If we're using self.target.background we need to use Perfetto's non-background variant designed to be run in a terminal and cancelled with Ctrl-C. It seems to work either way so I'm not going to make a stand here but I'm just wondering if we might not end up interfering with how the tool is meant to operate on account of making the code more 'pythonic' or whatever.

douglas-raillard-arm commented 1 year ago

I would definitely use background command. The previous version of the code was straightfoward and yet already had issues:

  1. if the process dies, we would still attempt to kill an old PID even though it might point at an entirely different task. Android can exhaust PIDs pretty quickly.
  2. it allowed starting perfetto twice, which would result in failures down the line
  3. less likely, but if perfetto hanged at SIGINT/SIGTERM, devlib would freeze. With BackgroundCommand.cancel(), SIGTERM is sent and then SIGKILL after a timeout.
  4. if devlib disconnects, perfetto will be left running. Background commands are terminated properly when the target is destroyed or disconnected.

The change to the background command API fixed 1. and made 2. obvious. On top of that, we can now capture stdout/stderr if required for debug, instead of having to go hunt for some log file somewhere. Process control is a tricky business and it's definitely the sort of stuff we don't want to duplicate. Ctrl-C in a terminal is simply SIGINT, which will likely be handled like SIGTERM so I expect perfetto to work fine with that.

mrkajetanp commented 1 year ago

I dropped the explicit timeout argument but that requires #619 so it should be either merged after the other one or at the same time

mrkajetanp commented 1 year ago

Since the transfer manager PR has been merged we should now be able to go ahead and merge this as well?

mrkajetanp commented 1 year ago

First of all, these are not issues for Android - on Android Perfetto always comes included and is always running so it'll just be fine. For Linux targets there's a bit more steps, true.

I'm wondering is you know an easy way to detect whether this service is running

I'm not sure if there's any official or proper way to check, we could always grep the ps output or something like that if we really wanted to?

or even adding some config to enable running the service as a background command

We could just make the tool use tracebox if it can't find the running service on Linux?

Secondly do you know if we would be able to include a compiled perfetto binary (or tracebox) similar to what we do for trace-cmd etc to make it easier for users to get started?

As mentioned above, it comes bundled with Android so on Android there's no point. On Linux if the tracebox binary already works for you then I see no issues with bundling it.

https://perfetto.dev/docs/quickstart/linux-tracing

mrkajetanp commented 1 year ago

Sorry for the delay, here are the updates. I added a check to switch Perfetto on for Android 9 & 10, as well as a check to install tracebox on Linux targets if the service is missing. The tracebox binary was bundled along with the copied Perfetto license.

I also added a "is_running" function in Target, as far as I can tell there was no already existing equivalent and it seems like a useful thing to have so I made it a helper inside Target instead of implementing it just inside the collector.

mrkajetanp commented 1 year ago

Updated with using the fstring and a more proper implementation of is_running that takes a regexp so that we can match the name more accurately and avoid a false positive. An added upside is that awk will just make execute return an empty string instead of an exception in the absent case so we can avoid the exception handler.

mrkajetanp commented 1 year ago

I updated with just directly matching the comm to not overcomplicate things. Busybox ps solves the potential spaces problem already as far as I can tell. busybox ps with no arguments:

PID   USER     TIME  COMMAND
    1 0         1h22 /system/bin/init second_stage
    2 0         0:03 [kthreadd]
    3 0         0:00 [rcu_gp]

busybox ps -o comm,stat

COMMAND          STAT
init             S
kthreadd         SW
rcu_gp           IW<

None of the tasks that have spaces in the default output have them in the -o variant, it abbreviates the task names to a fixed width column and gets rid of whitespace.

douglas-raillard-arm commented 1 year ago

None of the tasks that have spaces in the default output have them in the -o variant, it abbreviates the task names to a fixed width column and gets rid of whitespace.

That's good to know, have you tried the args instead of comm ? That would contain the full command line though, which is a bit different.

mrkajetanp commented 1 year ago

have you tried the args instead of comm

comm just gives us exactly what we want, args appears slightly broken in the busybox implementation. Namely args,stat only shows the args part while stat,args shows both as expected. Still not really relevant here, the use case for this function is to check whether a background deamon is present or not and just matching comm does that more than adequately imo.

douglas-raillard-arm commented 1 year ago

comm just gives us exactly what we want

This is not what we want:

None of the tasks that have spaces in the default output have them in the -o variant, it abbreviates the task names to a fixed width column and gets rid of whitespace.

Fortunately I found that it's actually not true, the busybox binary shipped in devlib does report comm with spaces properly, e.g.

/devlib/devlib/bin/x86_64/busybox ps -A -o stat,comm | grep foo
S    foo bar.sh

for a script called "foo bar.sh"

I think the confusion comes from the full command line (args in ps parlance) and the task name (comm). comm is not a simplification of args, it's 2 separate strings:

mrkajetanp commented 1 year ago

I updated the PR so that for Android targets it'll write to Android's default perfetto trace directory instead of the devlib one since Perfetto seems to be having permission issues on non-rooted phones. This makes the collector work without root access on Android phones, tested on Android 13 + Pixel 6. I also renamed the output file to 'devlib-trace.perfetto-trace' just to avoid potential clashes with perfetto traces collected in other ways.