ARM-software / devlib

Library for interaction with and instrumentation of remote devices.
Apache License 2.0
47 stars 78 forks source link

BackgroundCommand signals don't propagate through some shell setups #645

Open mrkajetanp opened 1 year ago

mrkajetanp commented 1 year ago

On some platforms (e.g. Android with su coming from Magisk) there seems to be an issue where BackgroundCommand.send_signal() does not propagate through the outer sh layers to the inner process. The same implementation that works okay whether with adb root or just with su coming from Android breaks on the Magisk setup. Manually finding the PID of the inner process and sending the signal to it on the Magisk setup works around the problem which strongly indicates it's a signal propagation issue.

The process hierarchy ends up looking as follows:

oriole:/ $ ps -e | grep 17488
root         17488 17483 10830432  3176 0                   0 S sh        <-- parent 1
root         17491 17488     876    384 0                   0 S trace-cmd <--  the PID we need to SIGINT
oriole:/ $ ps -e | grep 17483
root         17483 17479 10826332  3176 0                   0 S sh        <-- parent 2
root         17488 17483 10830432  3176 0                   0 S sh
oriole:/ $ ps -e | grep 17479
shell        17479 17477 10852960  3172 sigsuspend          0 S sh        <-- parent 3
root         17483 17479 10826332  3176 0                   0 S sh
oriole:/ $ ps -e | grep 17477
shell        17477   882 10791520  3172 sigsuspend          0 S sh        <-- parent 4, the one held in bg_cmd.pid
shell        17479 17477 10852960  3172 sigsuspend          0 S sh

Other platforms might potentially be affected as well.

douglas-raillard-arm commented 1 year ago

After discussing offline with @mrkajetanp , it seems that a solution might be to change the PID detection code:

In order to detect what is a child PID and what is a wrapper layer, we can try to manipulate the busybox sh -c process name layer (e.g. by using an intermediate script that simply exec busybox sh -c "$@"'). Since the last layer should always be a devlib-made sh -c layer that should be enough to detect the internal wrapper layers.

douglas-raillard-arm commented 1 year ago

Also it's important to note that the signal propagation is not limited to su: shells can have complicated rules, e.g. the propagation of SIGINT with bash: https://www.gnu.org/software/bash/manual/html_node/Signals.html

When Bash is running without job control enabled and receives SIGINT while waiting for a foreground command, it waits until that foreground command terminates and then decides what to do about the SIGINT:

  • If the command terminates due to the SIGINT, Bash concludes that the user meant to end the entire script, and acts on the SIGINT (e.g., by running a SIGINT trap or exiting itself);
  • If the pipeline does not terminate due to SIGINT, the program handled the SIGINT itself and did not treat it as a fatal signal. In that case, Bash does not treat SIGINT as a fatal signal, either, instead assuming that the SIGINT was used as part of the program’s normal operation (e.g., emacs uses it to abort editing commands) or deliberately discarded. However, Bash will run any trap set on SIGINT, as it does with any other trapped signal it receives while it is waiting for the foreground command to complete, for compatibility.

So this problem needs to be solved regardless of what su is used.

In order to make su more reliable, we could make busybox setuid and then use busybox su, but that does not seem necessary for now if the PID detection is fixed.

douglas-raillard-arm commented 6 months ago

Another avenue would be to insert an extra bit of shell script before the user-provided command:

function __devlib_get_pids() {
   while true; do
       # <insert here a blocking command we can remotely unblock>
       cat /proc/$$/task/*/children > /path/to/a/file/we/know/about/with/unique/UUID
   done
}
__devlib_get_pids &
<insert here the user-provided snippet>

Then devlib can unblock the loop every time it wants an up-to-date list of PIDs. @cloehle raised the issue of the user snippet waiting for all its children (which would wait forever if that includes the function we spawned). This can maybe be worked around by wrapping the user snippet in an sh -c layer. We then know that the PIDs we are looking for are the immediate children of that specific layer.

douglas-raillard-arm commented 5 months ago

After some thinking, I feel like the best solution is probably the most explicit one: the user-provided shell snippet should have access to a shell function that allows declaring what is the "PID of that snippet". This would then be collected by the Python part and signals would be delivered to that specific task. Code that wouldn't use that helper would be limited to sending SIGKILL to the command (as it de facto is today):


# The process spawned by devlib-run would have its PID made available to the Python API
target.background('if [ foo ]; then devlib-run mycommand; else foobar; fi')

This allows arbitrarily complex shell snippet to still be handled in the most sensible way by letting the user declare the "main event" in the background command. This is particularly important for signals like SIGUSR1, that the main command might handle in a special way, but may result in terminating any other helper commands that happens to be executed at that time in the snippet if the signal was delivered to all processes.

In terms of implementation, we can easily pre-pend something like that to the user-provided shell command:

function devlib-run {
   # do something based on a UUID generated by Target.background() so that the Python code can retrieve the info somehow, e.g. with a temp file containing the PID name or something like that
}

@marcbonnici what do you think about this solution ?

douglas-raillard-arm commented 3 months ago

Started working on it