Open mrkajetanp opened 1 year ago
After discussing offline with @mrkajetanp , it seems that a solution might be to change the PID detection code:
pid
mandatory BackgroundCommand attribute would be the item in pids list when the list has only one item. Otherwise, this will trigger an AttributeError
with an appropriate message.In order to detect what is a child PID and what is a wrapper layer, we can try to manipulate the busybox sh -c
process name layer (e.g. by using an intermediate script that simply exec busybox sh -c "$@"'
). Since the last layer should always be a devlib-made sh -c
layer that should be enough to detect the internal wrapper layers.
Also it's important to note that the signal propagation is not limited to su
: shells can have complicated rules, e.g. the propagation of SIGINT with bash: https://www.gnu.org/software/bash/manual/html_node/Signals.html
When Bash is running without job control enabled and receives SIGINT while waiting for a foreground command, it waits until that foreground command terminates and then decides what to do about the SIGINT:
- If the command terminates due to the SIGINT, Bash concludes that the user meant to end the entire script, and acts on the SIGINT (e.g., by running a SIGINT trap or exiting itself);
- If the pipeline does not terminate due to SIGINT, the program handled the SIGINT itself and did not treat it as a fatal signal. In that case, Bash does not treat SIGINT as a fatal signal, either, instead assuming that the SIGINT was used as part of the program’s normal operation (e.g., emacs uses it to abort editing commands) or deliberately discarded. However, Bash will run any trap set on SIGINT, as it does with any other trapped signal it receives while it is waiting for the foreground command to complete, for compatibility.
So this problem needs to be solved regardless of what su
is used.
In order to make su
more reliable, we could make busybox setuid and then use busybox su, but that does not seem necessary for now if the PID detection is fixed.
Another avenue would be to insert an extra bit of shell script before the user-provided command:
function __devlib_get_pids() {
while true; do
# <insert here a blocking command we can remotely unblock>
cat /proc/$$/task/*/children > /path/to/a/file/we/know/about/with/unique/UUID
done
}
__devlib_get_pids &
<insert here the user-provided snippet>
Then devlib can unblock the loop every time it wants an up-to-date list of PIDs. @cloehle raised the issue of the user snippet waiting for all its children (which would wait forever if that includes the function we spawned). This can maybe be worked around by wrapping the user snippet in an sh -c
layer. We then know that the PIDs we are looking for are the immediate children of that specific layer.
After some thinking, I feel like the best solution is probably the most explicit one: the user-provided shell snippet should have access to a shell function that allows declaring what is the "PID of that snippet". This would then be collected by the Python part and signals would be delivered to that specific task. Code that wouldn't use that helper would be limited to sending SIGKILL to the command (as it de facto is today):
# The process spawned by devlib-run would have its PID made available to the Python API
target.background('if [ foo ]; then devlib-run mycommand; else foobar; fi')
This allows arbitrarily complex shell snippet to still be handled in the most sensible way by letting the user declare the "main event" in the background command. This is particularly important for signals like SIGUSR1, that the main command might handle in a special way, but may result in terminating any other helper commands that happens to be executed at that time in the snippet if the signal was delivered to all processes.
In terms of implementation, we can easily pre-pend something like that to the user-provided shell command:
function devlib-run {
# do something based on a UUID generated by Target.background() so that the Python code can retrieve the info somehow, e.g. with a temp file containing the PID name or something like that
}
@marcbonnici what do you think about this solution ?
Started working on it
On some platforms (e.g. Android with
su
coming from Magisk) there seems to be an issue where BackgroundCommand.send_signal() does not propagate through the outersh
layers to the inner process. The same implementation that works okay whether withadb root
or just withsu
coming from Android breaks on the Magisk setup. Manually finding the PID of the inner process and sending the signal to it on the Magisk setup works around the problem which strongly indicates it's a signal propagation issue.The process hierarchy ends up looking as follows:
Other platforms might potentially be affected as well.