Preliminary results on:

```python
with target.cpufreq.use_governor('performance'):
    pass
```

- 1.4x faster on an SSH target (Juno r0)
- 2.2x faster on a local target
I checked all instances of threading.Lock in devlib and it should be safe, as none is in the path used when running a background command or any other coroutine.
If a function taking a threading.Lock were called by two coroutines that end up running concurrently, the second coroutine would wait for the lock, thereby blocking the main thread and deadlocking.
I also tried various combinations of thread pools, including running a blocking execute() in a different thread (so with separate SSH connections) and I did not get any real timing variations.
What I did get were some errors, probably because I was maintaining too many connections to the SSH server, so the current approach with one connection and many concurrent SSH channels seems to be the best. I could not saturate the server with open channels (maybe there is no limit). If that were to happen, we could use an asyncio.Semaphore to limit the number of background commands in flight.
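A minimal sketch of that fallback idea, assuming a hypothetical run_bg() coroutine standing in for whatever opens one SSH channel per background command:

```python
import asyncio

# Sketch only: run_bg() stands in for the coroutine that opens an SSH channel
# to run one background command; the semaphore caps concurrent channels.
MAX_CHANNELS = 10

async def run_bg(cmd):
    await asyncio.sleep(0.01)              # placeholder for the real channel work
    return cmd

async def limited_bg(sem, cmd):
    async with sem:                        # at most MAX_CHANNELS commands at once
        return await run_bg(cmd)

async def main():
    sem = asyncio.Semaphore(MAX_CHANNELS)
    results = await asyncio.gather(*(limited_bg(sem, f'cmd {i}') for i in range(50)))
    print(len(results))

asyncio.run(main())
```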
There are a few things to clean up (docstrings etc.), so if you are happy with the API as it stands I will proceed with that.

EDIT: cleanup done. Beyond possible high-level documentation and converting more bits, the change is more or less ready.
While working on a bug fix for another problem, it came to my attention that SSH servers accept a fixed number of "sessions" per connection. These "sessions" seem to map to paramiko's channels. This means that for a given connection, we might be limited to e.g. 10 concurrent channels (the OpenSSH default). This is configurable with MaxSessions on the OpenSSH side and results in an exception on the paramiko side: Could not open an SSH channel: ChannelException(2, 'Connect failed')

I'll probably need to handle that in this PR to limit the number of opened channels. There does not seem to be a standard way of getting the maximum number of channels, but we can probably handle the exception as a hint, or attempt to read and parse the OpenSSH config to find the MaxSessions value (a bit hacky though).
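As a rough illustration of the "exception as a hint" option (this is not the PR's code), paramiko's ChannelException on opening a session could be interpreted as the per-connection limit being reached:

```python
import paramiko

# Sketch only, assuming an already-connected paramiko.SSHClient named `client`:
# a ChannelException from open_session() is treated as a hint that the server's
# MaxSessions limit has been hit for this connection.
def try_open_channel(client):
    transport = client.get_transport()
    try:
        return transport.open_session()
    except paramiko.ssh_exception.ChannelException:
        # Caller should wait for an existing channel to close, then retry.
        return None
```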
@setrofim @marcbonnici I've updated the PR with automatic detection of the number of allowed background commands, along with automatic limiting in the async API to use at most half of the available channels.
Next step:
Updated the PR with checks to make sure that there are no file access conflicts when executing concurrent asyncio tasks. An artificial example:
```python
from devlib.utils.asyn import *

m = ConcurrentResourceManager()
r1 = FileResource('host', 'file1', 'w')
r2 = FileResource('host', 'file1', 'r')

async def f1():
    # m.track_resource(r1)
    m.track_resource(r2)
    print('from f1')

async def f2():
    m.track_resource(r1)
    m.track_resource(r2)
    print('from f2')

async def f3():
    # m.track_resource(r1)
    print('from f3')

run(
    m.concurrently(
        (
            f1(),
            m.concurrently(
                (
                    f2(),
                    f3(),
                )
            )
        )
    )
)
```
In the PR, ConcurrentResourceManager is target.resource_manager, which is thread-local.

Note: this depends on Python >= 3.7 for asyncio.current_task()
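As a toy illustration (not the PR's code) of why asyncio.current_task() matters here: it lets a manager attribute tracked resources to the specific task that registered them:

```python
import asyncio

# Toy example only: resources are keyed by the asyncio task that registered
# them, which is what asyncio.current_task() (Python >= 3.7) makes possible.
tracked = {}

def track(resource):
    task = asyncio.current_task()
    tracked.setdefault(task, set()).add(resource)

async def use(name):
    track(name)
    await asyncio.sleep(0)

async def main():
    await asyncio.gather(use('file1'), use('file2'))
    print(f'{len(tracked)} tasks tracked resources')

asyncio.run(main())
```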
PR updated with:

- ConcurrentResourceManager renamed to AsyncManager
- Methods of Target have been converted. Not all of them have been "smartly" converted though, e.g. by doing things concurrently.

@marcbonnici @setrofim Updated with a bunch of fixes that were needed to run on my test platform:
- Add a decode=True parameter to Target.execute(). When decode=False is used, execute() returns a tuple of bytes (stdout, stderr). This was necessary to successfully read sysfs content, which can contain binary data (from the device tree). Note that the change to the Android connection class is entirely untested, so a simple Target.execute('echo hello; echo world >&2', decode=False) on an Android device would be useful (see the sketch after this list).
- Clean up the cgroup module. Trying to parallelize the setup shed some light on a few problems with the code structure. This PR migrates it to use cgroup v2, cleans some cruft and exposes more clearly what pieces will be retainable in a future cgroup API that allows full cgroup power, and what will probably become deprecated.

I guess I could split the PR so we can merge the main bit, and then open a new one with the relatively unrelated fixes (cgroup v2 and read_tree_values() on binary content).
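To make the intended behaviour of the decode parameter concrete, a hedged usage sketch based on the description above (target is an existing devlib Target instance; the exact signature is whatever the PR settles on):

```python
# `target` is an existing devlib Target instance.

# Default behaviour: decoded string output, as before.
out = target.execute('echo hello')

# With decode=False, stdout and stderr come back as raw bytes, which is what
# reading sysfs files containing binary device-tree data requires.
stdout, stderr = target.execute('echo hello; echo world >&2', decode=False)
assert isinstance(stdout, bytes) and isinstance(stderr, bytes)
```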
@marcbonnici PR updated with:

- Fixes to cgroups: Controller.tasks('/'). Since existing code apparently expects to list all the tasks in the root group (rather than the root of the subtree handled by the controller), handle it as a special case.
- Fix to Target.install(): add a has_busybox=True parameter to Connection.execute(), so that install() is able to install busybox without needing to run it (for Android). This is the WIP commit, which will be squashed if it works well.
@marcbonnici @setrofim Updated the PR:

- Target.execute_raw()
- Target.execute_raw() on top of BackgroundCommand.communicate(), to avoid invasive changes to Connection.execute()
- busybox --install -s and the corresponding simplification in the code

I've updated the PR with:
If that does not work, I'll try another strategy without any killer_bg, but instead blocking on some stdin for each command (rather than a plain sleep). This way I should be able to unblock them easily.
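A minimal host-side sketch of that alternative strategy (a local sh stands in for the target shell): the wrapper blocks on reading stdin rather than sleeping, so the controller can release it deterministically:

```python
import subprocess

# Sketch only: the spawned shell blocks on `read` rather than `sleep`, so the
# controller can unblock it at any time by writing a newline to its stdin.
p = subprocess.Popen(
    ['sh', '-c', 'read _ && echo released'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
# ... collect the PID / do any setup while the command is still blocked ...
p.stdin.write(b'\n')       # release the command
p.stdin.flush()
print(p.stdout.read().decode())
p.wait()
```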
Updated PR. Added:

- `--` to devlib.connection._kill_pgid_cmd to avoid issues as well

Still to do:

- bg.__exit__ is hanging on your setup

Finally found the issue on my Android device and it was due to the PID reporting of adb connections. Occasionally it would report the PID of the command used to find the PID, rather than the PID of the command itself, due to PID reuse. I've submitted a PR to fix this, so I can continue on my Android devices.
Very good news. I'll try to work on that PR shortly to address the remaining things, but I guess we don't have any big blockers anymore, so it shouldn't be a lot of work.
PR updated:
Changes:
TODO:
@marcbonnici PR updated again with:
@marcbonnici PR updated with a 2nd attempt to close the adb background PID race: the command is frozen and then thawed using SIGSTOP/SIGCONT. It's not pretty, but it is a relatively minor change on top of the existing code.
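For illustration, a minimal host-side sketch of the freeze/thaw idea (a local sh stands in for the target; in the PR the PID is discovered via ps while the command is stopped, as in the snippets below):

```python
import os
import signal
import subprocess
import time

# Sketch only: the command stops itself with SIGSTOP so it cannot exit (and its
# PID cannot be reused); the PID is collected while frozen, then SIGCONT thaws it.
p = subprocess.Popen(['sh', '-c', 'kill -STOP $$; exec echo done'])
time.sleep(0.1)                   # give the shell time to stop itself
pid = p.pid                       # on the target, devlib finds this via `ps`
os.kill(pid, signal.SIGCONT)      # thaw the command now that the PID is known
p.wait()
```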
That's an interesting approach, and perhaps the way to go. Initially it seems to work well; however, occasionally something goes wrong when attempting to send the signals on my device.
```python
while True:
    bg = ta.background("ls")
    print(bg.pid)
```
After a varying number of iterations it raises the following error:
```
    625     find_pid = f'''pid=$({conn.busybox} ps -A -o pid,args | {conn.busybox} {grep_cmd} | {conn.busybox} grep -v {quote(grep_cmd)} | {conn.busybox} awk '{{print $1}}') && kill -CONT "$pid" && printf "%s" "$pid"'''
    626     ps_out = conn.execute(find_pid)
--> 627     pid = int(ps_out)
    628     return (p, pid)

ValueError: invalid literal for int() with base 10: '/system/bin/sh: kill: : arguments must be jobs or process IDs\n'
```
So it looks like for some reason it's failing to find the correct PID; have you seen anything like this in your testing?
PR updated with a speedup of the async path when non-blocking calls are not actually required (i.e. when just one asyncio task is in use).
Regarding the ps output, I think we need to print the command and its output (without the grep filtering) to understand what is going on. Maybe kill simply rejects PIDs that are no longer alive (and therefore not really PIDs)? I can't think of how that could happen though, since the shell process is frozen, so it should not finish until released.
I've tried a few thousand iterations of your test case and it worked without issues on my side.
Also updated the PR to use busybox kill instead of the system's one; maybe that will help?
Re-updated the PR with kill and printf swapped in the PID detection. That will allow seeing the value of $pid even if kill fails for some reason.
Thanks for the update. Running the same test shows that no PID is being detected:
```
    624     # Find the PID and release the blocked background command with SIGCONT
    625     find_pid = f'''pid=$({busybox} ps -A -o pid,args | {busybox} {grep_cmd} | {busybox} grep -v {quote(grep_cmd)} | {busybox} awk '{{print $1}}') && {busybox} printf "%s" "$pid" && {busybox} kill -CONT "$pid"'''
--> 626     pid = int(conn.execute(find_pid))
    627     return (p, pid)

ValueError: invalid literal for int() with base 10: ''
```
It's interesting that you don't see the same problem. I'm seeing it on my Nexus 5 over an ADB connection; however, I also tried it on an emulator, which took a lot longer (I left the above while loop running for a few minutes), but it did eventually hit the same error.

As you said, I'll need to try printing the separate parts of the command to see where it is failing. I can only think that the command is exiting before the kill -STOP has a chance to execute for some reason, but I would need to investigate further.
PR updated with the `--` removed from the killer_bg command, since it was causing issues for busybox kill. It also now uses busybox kill rather than the system's kill, so that all platforms behave the same.
Updated the PR with:

- Fix for Target.get_connection(), which was not setting conn.busybox on new connections.
- Previously, Target._execute_async() was based on Target.background(). This allowed sharing a single connection instance, but it also meant that it blocked until the command was spawned. Unfortunately, experiments have shown that spawning a background command is quite expensive (e.g. on Android, where we end up with a blocking ps). Following Amdahl's law and experiments, this greatly reduced the achievable speedup (e.g. a 2x speedup for 8 concurrent commands on one of my Juno SSH setups).

Fortunately, this is solvable: instead of relying on Target.background(), we can create a thread pool in which each thread has its own connection instance, and dispatch calls to these threads without blocking at all. This allows the best possible level of parallelism, and even avoids the inefficiencies of the Target.background() API (e.g. the need to know the PID even when it's not actually used). That also somewhat simplifies the implementation and separates it from any background command implementation bug.
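A rough sketch of that dispatch scheme (illustrative names only, not devlib's internals): each worker thread lazily creates its own connection, and the event loop hands blocking calls to the pool without ever blocking itself:

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# Sketch only: make_connection() stands in for creating a devlib connection;
# each worker thread lazily gets its own instance via threading.local().
tls = threading.local()

def make_connection():
    return object()                          # placeholder for a real connection

def blocking_execute(cmd):
    if not hasattr(tls, 'conn'):
        tls.conn = make_connection()         # one connection per worker thread
    return f'{cmd} ran on {threading.current_thread().name}'

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Dispatch blocking calls to the pool; the event loop never blocks.
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_execute, f'cmd{i}') for i in range(8))
        )
    print('\n'.join(results))

asyncio.run(main())
```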
Using this implementation, I was able to achieve a 15s runtime reduction on a LISA kernel test that takes 1min30 with devlib's master branch (it consists of varied uses of the API, lots of sysfs reading and writing, one background() execution for a test workload and some file pushing/pulling).
Updated PR:

- Target._execute_asyn()
FWIW, we have been using this PR in Lisa for a month now (the Lisa repo vendorizes devlib as a git subtree) and so far I haven't observed or heard of any issues related to it.
@marcbonnici PR updated with:

- connect(max_async)
- Fix for the cpufreq.use_governor() racy file write
- async.asyncf() decorator fix for async generators: async generators are now consumed completely when crossing a blocking boundary, rather than failing to await on them. We could maybe do something a bit smarter to preserve laziness, provided that we can asyncio.run() anywhere, so I'll have a look later.
- asyn.memoized_method() decorator that works for both async and non-async code. This does not memoize non-hashable data, so it will avoid the issues described here: https://github.com/ARM-software/devlib/issues/341

EDIT: Forgot to add the last bullet entry
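A hedged sketch of what such a dual-mode memoizing decorator can look like (illustrative only, not devlib's asyn.memoized_method()); non-hashable arguments simply bypass the cache:

```python
import asyncio
import functools

def memoized_method(f):
    cache = {}

    def make_key(self, args, kwargs):
        key = (id(self), args, tuple(sorted(kwargs.items())))
        try:
            hash(key)
        except TypeError:
            return None                      # non-hashable data is not memoized
        return key

    if asyncio.iscoroutinefunction(f):
        @functools.wraps(f)
        async def wrapper(self, *args, **kwargs):
            key = make_key(self, args, kwargs)
            if key is None:
                return await f(self, *args, **kwargs)
            if key not in cache:
                cache[key] = await f(self, *args, **kwargs)
            return cache[key]
    else:
        @functools.wraps(f)
        def wrapper(self, *args, **kwargs):
            key = make_key(self, args, kwargs)
            if key is None:
                return f(self, *args, **kwargs)
            if key not in cache:
                cache[key] = f(self, *args, **kwargs)
            return cache[key]
    return wrapper
```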
PR re-updated with a lazy async generator blocking shim. This preserves the laziness of the async generator, made possible by the nest_asyncio package.
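For the curious, a toy version of what such a shim can look like (illustrative only, not the PR's implementation); nest_asyncio's role is to allow run_until_complete() even when another event loop is already running:

```python
import asyncio
import nest_asyncio

async def agen():
    for i in range(3):
        yield i                              # produced lazily, one item at a time

def blocking_iter(async_gen):
    loop = asyncio.new_event_loop()
    nest_asyncio.apply(loop)                 # allows nesting inside a running loop
    try:
        while True:
            try:
                # Pull one item at a time, preserving the generator's laziness.
                yield loop.run_until_complete(async_gen.__anext__())
            except StopAsyncIteration:
                break
    finally:
        loop.close()

print(list(blocking_iter(agen())))           # [0, 1, 2]
```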
Add an async API to devlib to take advantage of concurrency more easily, at both a coarse and a fine grain.

Note: this currently requires Python 3.7
Based on https://github.com/ARM-software/devlib/pull/568