Open robertgzr opened 4 years ago
[roman-mazur] This issue has attached support thread https://jel.ly.fish/#/109ba44e-95d9-4566-a48b-1554f3c75e9e
ran some tests where disk usage is disabled (via https://github.com/balena-os/balena-engine/commit/c764e39cf511fd83ddca136fd6bf579180a67908)
in the following reports, "idle" refers to a device not running any containers besides the balena-supervisor, and running:
$ balena run --rm --log-driver none --network none balena-healthcheck-image > /dev/null
while "busy" means the device was simultaneously pulling an image with an artificially bloated single 2.09 gigabyte layer.
obtained via ctr -a /var/run/balena-engine/containerd/balena-engine-containerd.sock events
note the extra time spent on /tasks/create and /tasks/delete
events are published after the action has completed
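To put numbers on those gaps, the event timestamps can be diffed. This is only a minimal sketch: it assumes each line of the ctr events output starts with a Go-style timestamp (date, time, UTC offset, zone name) followed by the namespace and the topic, which should be checked against a real capture first. The function name event_deltas and the file name events.log are made up for illustration.

```shell
# event_deltas: for each event line on stdin, print the seconds elapsed
# since the previous event, followed by the event topic.
# ASSUMPTION: whitespace-separated fields are
#   date, time, UTC offset, zone name, namespace, topic, payload
event_deltas() {
    awk '
    {
        split($2, t, ":")                       # $2 is HH:MM:SS[.fraction]
        now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
        d = (NR > 1) ? now - prev : 0           # first event has no predecessor
        if (d < 0) d += 86400                   # capture crossed midnight
        printf "%+.3fs %s\n", d, $6             # $6 is the topic
        prev = now
    }'
}
```

With the events saved to a file, `event_deltas < events.log` lists one delta per event. Because events are published after the action has completed, a large delta in front of /tasks/create or /tasks/delete is time spent inside that action.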
# on device
$ ctr pprof -d /var/run/balena-engine/containerd/balena-engine-containerd-debug.sock trace -s {20,50}s > pprof.trace.pb.gz
# on dev machine
$ go tool trace ./pprof.trace.pb.gz
containerd trace [idle]
containerd trace [busy]
# build a static copy of strace for device
# replace shim symlink with a wrapper
$ cat /usr/bin/balena-engine-containerd-shim
#!/bin/sh
tmp=$(mktemp --tmpdir containerd-shim-XXXXX.strace.out)
/mnt/data/strace -o "${tmp}" /mnt/data/balena-engine-containerd-shim "$@"
# on device
$ ctr -a <addr> pprof -d <debug-addr> profile -s {20,50}s > profile.pb.gz
# on dev machine
$ go tool pprof -http=:8080 profile.pb.gz
containerd [idle]
containerd [busy]
raw traces/profiles are here
[gelbal] This issue has attached support thread https://jel.ly.fish/#/fb8929e2-e043-4205-8545-46ba00e49e4c
We often see issues with the host system when the engine is pushing/pulling images.
annotated logs of the balenaOS health check with a parallel pull [on a pi zero w]:
comparing the container execution times of docker vs containerd seems to indicate a performance penalty of ~10s, most likely as a result of the gRPC calls between engine and containerd.
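A comparison like this can be repeated with a small timing loop. The sketch below is not the method used for the numbers above; time_runs is a hypothetical helper, and it only has whole-second resolution, because not every date implementation supports sub-second output. That is coarse, but enough to see a gap on the order of 10s.

```shell
# time_runs N CMD [ARGS...]: run CMD N times and print the wall-clock
# duration of each run in whole seconds. Output of CMD is discarded.
time_runs() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}
```

For example, `time_runs 5 balena run --rm --log-driver none --network none balena-healthcheck-image` times the health check five times; running it once on an idle device and once during a pull gives the two cases described at the top of this issue.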
previous efforts
internal conversation