Closed lengau closed 1 year ago
This sounds like https://github.com/lxc/lxd/pull/11606 which was in lxd 5.14.
We are also using GitHub runners now for all our tests (which use lxc exec heavily) and have not seen this yet.
Is this happening with containers, VMs, or both? Are these self-hosted GitHub runners?
Is the exec output being piped into something?
@lengau Hi, are you able to test this on the latest/edge channel? There were some changes to the websocket handler recently, and I wanted to check in case they fixed it.
Sorry I missed your previous messages! These are on GitHub's regular hosted runners, which I believe are Azure VMs. The issue itself is happening with LXD containers.
The output is getting piped back into our Python process. Essentially, we're using subprocess.run to execute the command lxc --project default exec local:<instance> -- cat /etc/os-release and capturing its output with capture_output=True.
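For anyone trying to reproduce this, the invocation described above can be sketched roughly as follows. This is a minimal illustration, not the actual craft-providers code: the helper names `build_exec_command` and `run_captured` are hypothetical, and the `default` project and `local` remote are taken from the command quoted above.

```python
import subprocess


def build_exec_command(instance, command, project="default", remote="local"):
    """Build the argv for `lxc --project <project> exec <remote>:<instance> -- <command>`.

    Hypothetical helper; the defaults mirror the command quoted in this thread.
    """
    return ["lxc", "--project", project, "exec", f"{remote}:{instance}", "--", *command]


def run_captured(argv):
    """Run argv and capture stdout and stderr as text, without raising on failure."""
    return subprocess.run(argv, capture_output=True, text=True, check=False)


# Example (requires a running instance named "my_instance"):
# result = run_captured(build_exec_command("my_instance", ["cat", "/etc/os-release"]))
# print(result.returncode, repr(result.stdout), repr(result.stderr))
```

When the bug occurs, `result.returncode` is 0 while `result.stdout` and `result.stderr` are both empty.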
I haven't been able to reproduce on edge though, so it looks to me like it's fixed! Thank you!
Ah, that's good to hear.
There were some changes to the websocket layer in edge that may have fixed it:
https://github.com/canonical/lxd/pull/12008 https://github.com/canonical/lxd/pull/11918
Required information
lxc info
It's also worth noting that this is occurring in GitHub hosted runners.
Issue description
In some situations (I believe under heavy resource utilisation, though that's not the only factor), lxc exec might not write to stdout or stderr. It still forwards the correct return code from the exec'd process, so I don't believe something is being killed. It's also particularly difficult to track down because it occurs far more frequently on GitHub CI than I've been able to reproduce on any of my machines.
Steps to reproduce
Run lxc exec my_instance -- cat /etc/os-release in that instance.
This Actions run from craft-providers shows an example of it happening. Logs:
logs_2796.zip
Workaround:
I've never found a case where this happens reliably (even on GitHub Actions it's a rare event), so the workaround is to just run the command again.
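A rough sketch of that workaround in Python, assuming the command is known to print something on success (as cat /etc/os-release does). The function name `run_with_retry` and its `attempts` and `delay` parameters are hypothetical, chosen here for illustration only:

```python
import subprocess
import time


def run_with_retry(argv, attempts=3, delay=0.5):
    """Re-run argv while it exits 0 but produces no stdout.

    Assumes the command always prints something on success, so an empty
    stdout with a zero return code is treated as the missing-output symptom.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        # A genuine failure, or any output at all, is returned immediately.
        if result.returncode != 0 or result.stdout:
            return result
        if attempt < attempts:
            time.sleep(delay)
    # Every attempt exited 0 with empty stdout; hand back the last result.
    return result
```

This only makes sense for idempotent, read-only commands, since the command may run several times.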
Information to attach
dmesg
lxc info NAME --show-log
lxc config show NAME --expanded
lxc monitor (while reproducing the issue)