canonical / microcloud

Automated private cloud based on LXD, Ceph and OVN
https://microcloud.is
GNU Affero General Public License v3.0

Bandaid for `lxc exec` lockup #301

Closed: masnax closed this 5 months ago

masnax commented 5 months ago

If the test suite is run with `SNAPSHOT_RESTORE=0` and `CONCURRENT_SETUP=1`, then `lxc exec` can occasionally get stuck and never return if the command being executed completes quickly enough.

This adds a workaround by just sleeping for 1s before such commands, but ideally we should address this in LXD.
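For illustration only, a minimal sketch of what such a bandaid could look like in a bash test helper; the helper name and the invocation path are assumptions, not the actual suite code:

```sh
# Hypothetical helper: pause briefly before running a command so a fast
# command cannot finish before the exec session is fully attached.
lxc_exec_with_delay() {
    local container="$1"
    shift
    sleep 1    # bandaid: give `lxc exec` a moment to settle
    lxc exec "${container}" -- "$@"
}

# Reproduction settings from the description (entry point assumed):
# SNAPSHOT_RESTORE=0 CONCURRENT_SETUP=1 ./test/main.sh
```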

masnax commented 5 months ago

@MusicDin might this be related to the other `lxc exec` lockup issues?

masnax commented 5 months ago

@simondeziel Looks like the same LXD version mismatch is happening over here. I guess `--cohort=+` isn't working?
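For context, a snap cohort pins a set of machines to the same snap revision rather than whatever the channel currently serves; a minimal sketch of the pattern presumably used during setup (the exact invocation is an assumption):

```sh
# Install LXD with `--cohort="+"` so every cluster member resolves to the
# same snap revision instead of drifting as the channel advances.
snap install lxd --cohort="+"
```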

tomponline commented 5 months ago

> but ideally we should address this in LXD

Did you open a bug already for this?

masnax commented 5 months ago

> but ideally we should address this in LXD
>
> Did you open a bug already for this?

I haven't yet, thanks for the reminder.

simondeziel commented 5 months ago

The error message `The joining server version doesn't match (expected 5.21.1 with API count 386)` could be a bit more informative if it also included the version/API count it actually received.
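Purely as a hypothetical illustration of that suggestion (the second line and its values are invented, not an actual LXD message):

```
# current:
The joining server version doesn't match (expected 5.21.1 with API count 386)
# suggested shape (received values invented for illustration):
The joining server version doesn't match (expected 5.21.1 with API count 386, got 5.21.0 with API count 385)
```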

Speaking of cohorts, this reminds me of an unexpected (to me) refresh of LXD. I don't see why a refresh would be needed there, especially without a channel being specified.
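For comparison, a sketch of the two behaviours in question; the channel name is illustrative, inferred from the 5.21.1 version above:

```sh
# An unpinned refresh follows whatever channel the snap currently tracks,
# which can move cluster members onto different revisions mid-test.
snap refresh lxd

# Pinning both the channel and the cohort keeps members in lockstep.
snap refresh lxd --channel=5.21/stable --cohort="+"
```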

masnax commented 5 months ago

@tomponline The corresponding issue is here: https://github.com/canonical/lxd/issues/13425

@simondeziel Yeah that refresh looks redundant.

simondeziel commented 5 months ago

> @simondeziel Yeah that refresh looks redundant.

Good, can you drop it in this bandaid PR? If not, I'm happy to do it in a separate one.

masnax commented 5 months ago

> @simondeziel Yeah that refresh looks redundant.
>
> Good, can you drop it in this bandaid PR? If not, I'm happy to do it in a separate one.

I added it to #300

masnax commented 5 months ago

Closing as I've narrowed this down to the specific containers I set up back in September. Not sure what's up with those containers, but we don't need this PR anymore.