canonical / microcloud

Automated private cloud based on LXD, Ceph and OVN
https://microcloud.is
GNU Affero General Public License v3.0

Optimize for speed #298

Closed. simondeziel closed this 2 months ago

simondeziel commented 2 months ago

Here are some boot time and memory usage figures for the 2nd boot of a guest VM (the 1st boot is ignored because cloud-init does its thing then).

From:

# systemd-analyze 
Startup finished in 780ms (kernel) + 7.248s (userspace) = 8.029s 
graphical.target reached after 5.743s in userspace
root@u22:~# free -mt
               total        used        free      shared  buff/cache   available
Mem:            3932          94        3614          16         222        3777
Swap:              0           0           0
Total:          3932          94        3614

To:

# systemd-analyze 
Startup finished in 522ms (kernel) + 2.729s (userspace) = 3.252s 
graphical.target reached after 2.709s in userspace
# free -mt
               total        used        free      shared  buff/cache   available
Mem:            3932          82        3674          16         175        3792
Swap:              0           0           0
Total:          3932          82        3674

This helps speed up the boot of the nested micro0X VMs, even if the total test runtime seems only slightly reduced.
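To put the before/after figures in relative terms, here is a minimal sketch that parses the two `systemd-analyze` summary lines quoted above and computes the reduction. The helper names `total_seconds` and `reduction_percent` are made up for illustration, and the parser only handles the simple `= <number>s` / `= <number>ms` totals seen in this thread, not longer forms such as totals with minutes.

```python
import re

# Summary lines copied from the systemd-analyze output above.
BEFORE = "Startup finished in 780ms (kernel) + 7.248s (userspace) = 8.029s"
AFTER = "Startup finished in 522ms (kernel) + 2.729s (userspace) = 3.252s"


def total_seconds(summary: str) -> float:
    """Return the total startup time in seconds from a summary line.

    Hypothetical helper: only understands a trailing "= <number>s"
    or "= <number>ms" total, as shown in this thread.
    """
    match = re.search(r"=\s*([\d.]+)(ms|s)\b", summary)
    if match is None:
        raise ValueError(f"no total found in: {summary!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000 if unit == "ms" else value


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


boot_before = total_seconds(BEFORE)
boot_after = total_seconds(AFTER)
print(f"boot: {boot_before:.3f}s -> {boot_after:.3f}s "
      f"({reduction_percent(boot_before, boot_after):.1f}% less)")

# "used" column of `free -mt` (in MiB) from the two runs above.
print(f"used memory: 94 MiB -> 82 MiB ({reduction_percent(94, 82):.1f}% less)")
```

The same `reduction_percent` helper can be applied to any other before/after pair of timings, as long as both values use the same unit.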

The biggest test runtime gain comes from the commit "test/includes/microcloud: don't start VMs after snapshot, reset_systems will", which speeds up the setup phase (~9.5 minutes down to ~7.5 minutes).

simondeziel commented 2 months ago

@masnax I think you've been working in this area, any idea why those tests now all fail early with:

Error: System "micro02" failed to join the cluster: Failed to update cluster status of services: Failed to join "LXD" cluster: Failed to configure cluster :Failed request to add member: The joining server version doesn't match (expected 5.21.1 with API count 385)

simondeziel commented 2 months ago

Was a one-time hiccup, it seems. Sorry for the ping.

simondeziel commented 2 months ago

Yay, it seems the longest test now consistently finishes in around 3.5 hours. Not great at face value, but it used to hit the 6-hour timeout limit pretty often before, for me at least.