Closed melato closed 1 year ago
I've had similar trouble with some of the other listeners. In general, other than the cluster listener, we should be able to have the others issue a warning (both in log and warning API) and keep retrying in the background.
Could it be that LXD does not wait for lxdbr0 to initialize before attempting to start some listeners on it?
This problem does not happen with core.https_address. I set it to {lxdbr0-ipv4}:8443 and rebooted, and lxc was operational. Using "netstat -nat" I saw that LXD started listening on {lxdbr0-ipv4}:8443 several seconds later. I used the same LXD version as above.
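For anyone who wants to measure that delay rather than eyeball netstat output, here is a minimal Python sketch that polls until something accepts TCP connections on an address. The function name `wait_for_listener` and its parameters are made up for illustration; it is not part of LXD.

```python
import socket
import time


def wait_for_listener(host, port, timeout=30.0, interval=0.5):
    """Poll until something accepts TCP connections on host:port.

    Returns the number of seconds waited, or None if the timeout expired.
    (Hypothetical helper, not an LXD API.)
    """
    start = time.monotonic()
    while True:
        try:
            # A successful connect means a listener is bound and accepting.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable, or timed out: give up or try again.
            if time.monotonic() - start >= timeout:
                return None
            time.sleep(interval)
```

Calling it right after boot with the bridge address and port 8443 would report roughly how long the listener took to appear, assuming the address is reachable from where the script runs.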
Yeah, listeners are started before networks. But adding retries would solve this too.
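To illustrate the retry idea, here is a rough Python sketch (LXD itself is written in Go, so this is not its code). It keeps trying to bind while the kernel reports "cannot assign requested address" (EADDRNOTAVAIL), prints a warning on each failure, and gives up after a fixed number of attempts. The name `listen_with_retry` and its parameters are assumptions for the example.

```python
import errno
import socket
import time


def listen_with_retry(host, port, attempts=5, delay=1.0, sleep=time.sleep):
    """Bind a TCP listener, retrying while the address is not yet assigned.

    Returns the listening socket, or None if every attempt failed.
    (Hypothetical helper; names and defaults are illustrative.)
    """
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((host, port))
            sock.listen()
            return sock
        except OSError as exc:
            sock.close()
            # EADDRNOTAVAIL: the IP is not configured on any interface yet,
            # for example because the bridge has not come up.
            if exc.errno != errno.EADDRNOTAVAIL:
                raise
            print(f"warning: {host}:{port} not available "
                  f"(attempt {attempt}/{attempts})")
            if attempt < attempts:
                sleep(delay)
    return None
```

Running this in a daemon thread (`threading.Thread(target=..., daemon=True)`) would keep the rest of startup from blocking, which is the behavior suggested above for the non-cluster listeners. Other bind errors, such as the port already being in use, are re-raised rather than retried.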
I had it with proxy config,
some_rpc:
  connect: tcp:127.0.0.1:8545
  listen: tcp:192.168.2.40:8545
  type: proxy
After a host reboot, the private IP 192.168.2.40 vanished and, as a consequence, some containers refused to start. I would rather see a more forgiving behavior than having to go through multiple containers' configs and start them manually.
> I would rather see a more forgiving behavior than having to go through multiple containers' configs and start them manually.
We could introduce an optional key on the proxy, as is done with many other device types, but that may still cause issues, as the container would then be allowed to start with that proxy device missing. LXD doesn't really have a way to keep track of every single little detail on your system, so it's not going to be practical for us to start monitoring IP addresses and port usage on the host system to then start instances.
I believe we already have background retry logic for instances on startup, so if the instance still can't start after the 30s or whatever retry delay we have, then your system isn't likely to fix itself without human intervention, at which point the human in question can also start the affected instances.
Issue description
lxd fails to start after reboot when core.storage_buckets_address is set to a port on the private lxdbr0 ipv4 address.
Steps to reproduce
Error: Bind network address: listen tcp 10.91.97.1:8555: bind: cannot assign requested address
Fixed it like this:
sudo sqlite3 /var/snap/lxd/common/lxd/database/local.db
sqlite> DELETE FROM config WHERE key='core.storage_buckets_address';
sudo systemctl start snap.lxd.daemon
This was easily reproducible. After the fix I did it again:
and got the same problem after step 7 above.
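For reference, the sqlite3 workaround above can also be scripted. This is a minimal Python sketch that assumes only what the DELETE statement in this report implies: a `config` table with a `key` column in local.db. The function name `remove_config_key` is made up for the example, and it should only be run while LXD is stopped and with sufficient privileges to write the database.

```python
import sqlite3


def remove_config_key(db_path, key):
    """Delete one row from the `config` table and return how many rows went.

    Assumes a table named `config` with a `key` column, as in the manual
    workaround from this report. (Hypothetical helper, not an LXD tool.)
    """
    conn = sqlite3.connect(db_path)
    try:
        # `with conn` commits on success and rolls back on error.
        with conn:
            cur = conn.execute("DELETE FROM config WHERE key = ?", (key,))
        return cur.rowcount
    finally:
        conn.close()
```

Usage would look like `remove_config_key("/var/snap/lxd/common/lxd/database/local.db", "core.storage_buckets_address")`, followed by starting the daemon again. A return value of 0 means the key was not present.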
Required information
Linux pin 5.10.0-21-arm64 #1 SMP Debian 5.10.162-1 (2023-01-21) aarch64 GNU/Linux
lxc info:
config:
  core.https_address: :8443
api_extensions: