canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0

Cannot checkout running instance with error: Could not determine IP address within 120000ms #3528

Open manztihagi opened 1 month ago

manztihagi commented 1 month ago

Describe the bug: Cannot list the running instance; the command fails with the error Could not determine IP address within 120000ms. If I try more than once, every multipass command hangs indefinitely without showing anything.

image

To Reproduce: Just run multipass list in a command shell.

Expected behavior: The running instance is listed.

Logs: Cannot get the logs; the following message is shown:

image

Additional info

image

manztihagi commented 2 weeks ago

Additional information:

Output of multipass networks: image

ricab commented 2 weeks ago

Hi @manztihagi, sorry for the delay, this slipped before.

The error you are seeing shows that Multipass is unable to get an IP from LXD. Did you stop or otherwise modify the VM via LXD directly? Could you please send the output of the following commands?

Logs would also be useful. Have you tried running journalctl with sudo? And do you have any firewall or other software that could be interfering with communication between Multipass and LXD?

manztihagi commented 2 weeks ago

Hi @ricab,

  1. Did you stop or otherwise modify the VM via LXD directly => No

  2. Output of the requested commands:

    image
  3. Firewall between Multipass and LXD => only iptables rules that forward specific application ports

    image
  4. The log files => they are 3 MB, so we would have to upload them somewhere else. Do you have a suggestion?

ricab commented 2 weeks ago

Hmm, for some reason LXD isn't listing an IPv4 lease for chr-uat, even though an IP is listed in lxc list. Did you configure the instance to use a static IP or something?

It looks like you have some DOCKER chains in iptables, along with rules denying SSH. Does the situation improve if you disable docker on your host entirely? And if you disable the firewall? You can do that with sudo ufw disable if you are using ufw. Otherwise, you can try something like this to take iptables' hands temporarily off of traffic.

Regarding logs, we wouldn't need all those 3Mb. You could just:

  1. multipass stop <instance>
  2. In another terminal, run sudo journalctl -a -b0 -f -u snap.multipass.*
  3. multipass start <instance> (in the original terminal)
  4. multipass list (in the original terminal)
  5. Paste the log you got on the terminal in step 2 into a file and attach that file here

manztihagi commented 2 weeks ago

I have static IPs for the hosts in my network, and traffic to those static IPs is forwarded to the Multipass instances, which get their addresses via DHCP.

  1. logs for journalctl

    chr@chr:~$ sudo journalctl -a -b0 -f -u snap.multipass.*
    [sudo] password for chr:
    Jun 20 06:50:16 chr multipassd[156964]: fetch manifest periodically
    Jun 20 07:05:16 chr multipassd[156964]: fetch manifest periodically
    Jun 20 07:20:16 chr multipassd[156964]: fetch manifest periodically
    Jun 20 07:35:16 chr multipassd[156964]: fetch manifest periodically
    Jun 20 07:50:16 chr multipassd[156964]: fetch manifest periodically
    Jun 20 08:05:16 chr multipassd[156964]: fetch manifest periodically
    Jun 20 08:20:16 chr multipassd[156964]: fetch manifest periodically
    Jun 20 08:35:16 chr multipassd[156964]: fetch manifest periodically
    Jun 20 08:50:16 chr multipassd[156964]: fetch manifest periodically
    Jun 20 09:02:51 chr multipassd[156964]: Cannot open ssh session on "chr-uat" shutdown: Could not determine IP address within 120000ms
    Jun 20 09:04:30 chr multipassd[156964]: Using the 'multipass' storage pool.
    Jun 20 09:04:31 chr multipassd[156964]: Waiting for SSH to be up
    Jun 20 09:05:16 chr multipassd[156964]: fetch manifest periodically
  2. Stopped the multipass instance:

    chr@chr:/etc/iptables$ multipass stop chr-uat

  3. Cleaned up iptables, including the Docker rules:

chr@chr:/etc/iptables$ sudo iptables -F
chr@chr:/etc/iptables$ sudo iptables -X
chr@chr:/etc/iptables$ sudo iptables -P INPUT ACCEPT
chr@chr:/etc/iptables$ sudo iptables -P OUTPUT ACCEPT
chr@chr:/etc/iptables$ sudo iptables -P FORWARD ACCEPT
chr@chr:/etc/iptables$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
  4. The Multipass instance stopped but could not be started again (I had to restart the whole host system).

    1st attempt: The instance failed to start after restarting the whole host system.

    2nd attempt: After removing the iptables rule that blocks the default SSH port 22, the chr-uat instance restarted smoothly. With the instance running, adding the iptables rule that blocks the default SSH port again and restarting the host system did not affect the chr-uat instance. But multipass list still produces the same error.

Starting the instance with Multipass:

chr@chr:/etc/iptables$ multipass start chr-uat
start failed: The following errors occurred:
chr-uat: timed out waiting for response
chr@chr:/etc/iptables$ multipass list
list failed: Could not determine IP address within 120000ms
chr@chr:/etc/iptables$

ricab commented 2 weeks ago

Hi @manztihagi, thanks for all the details. Disabling all iptables rules couldn't work because some of them are indeed needed, sorry to mislead you there. But Docker sets some aggressive rules that interfere with LXD, so does the situation improve if you disable docker on your host entirely?

manztihagi commented 1 week ago

Hi @ricab, thanks for the response.

I have already removed everything related to Docker, including the packages and the iptables rules. After restarting the entire host system, multipass list still fails.

Are there any specific ports between the instance and the host that must be allowed in the iptables rules?

ricab commented 1 week ago

No manual rules are required, unless other custom rules are affecting traffic. Can you please confirm if you still get no lease from lxc --project=multipass network list-leases mpbr0 after docker is gone? And is this something that happens only with that one instance, or any instance you try?

manztihagi commented 1 week ago

Docker is already gone, but when we run lxc --project=multipass list, docker0 still appears.

chr@chr:~$ sudo lxc --project=multipass network list-leases mpbr0
+------------+-------------------+---------------------------------------+---------+
|  HOSTNAME  |    MAC ADDRESS    |              IP ADDRESS               |  TYPE   |
+------------+-------------------+---------------------------------------+---------+
| chr-uat    | 52:54:00:11:13:bf | fd42:d158:353c:6900:5054:ff:fe11:13bf | DYNAMIC |
+------------+-------------------+---------------------------------------+---------+
| chr-uatweb | 52:54:00:fc:87:37 | fd42:d158:353c:6900:5054:ff:fefc:8737 | DYNAMIC |
+------------+-------------------+---------------------------------------+---------+
chr@chr:~$ sudo lxc --project=multipass list
+------------+---------+------------------------------+------------------------------------------------+-----------------+-----------+
|    NAME    |  STATE  |             IPV4             |                      IPV6                      |      TYPE       | SNAPSHOTS |
+------------+---------+------------------------------+------------------------------------------------+-----------------+-----------+
| chr-uat    | RUNNING | 172.18.0.1 (br-ee3a547b610b) | fd42:d158:353c:6900:5054:ff:fe11:13bf (enp5s0) | VIRTUAL-MACHINE | 0         |
|            |         | 172.17.0.1 (docker0)         |                                                |                 |           |
|            |         | 10.67.90.39 (enp5s0)         |                                                |                 |           |
+------------+---------+------------------------------+------------------------------------------------+-----------------+-----------+
| chr-uatweb | RUNNING | 10.67.90.144 (enp5s0)        | fd42:d158:353c:6900:5054:ff:fefc:8737 (enp5s0) | VIRTUAL-MACHINE | 0         |
+------------+---------+------------------------------+------------------------------------------------+-----------------+-----------+
chr@chr:~$

ricab commented 1 week ago

Hmm, OK. The stuff that docker left behind may well be why things aren't working. It is known to interfere with LXD, as per the link above, and we still can't get a lease. I suggest you try to go to LXD for support — they would be better equipped to help you figure out why you don't see a lease. But please let us know of any findings!

manztihagi commented 1 week ago

OK @ricab,

I also tried creating a new instance with this command:

chr@chr:~$ sudo multipass launch noble -n chr-uat2 --cpus 4 --memory 8G --disk 100G --cloud-init cloud-init.yaml

It succeeded without any problem, but I left out the network options, so the instance automatically got an IP and I can get in using multipass shell chr-uat2.

Here are the results:

chr@chr:~$ sudo lxc list --project=multipass
+------------+---------+------------------------------+------------------------------------------------+-----------------+-----------+
|    NAME    |  STATE  |             IPV4             |                      IPV6                      |      TYPE       | SNAPSHOTS |
+------------+---------+------------------------------+------------------------------------------------+-----------------+-----------+
| chr-uat    | RUNNING | 172.18.0.1 (br-ee3a547b610b) | fd42:d158:353c:6900:5054:ff:fe11:13bf (enp5s0) | VIRTUAL-MACHINE | 0         |
|            |         | 172.17.0.1 (docker0)         |                                                |                 |           |
|            |         | 10.67.90.39 (enp5s0)         |                                                |                 |           |
+------------+---------+------------------------------+------------------------------------------------+-----------------+-----------+
| chr-uat2   | RUNNING | 10.67.90.103 (enp5s0)        | fd42:d158:353c:6900:5054:ff:fec8:1c57 (enp5s0) | VIRTUAL-MACHINE | 0         |
+------------+---------+------------------------------+------------------------------------------------+-----------------+-----------+
| chr-uatweb | RUNNING | 10.67.90.144 (enp5s0)        | fd42:d158:353c:6900:5054:ff:fefc:8737 (enp5s0) | VIRTUAL-MACHINE | 0         |
+------------+---------+------------------------------+------------------------------------------------+-----------------+-----------+
chr@chr:~$ multipass shell chr-uat2
Welcome to Ubuntu 24.04 LTS (GNU/Linux 6.8.0-35-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

 System information as of Wed Jun 26 07:19:57 WIB 2024

  System load:             0.0
  Usage of /:              1.9% of 95.82GB
  Memory usage:            3%
  Swap usage:              0%
  Processes:               179
  Users logged in:         0
  IPv4 address for enp5s0: 10.67.90.103
  IPv6 address for enp5s0: fd42:d158:353c:6900:5054:ff:fec8:1c57

 * Strictly confined Kubernetes makes edge and IoT secure. Learn how MicroK8s
   just raised the bar for easy, resilient and secure K8s cluster deployment.

   https://ubuntu.com/engage/secure-kubernetes-at-the-edge

Expanded Security Maintenance for Applications is not enabled.

0 updates can be applied immediately.

Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status

Last login: Tue Jun 25 14:51:23 2024 from 10.62.171.22
ubuntu@chr-uat2:~$

manztihagi commented 6 days ago

Additional Information:

docker0 and br-ee3a547b610b are gone after a full restart.

But multipass list still fails:

chr@chr:~$ sudo lxc list --project=multipass
[sudo] password for chr:
+------------+---------+-----------------------+------------------------------------------------+-----------------+-----------+
|    NAME    |  STATE  |         IPV4          |                      IPV6                      |      TYPE       | SNAPSHOTS |
+------------+---------+-----------------------+------------------------------------------------+-----------------+-----------+
| chr-uat    | RUNNING | 10.67.90.39 (enp5s0)  | fd42:d158:353c:6900:5054:ff:fe11:13bf (enp5s0) | VIRTUAL-MACHINE | 0         |
+------------+---------+-----------------------+------------------------------------------------+-----------------+-----------+
| chr-uat2   | RUNNING | 10.67.90.103 (enp5s0) | fd42:d158:353c:6900:5054:ff:fec8:1c57 (enp5s0) | VIRTUAL-MACHINE | 0         |
+------------+---------+-----------------------+------------------------------------------------+-----------------+-----------+
| chr-uatweb | STOPPED |                       |                                                | VIRTUAL-MACHINE | 0         |
+------------+---------+-----------------------+------------------------------------------------+-----------------+-----------+
chr@chr:~$ multipass list
list failed: Could not determine IP address within 120000ms
chr@chr:~$

Network leases of mpbr0:

chr@chr:~$ sudo lxc --project=multipass network list-leases mpbr0
+------------+-------------------+---------------------------------------+---------+
|  HOSTNAME  |    MAC ADDRESS    |              IP ADDRESS               |  TYPE   |
+------------+-------------------+---------------------------------------+---------+
| chr-uat    | 52:54:00:11:13:bf | fd42:d158:353c:6900:5054:ff:fe11:13bf | DYNAMIC |
+------------+-------------------+---------------------------------------+---------+
| chr-uat2   | 52:54:00:c8:1c:57 | 10.67.90.103                          | DYNAMIC |
+------------+-------------------+---------------------------------------+---------+
| chr-uat2   | 52:54:00:c8:1c:57 | fd42:d158:353c:6900:5054:ff:fec8:1c57 | DYNAMIC |
+------------+-------------------+---------------------------------------+---------+
| chr-uatweb | 52:54:00:fc:87:37 | fd42:d158:353c:6900:5054:ff:fefc:8737 | DYNAMIC |
+------------+-------------------+---------------------------------------+---------+
chr@chr:~$

And the log is not very helpful:

chr@chr:~$ sudo journalctl -a -b0 -f -u snap.multipass.*
Jun 28 09:55:09 chr multipassd[1049]: Using the 'multipass' storage pool.
Jun 28 09:55:09 chr multipassd[1049]: chr-uat2 needs starting. Starting now...
Jun 28 09:55:11 chr multipassd[1049]: Waiting for SSH to be up
Jun 28 09:55:12 chr multipassd[1049]: Starting Multipass 1.13.1
Jun 28 09:55:12 chr multipassd[1049]: Daemon arguments: /snap/multipass/12710/bin/multipassd --verbosity debug --logger platform

ricab commented 6 days ago

Hi again @manztihagi. To list instances, Multipass tries to obtain their IPs. With the LXD driver, that is done with the same API that lxc network list-leases uses. For some reason, LXD is unable to return the lease for that VM. I suggest you create a new issue with LXD linking back to this one. Since Docker was involved, there's a chance that they know what the issue/solution is right away.
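For anyone who wants to check the lease from the command line, here is a rough sketch. It is not how Multipass talks to LXD internally (that goes through the API, not the CLI); it only parses the table printed by lxc --project=multipass network list-leases mpbr0, as pasted earlier in this thread, and reports whether a given hostname has an IPv4 lease. The function name has_ipv4_lease is made up for illustration.

```shell
#!/usr/bin/env bash
# has_ipv4_lease HOSTNAME
# Hypothetical helper: reads a `lxc network list-leases` table on stdin and
# succeeds only if HOSTNAME has a row whose IP ADDRESS cell is dotted IPv4.
has_ipv4_lease() {
    awk -F'|' -v host="$1" '
        NF >= 5 {
            name = $2; ip = $4
            gsub(/[ \t]/, "", name)   # strip the padding around each cell
            gsub(/[ \t]/, "", ip)
            if (name == host && ip ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
                found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Possible usage (needs LXD access on the host):
#   sudo lxc --project=multipass network list-leases mpbr0 | has_ipv4_lease chr-uat \
#       || echo "no IPv4 lease for chr-uat"
```

With the lease table pasted above, this would succeed for chr-uat2 (which has a 10.67.90.103 lease) and fail for chr-uat (which only has an IPv6 lease).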

As far as Multipass is concerned, we already retry asking for the lease repeatedly, until that timeout occurs. We could probably reduce that timeout a little or skip the faulty instance and list the rest, but the underlying issue would remain.
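To picture the retry-until-timeout behaviour, here is a small shell sketch. It is an illustration only, not Multipass's implementation: retry_until is a hypothetical helper that re-runs a command until it succeeds or a deadline (in seconds) passes, leaving it to the caller to decide what to do on timeout, for example skipping that instance and listing the rest.

```shell
#!/usr/bin/env bash
# retry_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds, and returns 1
# once TIMEOUT_SECONDS have elapsed without a success.
retry_until() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    while true; do
        # Success ends the loop immediately.
        if "$@"; then
            return 0
        fi
        # Give up once the deadline has passed.
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
}
```

A caller could then write something like retry_until 120 5 check_instance chr-uat || echo "skipping chr-uat", where 120 seconds matches the 120000ms in the error message and check_instance stands for whatever check the caller supplies.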