GNS3 / gns3-server

GNS3 server
GNU General Public License v3.0

tigervnc-server and 2 container docker with vnc #2276

Closed Raizo62 closed 1 year ago

Raizo62 commented 1 year ago

Hi


[gns3_server.log](https://github.com/GNS3/gns3-gui/files/11562166/gns3_server.log)
[gns3_gui.log](https://github.com/GNS3/gns3-gui/files/11562167/gns3_gui.log)
grossmj commented 1 year ago

I see this error:

2023-05-25 07:23:37 ERROR route.py:221 Node error detected: DockerError
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/gns3server/compute/docker/docker_vm.py", line 651, in _start_vnc
    await wait_for_file_creation(x11_socket)
  File "/usr/local/lib/python3.9/dist-packages/gns3server/utils/asyncio/__init__.py", line 137, in wait_for_file_creation
    raise asyncio.TimeoutError()
asyncio.exceptions.TimeoutError

Does your device take a long time to boot?

Also, what do you see when you type ls /tmp/.X11-unix/X* in a terminal? (before and after you try to start the second device)

Thanks.
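
For context, the timeout in the traceback comes from a wait on the X11 socket path. A minimal sketch of such a polling helper follows. It is not the actual gns3server code, and the `timeout` and `interval` parameters and their defaults are assumptions made for illustration.

```python
import asyncio
import os


async def wait_for_file_creation(path, timeout=10.0, interval=0.5):
    """Poll until `path` exists; raise asyncio.TimeoutError after `timeout` seconds.

    Illustrative sketch only: the parameter names and defaults are assumptions,
    not the values used by gns3server.
    """
    remaining = timeout
    while remaining > 0:
        if os.path.exists(path):
            return
        await asyncio.sleep(interval)
        remaining -= interval
    raise asyncio.TimeoutError()
```

If the socket file never appears, the loop runs out and the caller sees the `TimeoutError` shown in the traceback.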

Raizo62 commented 1 year ago

> Does your device take a long time to boot?

no, not really....

> Also, what do you see when you type ls /tmp/.X11-unix/X* in a terminal? (before and after you try to start the second device)

Raizo62 commented 1 year ago

I believe I know where the problem is: the start command of the device is "/usr/sbin/init"

ghost commented 1 year ago

The VM from Raizo62 may be a bit unusual in deleting the X11 sockets in /tmp/.X11-unix, but this should not affect the file system of the GNS3 server. It does, however: I started a Docker VM with a VNC console, deleted /tmp/.X11-unix/X100 inside it, and noticed that the file was also gone from the file system of the GNS3 server.

To work around this, I changed the GNS3 server to mount /tmp/.X11-unix read-only, so the Docker VM can no longer change it. However, this also prevents software within the VM (for example an Xvfb server) from creating an additional X11 socket, so the change might have unwanted side effects.

Here is a log from my Firefox Docker VM with this change applied:

root@firefox-1:~# mount | grep X11
/dev/sda4 on /tmp/.X11-unix type ext4 (ro,relatime,errors=remount-ro)
root@firefox-1:~# ls -la /tmp/.X11-unix/
total 8
drwxrwxrwt 2 root root 4096 May 28 18:26 .
drwxrwxrwt 1 root root 4096 May 28 18:26 ..
srwxrwxrwx 1 root root    0 May 28 08:31 X0
srwxrwxrwx 1 1000 1000    0 May 28 18:26 X100
root@firefox-1:~# rm /tmp/.X11-unix/X100
rm: cannot remove '/tmp/.X11-unix/X100': Read-only file system
root@firefox-1:~# 

The changes in the GNS3 server v2.x (3.x uses a slightly different way to bind mount):

diff --git a/gns3server/compute/docker/docker_vm.py b/gns3server/compute/docker/docker_vm.py
index 79bdb128..91a5a58c 100644
--- a/gns3server/compute/docker/docker_vm.py
+++ b/gns3server/compute/docker/docker_vm.py
@@ -407,7 +407,7 @@ class DockerVM(BaseNode):
             await self._start_vnc()
             params["Env"].append("QT_GRAPHICSSYSTEM=native")  # To fix a Qt issue: https://github.com/GNS3/gns3-server/issues/556
             params["Env"].append("DISPLAY=:{}".format(self._display))
-            params["HostConfig"]["Binds"].append("/tmp/.X11-unix/:/tmp/.X11-unix/")
+            params["HostConfig"]["Binds"].append("/tmp/.X11-unix/:/tmp/.X11-unix/:ro")

         if self._extra_hosts:
             extra_hosts = self._format_extra_hosts(self._extra_hosts)
ghost commented 1 year ago

Another test shows that the separation between Docker VMs using VNC is quite poor. All Docker VMs share the same /tmp/.X11-unix directory, so by setting its DISPLAY variable to that of a different VM, a VM gains full X11 access to that VM's display. For example, it can open windows in the display of another VM.

So I think the sharing of the X11 socket needs a redesign. But that's certainly nothing for 2.2, maybe not even for 3.0.

grossmj commented 1 year ago

> So I think the sharing of the X11 socket needs a redesign. But that's certainly nothing for 2.2, maybe not even for 3.0.

I agree, this needs a redesign. Most likely for v3.1 or later.

grossmj commented 1 year ago

> I believe I know where the problem is: the start command of the device is "/usr/sbin/init"

Have you tried with another starting command?

Raizo62 commented 1 year ago

> > I believe I know where the problem is: the start command of the device is "/usr/sbin/init"
>
> Have you tried with another starting command?

Yes. With no start command, the bug does not occur. But that is not a necessity for me. To test VNC on Docker, I just used this template, which uses systemd. When I saw the problem, I reported it.

ghost commented 1 year ago

Both issues, that a Docker VM can remove the host's X11 sockets and that it has access to all of them, can be solved by exporting only the VM's own X11 socket to the Docker VM.

diff --git a/gns3server/compute/docker/docker_vm.py b/gns3server/compute/docker/docker_vm.py
index a10312e3..500e526d 100644
--- a/gns3server/compute/docker/docker_vm.py
+++ b/gns3server/compute/docker/docker_vm.py
@@ -406,7 +406,7 @@ class DockerVM(BaseNode):
             await self._start_vnc()
             params["Env"].append("QT_GRAPHICSSYSTEM=native")  # To fix a Qt issue: https://github.com/GNS3/gns3-server/issues/556
             params["Env"].append("DISPLAY=:{}".format(self._display))
-            params["HostConfig"]["Binds"].append("/tmp/.X11-unix/:/tmp/.X11-unix/")
+            params["HostConfig"]["Binds"].append("/tmp/.X11-unix/X{0}:/tmp/.X11-unix/X{0}:ro".format(self._display))

         if self._extra_hosts:
             extra_hosts = self._format_extra_hosts(self._extra_hosts)

The read-only flag (":ro") prevents the socket file from being modified from within the Docker VM. Indeed, an attempt to remove the socket file inside the container is refused with a "resource busy" error. However, removing the socket file also fails without the flag, so the read-only flag might be superfluous.

Maybe @Raizo62 can test whether this small change also works in his environment.
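
The bind string in the diff follows Docker's `host_path:container_path[:options]` format. Below is a small sketch of building it for a single display. The helper name `x11_socket_bind` and its `read_only` parameter are hypothetical and not part of gns3server, which builds the string inline.

```python
def x11_socket_bind(display, read_only=True):
    """Return a Docker bind string that exposes only one X11 socket.

    Hypothetical helper for illustration; not part of gns3server.
    """
    socket_path = "/tmp/.X11-unix/X{}".format(display)
    bind = "{0}:{0}".format(socket_path)
    if read_only:
        bind += ":ro"
    return bind


# The resulting string is what would be appended to params["HostConfig"]["Binds"].
print(x11_socket_bind(100))  # /tmp/.X11-unix/X100:/tmp/.X11-unix/X100:ro
```

Because only the one socket file is mounted, the container no longer sees the other entries of the host's /tmp/.X11-unix directory.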

ghost commented 1 year ago

Just verified it with Live Raizo v14.23.06.28p

Short result: My change from the previous comment works.

Longer result:

I can replicate it by booting Live Raizo, changing the console in the DDebian template to vnc, adding the first DDebian VM, starting it, and then adding the second one.

After the first DDebian VM is started, not only is this VM's X11 socket (/tmp/.X11-unix/X100) deleted on the host, but all of the host's X11 sockets are deleted. Even the host's /tmp/.X11-unix/X0 is gone, which will cause problems on the host later, as applications can no longer find the X11 socket.

Adding a second DDebian VM fails because GNS3 allocates the lowest free X11 display number that is at least 100. As the socket X100 of the first VM was deleted, X100 is chosen again, and this conflict leads to the error.

I then rebooted Live Raizo for a fresh start and integrated my change by directly editing /usr/local/lib/python3.9/dist-packages/gns3server/compute/docker/docker_vm.py (do this only for testing). Then I started GNS3, changed the console in the DDebian template to vnc, added the first DDebian VM and started it. This time the X11 sockets were not deleted, neither X100 nor X0. Adding a second DDebian VM works without issues; it correctly chooses the X11 socket X101.
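
The allocation conflict described above can be reproduced with a small sketch. It assumes the display number is chosen by checking whether a socket file already exists, as the observed behaviour suggests. The function name `lowest_free_display`, the temporary directory standing in for /tmp/.X11-unix, and the regular files standing in for sockets are illustrative assumptions, not GNS3 code.

```python
import os
import tempfile


def lowest_free_display(socket_dir, start=100):
    """Return the lowest display number >= start with no socket file in socket_dir."""
    display = start
    while os.path.exists(os.path.join(socket_dir, "X{}".format(display))):
        display += 1
    return display


with tempfile.TemporaryDirectory() as socket_dir:
    # First VM: X100 is free, so it is chosen and its socket file appears.
    first = lowest_free_display(socket_dir)
    open(os.path.join(socket_dir, "X{}".format(first)), "w").close()

    # Socket left intact: the second VM gets the next number.
    second_ok = lowest_free_display(socket_dir)

    # Socket deleted from inside the container: X100 looks free again.
    os.remove(os.path.join(socket_dir, "X{}".format(first)))
    second_conflict = lowest_free_display(socket_dir)

print(first, second_ok, second_conflict)  # 100 101 100
```

The last value shows why the second VM ends up on the same display as the first one once the socket file has been removed.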

Raizo62 commented 1 year ago

> Maybe @Raizo62 can test whether this small change also works in his environment.

Sorry, I was offline these last 2 days.

> Just verified it with Live Raizo v14.23.06.28p

Thank you :-)

grossmj commented 1 year ago

Thanks @b-ehlers, this is now merged into the 2.2 branch.