VirtualGL / virtualgl

Main VirtualGL repository
https://VirtualGL.org

vglrun over vglconnect corrupts user XAUTHORITY file on CentOS 8 #163

Closed: paulraines68 closed this issue 3 years ago

paulraines68 commented 3 years ago

Everything below works fine on CentOS 7 machines. On CentOS 8 machines, however, running vglrun over a vglconnect session corrupts the user's XAUTHORITY file so badly that you can no longer run X programs even outside of vglrun.

Setup: a CentOS 8 box, "sisu", runs a TigerVNC session that the user connects to via the Windows-based UltraVNC viewer. Everything works fine on sisu with or without vglrun. The user then runs 'vglconnect -s' to another CentOS 8 box, "storm", which has VirtualGL installed. Running X apps there without vglrun works fine, but running one with vglrun gives an X11 connection rejected error, and after that X apps cannot be run even without vglrun. The cookie for the SSH-forwarded X11 connection has been overwritten:


sisu[0]:~$ cat /etc/redhat-release 
CentOS Stream release 8
sisu[0]:~$ rpm -q VirtualGL
VirtualGL-2.6.5-20201117.x86_64
sisu[0]:~$ xauth list $DISPLAY
#ffff#73746f726d2e6e6d722e6d67682e686172766172642e656475#:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
sisu.nmr.mgh.harvard.edu/unix:11  MIT-MAGIC-COOKIE-1  3904d2495955650b799185b25efca3ec
sisu[0]:~$ xdpyinfo | grep -A1 screen..0
screen #0:
  dimensions:    1440x900 pixels (381x238 millimeters)
sisu[0]:~$ xeyes
^C
sisu[0]:~$ vglrun xeyes
^C
sisu[0]:~$ DISPLAY=:0 XAUTHORITY=/etc/opt/VirtualGL/vgl_xauth_key xdpyinfo | grep -A1 screen..0
screen #0:
  dimensions:    7040x1600 pixels (1863x423 millimeters)
sisu[0]:~$ 
sisu[0]:~$ 
sisu[0]:~$ vglconnect -s storm

VirtualGL Client 64-bit v2.6.5 (Build 20201117)
vglclient is already running on this X display and accepting unencrypted
   connections on port 4242.

Making preliminary SSH connection to find a free port on the server ...
Making final SSH connection ...
storm[0]:raines$ cat /etc/redhat-release 
CentOS Linux release 8.2.2004 (Core) 
storm[0]:raines$ rpm -q VirtualGL
VirtualGL-2.6.5-20201117.x86_64
storm[0]:raines$ DISPLAY=:0 XAUTHORITY=/etc/opt/VirtualGL/vgl_xauth_key xdpyinfo | grep -A1 screen..0
screen #0:
  dimensions:    1920x1200 pixels (508x318 millimeters)
storm[0]:raines$ echo $DISPLAY
localhost:10.0
storm[0]:raines$ xauth list $DISPLAY
storm.nmr.mgh.harvard.edu/unix:10  MIT-MAGIC-COOKIE-1  323d5a935e4e9c408a6d16ea71f6765e
storm[0]:raines$ xauth -i -f /etc/opt/VirtualGL/vgl_xauth_key list
xauth:  /etc/opt/VirtualGL/vgl_xauth_key not writable, changes will be ignored
storm.nmr.mgh.harvard.edu/unix:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
#ffff#73746f726d2e6e6d722e6d67682e686172766172642e656475#:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
storm[0]:raines$ xeyes
^C
storm[0]:raines$ vglrun xeyes
X11 connection rejected because of wrong authentication.
Error: Can't open display: localhost:10.0
storm[0]:raines$ xeyes
X11 connection rejected because of wrong authentication.
Error: Can't open display: localhost:10.0
storm[0]:raines$ xauth list $DISPLAY
#ffff#73746f726d2e6e6d722e6d67682e686172766172642e656475#:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
storm.nmr.mgh.harvard.edu/unix:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
storm[0]:raines$ exit
logout
Connection to storm closed.
sisu[0]:~$ xeyes
^C

It seems to have removed the cookie for the SSH-forwarded X11 connection and inserted the cookie for the local :0 display in its place.
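A minimal sketch for spotting the suspect entries, assuming the xauth list output format shown above. Both entries that now stand in for the SSH cookie, storm.nmr.mgh.harvard.edu/unix: and the #ffff#...#: one, end in a bare colon with no display number, yet xauth list localhost:10.0 returned them. The helper name flag_numberless_entries is hypothetical:

```shell
# Hypothetical helper: read "xauth list" output on stdin and print only the
# entries whose display name ends in ":" with no display number after it,
# e.g. "storm.nmr.mgh.harvard.edu/unix:" or "#ffff#<hex>#:".
flag_numberless_entries() {
    awk '$1 ~ /:$/ { print }'
}
```

Running xauth list | flag_numberless_entries on storm before and after vglrun would show whether such entries have been merged into ~/.Xauthority.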

It's actually worse than that: a cookie I had for a session on another machine, pinto, was removed too.

My guess is a change in the behavior of the xauth tool in CentOS 8, although the CentOS 7 and CentOS 8 packages are both based on xorg-x11-xauth-1.0.9 and differ only in release number (i.e., in the patches applied to the CentOS 8 version).

It looks like vglrun does an xauth merge, so I will test just that:

sisu[0]:~$ vglconnect -s storm

VirtualGL Client 64-bit v2.6.5 (Build 20201117)
vglclient is already running on this X display and accepting unencrypted
   connections on port 4242.

Making preliminary SSH connection to find a free port on the server ...
Making final SSH connection ...
storm[0]:raines$ xeyes
^C
storm[0]:raines$ xauth -if /etc/opt/VirtualGL/vgl_xauth_key list
xauth:  /etc/opt/VirtualGL/vgl_xauth_key not writable, changes will be ignored
storm.nmr.mgh.harvard.edu/unix:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
#ffff#73746f726d2e6e6d722e6d67682e686172766172642e656475#:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
storm[0]:raines$ xauth merge /etc/opt/VirtualGL/vgl_xauth_key 
storm[0]:raines$ xeyes
X11 connection rejected because of wrong authentication.
Error: Can't open display: localhost:10.0

For comparison, the same test against squee, a CentOS 7 box, works:

sisu[0]:~$ vglconnect -s squee

VirtualGL Client 64-bit v2.6.5 (Build 20201117)
vglclient is already running on this X display and accepting unencrypted
   connections on port 4242.

Making preliminary SSH connection to find a free port on the server ...
Making final SSH connection ...
squee[0]:raines$ cat /etc/redhat-release 
CentOS Linux release 7.9.2009 (Core)
squee[0]:raines$ xeyes
^C
squee[0]:~$ xauth -if /etc/opt/VirtualGL/vgl_xauth_key list
xauth:  /etc/opt/VirtualGL/vgl_xauth_key not writable, changes will be ignored
squee.nmr.mgh.harvard.edu/unix:0  MIT-MAGIC-COOKIE-1  e47f44d779b268aaab45f2ad0596f7c5
squee[0]:raines$ xauth merge /etc/opt/VirtualGL/vgl_xauth_key
squee[0]:raines$ xeyes
^C
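Because xauth merge rewrites the real authority file, which is how storm's cookie was lost above, a safer way to repeat this experiment is against a scratch copy. A minimal sketch, assuming only xauth's standard -f option and merge/list commands; trial_merge is a hypothetical name, and the default key path is the one used in this thread:

```shell
# Hypothetical helper: merge a key file into a throwaway copy of the current
# authority file and list the result, leaving the real file untouched.
trial_merge() {
    # Key file to test; defaults to the VirtualGL key discussed here.
    key=${1:-/etc/opt/VirtualGL/vgl_xauth_key}
    # The authority file that a plain "xauth merge" would modify.
    src=${XAUTHORITY:-$HOME/.Xauthority}
    tmp=$(mktemp) || return 1
    if ! cp "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Merge and list against the copy only.
    xauth -f "$tmp" merge "$key" && xauth -f "$tmp" list
    status=$?
    rm -f "$tmp"
    return "$status"
}
```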

Hmm. Why is the GDM xauth file so different on CentOS 8?

storm[0]:raines$ xauth -i -f /etc/opt/VirtualGL/vgl_xauth_key list
xauth:  /etc/opt/VirtualGL/vgl_xauth_key not writable, changes will be ignored
storm.nmr.mgh.harvard.edu/unix:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
#ffff#73746f726d2e6e6d722e6d67682e686172766172642e656475#:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
storm[0]:raines$ xauth add storm.nmr.mgh.harvard.edu/unix:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
xauth: (argv):1:  bad display name "storm.nmr.mgh.harvard.edu/unix:" in "add" command
storm[0]:raines$ xauth add storm.nmr.mgh.harvard.edu/unix:0  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
storm[0]:raines$ DISPLAY=:0 xdpyinfo | head -3
name of display:    :0
version number:    11.0
vendor string:    The X.Org Foundation
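On why the GDM file looks so different: the #ffff#...# entries appear to be how xauth prints an address family it has no name for, as #<family in hex>#<address in hex># followed by the usual :<display number>. Family 0xffff is FamilyWild. A minimal sketch that decodes such a line with POSIX awk; decode_wild_entry is a hypothetical helper:

```shell
# Hypothetical helper: for each "xauth list" line on stdin whose display name
# has the form "#ffff#<hex address>#:<n>", print the decoded host name and
# the display number ("(none)" if empty). Other lines are skipped.
decode_wild_entry() {
    awk '{
        # "#ffff#<hex>#:<n>" splits on "#" into "", "ffff", "<hex>", ":<n>"
        n = split($1, part, "#")
        if (n != 4 || tolower(part[2]) != "ffff") next
        hex = tolower(part[3])
        digits = "0123456789abcdef"
        host = ""
        for (i = 1; i < length(hex); i += 2) {
            h = index(digits, substr(hex, i, 1)) - 1
            l = index(digits, substr(hex, i + 1, 1)) - 1
            host = host sprintf("%c", h * 16 + l)
        }
        num = part[4]
        sub(/^:/, "", num)
        if (num == "") num = "(none)"
        print host " display=" num
    }'
}
```

Fed the #ffff# entry from the listing above, it should print storm.nmr.mgh.harvard.edu display=(none): the entry names storm but, like the /unix: entry, carries no display number.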

Okay, so on storm I now do this:

[root@storm ~]# cd /etc/opt/VirtualGL/
[root@storm VirtualGL]# mv vgl_xauth_key bad_key
[root@storm VirtualGL]# touch vgl_xauth_key
[root@storm VirtualGL]# xauth -f vgl_xauth_key add storm.nmr.mgh.harvard.edu/unix:0  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
[root@storm VirtualGL]# chmod 644 vgl_xauth_key
[root@storm VirtualGL]# ls -l
total 8
-rw-r--r--. 1 root root 138 Mar 19 10:04 bad_key
-rw-r--r--. 1 root root  70 Apr  8 12:06 vgl_xauth_key

And now I can vglconnect to storm and run vglrun with no problem.

I will now modify my /usr/share/gdm/greeter/autostart/virtualgl.desktop to call a wrapper script that "fixes" vgl_xauth_key after running /usr/bin/vglgenkey.

The missing ":0" originates at the GDM level:

[root@storm VirtualGL]# xauth -if /run/user/42/gdm/Xauthority list
storm.nmr.mgh.harvard.edu/unix:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498
#ffff#73746f726d2e6e6d722e6d67682e686172766172642e656475#:  MIT-MAGIC-COOKIE-1  31f3b680d30a79f33cfa507fbd755498

So maybe this is a "bug" in GDM. But perhaps vglgenkey could be adjusted to "fix" such an entry when it sees one.
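If vglgenkey, or a wrapper around it, were to repair such entries, the core of the repair could be a filter along these lines. This is a sketch of the same idea as the sed in the greeter script later in the thread, not VirtualGL's actual behavior; add_display_zero is a hypothetical name, and appending 0 is only correct for a GDM-managed X server on :0 like the one here:

```shell
# Hypothetical filter: read "xauth list" output on stdin, keep only
# "<host>/unix:<n>" entries, and append display number 0 to any that have
# none. Entries that already carry a number are passed through unchanged.
add_display_zero() {
    awk '$1 ~ /\/unix:[0-9]*$/ {
        if ($1 ~ /:$/) $1 = $1 "0"   # assigning $1 rejoins fields with single spaces
        print
    }'
}

# Example use (not executed here), as root on the 3D X server host:
#   xauth -f /run/user/42/gdm/Xauthority list | add_display_zero |
#       while read -r name proto key; do
#           xauth -f /etc/opt/VirtualGL/vgl_xauth_key add "$name" "$proto" "$key"
#       done
```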

dcommander commented 3 years ago

I can't reproduce the issue, which is a necessary first step before I can fix it or work around it. What I did:

  1. cleared ~/.Xauthority on my CentOS 8 machine
  2. connected to the machine using vglconnect -s
  3. verified that ~/.Xauthority contained only the cookie for the SSH-forwarded 2D X server connection
  4. executed vglrun /opt/VirtualGL/bin/glxspheres64
  5. verified that GLXspheres launched correctly
  6. verified that ~/.Xauthority contained both the cookie for the SSH-forwarded 2D X server connection and the cookie for the 3D X server connection
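Steps 3 and 6 are checks that can be scripted. A minimal sketch, assuming the xauth list format shown earlier in the thread; display_number and has_cookie_for_display are hypothetical helpers, and the check deliberately requires an explicit display number, so a numberless entry such as storm.nmr.mgh.harvard.edu/unix: does not count:

```shell
# Hypothetical helper: strip everything up to the last ":" and any
# ".screen" suffix, e.g. "localhost:10.0" -> "10", ":0" -> "0".
display_number() {
    d=${1##*:}
    printf '%s\n' "${d%%.*}"
}

# Hypothetical check: succeed if some "xauth list" entry on stdin names the
# display number of $1 explicitly, e.g. ".../unix:10" for localhost:10.0.
has_cookie_for_display() {
    want=$(display_number "$1")
    awk -v want="$want" '
        { d = $1; sub(/.*:/, "", d); if (d != "" && d == want) found = 1 }
        END { exit !found }'
}
```

For example, xauth list | has_cookie_for_display "$DISPLAY" && echo ok covers step 3 for the 2D connection, and the same call with the 3D display (e.g. :0) covers the second half of step 6.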

I also tried the same thing with a VNC session as the 2D X server, as you indicated above. Same results. Are you sharing a home directory between client and server? What else could I be missing?

paulraines68 commented 3 years ago

Yes, my home directory is shared over NFS across all the machines involved. Nothing else is writing to ~/.Xauthority at the same time while ignoring locks. If locking were the issue, the problem would be intermittent, but it is consistent.

If you run xauth list on /etc/opt/VirtualGL/vgl_xauth_key on the remote CentOS 8 machine, do you see /unix:0 or just /unix:?

And after running vglrun on the remote machine, what exactly is in your ~/.Xauthority?

Here are the lines I put in a greeter script to replace /usr/bin/vglgenkey:

/bin/rm -rf /etc/opt/VirtualGL/vgl_xauth_key
touch /etc/opt/VirtualGL/vgl_xauth_key
xauth -f /etc/opt/VirtualGL/vgl_xauth_key add $(xauth -f /run/user/$(id -u gdm)/gdm/Xauthority list | grep unix: | sed -e 's/unix:  MIT/unix:0  MIT/' )

This works fine for me on my machines.

paulraines68 commented 3 years ago

Sorry. This is all my fault. The /usr/bin/vglgenkey from 2.6.5 works fine.

The confusion arose because the X servers on these boxes had been running for a long time and were already up when VirtualGL was installed. So /etc/opt/VirtualGL/vgl_xauth_key was just a by-hand copy of /run/user/42/gdm/Xauthority that I made after the yum install, so that users would not have to log out and back in. That was weeks ago, and I forgot about it. It was an install procedure we had used on CentOS 7 for years with no surprises.

vglrun works fine locally, so we did not notice this issue. It only came up now, when someone used vglconnect remotely. I am still not sure why it works locally but not remotely.

So, contrary to what I assumed, vglgenkey was never actually run on the box.

I will need to redo my install procedure so that after the yum install I do:

# sudo -u gdm /bin/bash
bash-4.4$ DISPLAY=:0 XAUTHORITY=/run/user/42/gdm/Xauthority /usr/bin/vglgenkey
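The same procedure can be wrapped so that gdm's UID is looked up rather than hard-coded as 42, and the resulting key is listed for inspection. A minimal sketch, assuming it is run as root, that gdm's authority file lives at /run/user/<uid>/gdm/Xauthority as on these CentOS 8 boxes, and that the GDM X server is on :0; gdm_xauth_path and regen_vgl_key are hypothetical names:

```shell
# Hypothetical helper: print the GDM authority file path for a user
# (default "gdm"), e.g. /run/user/42/gdm/Xauthority on the boxes above.
gdm_xauth_path() {
    uid=$(id -u "${1:-gdm}") || return 1
    printf '/run/user/%s/gdm/Xauthority\n' "$uid"
}

# Hypothetical wrapper, to be run as root: regenerate the VirtualGL key from
# the running GDM X server on :0, mirroring the manual steps above, then list
# the key so its entries can be checked for an explicit ".../unix:0".
regen_vgl_key() {
    auth=$(gdm_xauth_path gdm) || return 1
    sudo -u gdm env DISPLAY=:0 XAUTHORITY="$auth" /usr/bin/vglgenkey || return 1
    xauth -f /etc/opt/VirtualGL/vgl_xauth_key list
}
```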