Open guerby opened 3 years ago
I'll set up a test VM with RHEL 7.9 to see what is going on. The disconnect itself is expected: the agent uses the GDM uid/xauthority to capture the login screen, but once the user logs in, the child process exits because it doesn't have the correct credentials to scrape the user session. Normally, I listen for an X event to detect the user login, so that I can automatically respawn the child with the correct xauthority.
Probably the same report as on Reddit here.
@guerby is your RHEL system using KDE or GNOME?
GNOME.
I set up a vanilla RHEL 7.9 Server with GNOME. I can't reproduce this particular issue, although I do see that the resolution doesn't get updated on login, so I can at least try to fix that. In any case, I have around 70 different test systems, so I'll see if I can reproduce this on a different distro, as I'm almost positive I've seen this before.
Looks like I can reproduce it with CentOS 7, so I'll do a bunch of testing to see if I can fix it.
Note: our RHEL 7.9 VM was heavily loaded during my test (a PhD student running some simulation code, still running as of now), so maybe CPU and/or I/O load changes some timings for MeshAgent.
I think I figured it out. Maybe you can verify whether this scenario is the same for you. I found that this only happens if the user's desktop resolution is different from the resolution of the login screen. On my test setup, the login screen was 1920x1080, but the resolution of the desktop was 1680x1050. As a result, when you connect with KVM on the login screen and then log in, it connects to the user's session at 1920x1080. It captures a few frames, but then the user's resolution changes to 1680x1050 when the workspace activates. When this happened, the X library exited the process with an error message about invalid parameters. When I switched the desktop resolution to match the login screen resolution, it no longer caused the process to exit.
It seems on other distros, in this scenario, the X library does not emit an error and instead scales the output... I have code that registers for the workspace event, so that I can requery the resolution. This part should work, because this is how I draw the connection bar at the top of the screen, as I had to hook the same event, so that it renders properly. I just need to port that code back to C, since that code is written in JS right now, so that I can use it in the child KVM process.
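To illustrate the idea of re-querying the resolution when the workspace event fires, here is a minimal sketch in plain Node.js. It is not MeshAgent code: `createCapture`, `refreshCapture`, and the `queryResolution` callback are hypothetical names, and the callback stands in for whatever X11 query the real agent would make. The only point is that the capture state is rebuilt when the reported size differs, instead of continuing to capture with stale dimensions.

```javascript
// Hypothetical capture state: dimensions plus a frame buffer sized to match
// (assuming 4 bytes per pixel for illustration).
function createCapture(width, height) {
  return { width, height, buffer: Buffer.alloc(width * height * 4) };
}

// Re-query the screen size and rebuild the capture state only if it changed.
// queryResolution is a caller-supplied stand-in for a real X11 query.
function refreshCapture(capture, queryResolution) {
  const { width, height } = queryResolution();
  if (width === capture.width && height === capture.height) {
    return capture; // nothing changed, keep the existing buffer
  }
  return createCapture(width, height);
}

// Login screen size, then desktop size after login, as in the test setup above.
let capture = createCapture(1920, 1080);
capture = refreshCapture(capture, () => ({ width: 1680, height: 1050 }));
console.log(capture.width + 'x' + capture.height);
```

In a real agent, `refreshCapture` would be called from the workspace/resolution event handler rather than inline as it is here.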
MeshCentral v0.9.22 is out with Bryan's updated Linux agents. Let Bryan know if that solves it!
Unfortunately I tested with 0.9.22 and 0.9.26 (latest) and the issue is still there for me.
I checked the resolution and it doesn't change between GDM and GNOME: it stays at 1024x768.
I noticed a "defunct" meshagent child process when it's not working; after a disconnect/connect, the new process is OK.
[root@vega2 ~]# more '/var/log/gdm/:0.log'
(II) modeset(0): Output Virtual-1 using initial mode 1024x768 +0+0
root 1218 0.0 0.0 482100 5208 ? Ssl Jul03 0:00 /usr/sbin/gdm
root 25055 22.1 0.0 473812 44856 tty1 Ssl+ 09:49 3:35 \_ /usr/bin/X :0 -background none -noreset -audit 4 -verbose -auth /run/gdm/auth-for-gdm-4V2jzX/database -seat seat0 -nolisten tcp vt1
root 26411 0.0 0.0 388144 5412 ? Sl 10:02 0:00 \_ gdm-session-worker [pam/gdm-password]
lguerby 26445 0.0 0.0 819296 9588 ? Ssl 10:02 0:00 \_ /usr/libexec/gnome-session-binary --session gnome-classic
root 22336 0.0 0.0 57448 19328 ? Ssl Sep10 1:26 /usr/local/mesh/meshagent --installedByUser=0
lguerby 23429 0.0 0.0 0 0 ? Z 09:44 0:00 \_ [meshagent] <defunct>
root 22336 0.0 0.0 60988 22740 ? Ssl Sep10 1:26 /usr/local/mesh/meshagent --installedByUser=0
root 25106 2.7 0.0 74028 25304 ? S 09:49 0:20 \_ /usr/local/mesh/meshagent --installedByUser=0
root 22336 0.0 0.0 61784 23544 ? Ssl Sep10 1:26 /usr/local/mesh/meshagent --installedByUser=0
lguerby 26579 0.0 0.0 0 0 ? Z 10:02 0:00 \_ [meshagent] <defunct>
root 22336 0.0 0.0 62848 24704 ? Ssl Sep10 1:26 /usr/local/mesh/meshagent --installedByUser=0
lguerby 27729 2.7 0.0 73944 25116 ? R 10:05 0:00 \_ /usr/local/mesh/meshagent --installedByUser=0
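For anyone who wants to check for the same symptom, here is a minimal sketch in plain Node.js (not part of MeshAgent) that scans `ps aux`-style output for zombie processes, meaning those whose STAT column starts with `Z`. The function name `findDefunct` is made up for illustration, and the sample lines are trimmed from the output above.

```javascript
// Scan `ps aux`-style text for zombie (defunct) processes.
// Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
function findDefunct(psOutput) {
  return psOutput
    .split('\n')
    .map((line) => line.trim().split(/\s+/))
    .filter((cols) => cols.length >= 11 && cols[7].startsWith('Z'))
    .map((cols) => ({
      user: cols[0],
      pid: Number(cols[1]),
      command: cols.slice(10).join(' '),
    }));
}

const sample = [
  'root 22336 0.0 0.0 57448 19328 ? Ssl Sep10 1:26 /usr/local/mesh/meshagent --installedByUser=0',
  'lguerby 23429 0.0 0.0 0 0 ? Z 09:44 0:00 \\_ [meshagent] <defunct>',
].join('\n');

// Print the user and PID of every defunct process found in the sample.
for (const proc of findDefunct(sample)) {
  console.log(proc.user, proc.pid);
}
```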
I will test soon with an unloaded RHEL 7 VM to see if the heavy simulation process (still running) has an effect.
One thing you can try is to run the following command in the console tab:
eval "require('user-sessions').on('changed', function(){sendConsoleText(require('user-sessions').consoleUid());});"
This will report a cyclic error when it returns, but that is OK. It makes the console display login/logout events, showing either the GDM uid or the user uid, depending on whether the console is logged in or not. Let me know if this properly detects logging in from the login screen.
It logged "1002", which is the user id of lguerby (the login I use on this VM),
and 42 (gdm) on logout (and MeshCentral successfully got back to the GDM screen by itself).
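The login detection discussed above can be sketched outside the agent as follows. This is only an illustration under stated assumptions: `FakeUserSessions` is a mock built on Node's `EventEmitter` that mimics the `on('changed')` / `consoleUid()` shape used in the console command, and `watchConsoleUid` is a hypothetical helper, not the agent's actual respawn logic. It fires a callback only when the console owner really changes, which is the point where a KVM child would need to be respawned with the new user's credentials.

```javascript
const { EventEmitter } = require('events');

// Mock stand-in for the agent's 'user-sessions' module (not the real implementation).
class FakeUserSessions extends EventEmitter {
  constructor(uid) {
    super();
    this.uid = uid;
  }
  consoleUid() {
    return this.uid;
  }
  // Simulate a login or logout on the console.
  switchTo(uid) {
    this.uid = uid;
    this.emit('changed');
  }
}

// Call onSwitch(previousUid, currentUid) whenever the console owner changes.
function watchConsoleUid(sessions, onSwitch) {
  let last = sessions.consoleUid();
  sessions.on('changed', () => {
    const current = sessions.consoleUid();
    if (current === last) return; // event fired, but same owner: ignore
    const previous = last;
    last = current;
    onSwitch(previous, current);
  });
}

const switches = [];
const sessions = new FakeUserSessions(42); // 42 = gdm, as reported above
watchConsoleUid(sessions, (from, to) => switches.push([from, to]));

sessions.switchTo(1002);   // user logs in
sessions.emit('changed');  // spurious event, owner unchanged
sessions.switchTo(42);     // user logs out, back to GDM
console.log(switches);
```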
Tested today with 0.9.58 and I still get the KVM child exit message.
Interesting, thanks. I'll have to do more testing to see how I can coerce the race to happen.
Here is a list of "KVM Child process has unexpectedly exited" issues for cross referencing.
Using MeshCentral 0.9.20 to control a RHEL 7.9 VM: once the login and password have been entered in GDM, the MeshCentral window shows, in yellow on black, "KVM child process has unexpectedly exited".
Disconnecting and then reconnecting the MeshCentral desktop shows the post-login desktop, so there's a workaround (but not one that is friendly to non-power users).
systemctl status meshagent on the RHEL 7.9 VM shows no issue logged.
I tested GDM login with Debian 10 and RHEL 8.3 and no such issue occurs, so it must be specific to RHEL 7.9.
Note: RHEL is now free to install up to some number of machines/VMs.