Ylianst / MeshCentral

A complete web-based remote monitoring and management web site. Once set up, you can install agents and start remote desktop sessions to devices on the local network or over the Internet.
https://meshcentral.com
Apache License 2.0

Meshcentral yellow message KVM child process has unexpectedly exited when GDM login in on RHEL 7.9 #3114

Open guerby opened 3 years ago

guerby commented 3 years ago

Using MeshCentral 0.9.20 to control a RHEL 7.9 VM: once I have entered the login and password in GDM, the MeshCentral window shows, in yellow on black, "KVM child process has unexpectedly exited".

Disconnecting and then reconnecting the MeshCentral desktop shows the post-login desktop, so there is a workaround (but not one that is friendly to non-power users).

systemctl status meshagent on the RHEL 7.9 VM shows no issue logged.

I tested GDM login with Debian 10 and RHEL 8.3 and no such issue occurs, so it must be specific to RHEL 7.9.

Note: RHEL is now free to install on up to some number of machines/VMs.

krayon007 commented 3 years ago

I'll set up a test VM with RHEL 7.9 to see what is going on... The disconnect itself is expected: the agent uses the GDM uid/xauthority to capture the login screen, but once the user logs in, the child process exits because it doesn't have the correct credentials to scrape the user session. Normally, I listen for an X event that signals a user login, so that I can automatically respawn the child with the correct xauthority.

Ylianst commented 3 years ago

Probably the same report as on Reddit here.

krayon007 commented 3 years ago

@guerby is your RHEL system using KDE or GNOME?

guerby commented 3 years ago

@guerby is your RHEL system using KDE or GNOME?

GNOME.

krayon007 commented 3 years ago

I set up a vanilla RHEL 7.9 Server with GNOME. I can't reproduce this particular issue, although I do see that the resolution doesn't get updated on login, so I can at least try to fix that... In any case, I have around 70 different test systems, so I'll see if I can reproduce this on a different distro, as I'm almost positive I've seen this before...

krayon007 commented 3 years ago

Looks like I can reproduce it with CentOS 7, so I'll do a bunch of testing to see if I can fix it

guerby commented 3 years ago

Note: our RHEL 7.9 VM was highly loaded during my test (a PhD student running some simulation code, still running as of now); maybe the CPU and/or I/O load changes some timings for MeshAgent.

krayon007 commented 3 years ago

I think I figured it out... Maybe you can verify whether this scenario is the same for you. I found that this only happens if the user's desktop resolution is different from the resolution of the login screen. On my test setup, the login screen was 1920x1080, but the desktop was 1680x1050. So when you connect with KVM on the login screen and then log in, it connects to the user's session at 1920x1080. It captures a few frames, but then the user's resolution changes to 1680x1050 when the workspace activates. When this happened, the X library exited the process with an error message about invalid parameters. When I switched the desktop resolution to match the login screen resolution, the process no longer exited.

It seems on other distros, in this scenario, the X library does not emit an error and instead scales the output... I have code that registers for the workspace event, so that I can requery the resolution. This part should work, because this is how I draw the connection bar at the top of the screen, as I had to hook the same event, so that it renders properly. I just need to port that code back to C, since that code is written in JS right now, so that I can use it in the child KVM process.
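
As an illustration of the requery idea described above, here is a minimal JavaScript sketch. It is not MeshAgent code: `needsReinit`, `onWorkspaceChange`, the `state` object, and the `{width, height}` shape are all hypothetical names chosen for this example, and the actual fix would have to live in the C code of the KVM child process.

```javascript
// Hypothetical sketch: decide whether cached capture geometry is stale.
// None of these names come from MeshAgent; they only illustrate the idea
// of re-querying the resolution when the workspace changes.
function needsReinit(cached, current) {
    return cached.width !== current.width || cached.height !== current.height;
}

// Meant to be called from a (hypothetical) workspace-change handler.
// queryResolution is any function returning the current {width, height}.
function onWorkspaceChange(state, queryResolution) {
    var current = queryResolution();
    if (needsReinit(state.geometry, current)) {
        // Replace the cached geometry so later captures use the new size
        // instead of asking for a region larger than the screen.
        state.geometry = current;
        state.reinitCount += 1;
    }
    return state;
}

// Example using the two resolutions mentioned in this comment.
var state = { geometry: { width: 1920, height: 1080 }, reinitCount: 0 };
onWorkspaceChange(state, function () { return { width: 1680, height: 1050 }; });
```

The point of the sketch is only that the capture size is refreshed before the next frame is grabbed; how the real agent learns about the workspace change is not shown here.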

Ylianst commented 3 years ago

MeshCentral v0.9.22 is out with Bryan's updated Linux agents. Let Bryan know if that solves it!

guerby commented 3 years ago

Unfortunately I tested with 0.9.22 and 0.9.26 (latest) and the issue is still there for me.

I checked the resolution and it doesn't change between GDM and GNOME: it stays at 1024x768.

I noticed a "defunct" meshagent child process when it's not working; after disconnect/connect, a new process is OK.

[root@vega2 ~]# more '/var/log/gdm/:0.log'
(II) modeset(0): Output Virtual-1 using initial mode 1024x768 +0+0

root      1218  0.0  0.0 482100  5208 ?        Ssl  Jul03   0:00 /usr/sbin/gdm
root     25055 22.1  0.0 473812 44856 tty1     Ssl+ 09:49   3:35  \_ /usr/bin/X :0 -background none -noreset -audit 4 -verbose -auth /run/gdm/auth-for-gdm-4V2jzX/database -seat seat0 -nolisten tcp vt1
root     26411  0.0  0.0 388144  5412 ?        Sl   10:02   0:00  \_ gdm-session-worker [pam/gdm-password]
lguerby  26445  0.0  0.0 819296  9588 ?        Ssl  10:02   0:00      \_ /usr/libexec/gnome-session-binary --session gnome-classic

root     22336  0.0  0.0  57448 19328 ?        Ssl  Sep10   1:26 /usr/local/mesh/meshagent --installedByUser=0
lguerby  23429  0.0  0.0      0     0 ?        Z    09:44   0:00  \_ [meshagent] <defunct>

root     22336  0.0  0.0  60988 22740 ?        Ssl  Sep10   1:26 /usr/local/mesh/meshagent --installedByUser=0
root     25106  2.7  0.0  74028 25304 ?        S    09:49   0:20  \_ /usr/local/mesh/meshagent --installedByUser=0

root     22336  0.0  0.0  61784 23544 ?        Ssl  Sep10   1:26 /usr/local/mesh/meshagent --installedByUser=0
lguerby  26579  0.0  0.0      0     0 ?        Z    10:02   0:00  \_ [meshagent] <defunct>

root     22336  0.0  0.0  62848 24704 ?        Ssl  Sep10   1:26 /usr/local/mesh/meshagent --installedByUser=0
lguerby  27729  2.7  0.0  73944 25116 ?        R    10:05   0:00  \_ /usr/local/mesh/meshagent --installedByUser=0

I will test soon with an unloaded RHEL 7 VM to see if the heavy simulation process (still running) has an effect.
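
For anyone who wants to spot the defunct child automatically, here is a small JavaScript sketch that scans process listings in the format shown above. It assumes the column layout of those listings (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND); `findDefunctMeshagents` is a hypothetical helper name, not part of MeshCentral or MeshAgent.

```javascript
// Sketch: find defunct (zombie) meshagent children in a process listing
// with columns USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
function findDefunctMeshagents(psOutput) {
    return psOutput.split('\n')
        .map(function (line) { return line.trim().split(/\s+/); })
        .filter(function (cols) {
            // STAT is the 8th column; a leading 'Z' marks a zombie process.
            return cols.length >= 11 &&
                /^Z/.test(cols[7]) &&
                cols.slice(10).join(' ').indexOf('meshagent') !== -1;
        })
        .map(function (cols) { return { user: cols[0], pid: Number(cols[1]) }; });
}

// Two lines copied from the listing above.
var sample = [
    'root     22336  0.0  0.0  57448 19328 ?        Ssl  Sep10   1:26 /usr/local/mesh/meshagent --installedByUser=0',
    'lguerby  23429  0.0  0.0      0     0 ?        Z    09:44   0:00  \\_ [meshagent] <defunct>'
].join('\n');
var zombies = findDefunctMeshagents(sample);
```

With the two sample lines, only the `lguerby` child in state `Z` is reported; the parent agent in state `Ssl` is ignored.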

krayon007 commented 3 years ago

One thing you can try is to run the following command in the console tab: eval "require('user-sessions').on('changed', function(){sendConsoleText(require('user-sessions').consoleUid());});"

This will say cyclic error when it returns, but that is ok... This will make it so it displays login/logout events, and should show either the GDM uid, or the user uid, depending if the console is logged in or not. Let me know if this properly detects logging in from the login screen.
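
For readability, the one-line command above can be written out as a function. This is a sketch: `watchConsoleUid` is a hypothetical name, and the only calls it relies on are the ones that appear in the command itself (`on('changed', ...)` and `consoleUid()`). The stand-in object built on Node.js's `events` module is an assumption used purely to exercise the logic outside the agent, and the uid values are illustrative.

```javascript
// Same logic as the one-line console command, wrapped in a function.
// `sessions` is expected to expose on('changed', ...) and consoleUid(),
// as require('user-sessions') does in the command above; `report` is a
// callback such as sendConsoleText.
function watchConsoleUid(sessions, report) {
    sessions.on('changed', function () {
        report(sessions.consoleUid());
    });
}

// Stand-in for trying this outside the agent (plain Node.js, an assumption).
var EventEmitter = require('events');
var fake = new EventEmitter();
fake.uid = 42;                                   // illustrative login-screen uid
fake.consoleUid = function () { return this.uid; };

var seen = [];
watchConsoleUid(fake, function (uid) { seen.push(uid); });

fake.uid = 1002;                                 // illustrative user uid after login
fake.emit('changed');
```

Inside the agent console, the equivalent call would presumably be `watchConsoleUid(require('user-sessions'), sendConsoleText)`, provided the function definition is included in the same `eval`.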

guerby commented 3 years ago

It logged "1002", which is the user ID of lguerby (the login I use on this VM).

guerby commented 3 years ago

And 42 (gdm) on logout (and MeshCentral successfully got back to the GDM screen by itself).

guerby commented 2 years ago

Tested today with 0.9.58 and I still get the KVM child exit message.

krayon007 commented 2 years ago

Interesting, thanks. I'll have to do more testing to see how I can coerce the race to happen.

NiceGuyIT commented 10 months ago

Here is a list of "KVM Child process has unexpectedly exited" issues for cross referencing.