Azure / az-hop

The Azure HPC On-Demand Platform provides an HPC Cluster Ready solution
https://azure.github.io/az-hop/
MIT License

vmd app broken on azhpc:azhop-desktop:almalinux-8_7:latest #1465

Closed ltalirz closed 1 year ago

ltalirz commented 1 year ago

Version

1.0.34

In what area(s)?

/area administration /area ansible /area autoscaling /area configuration /area cyclecloud /area documentation /area image /area job-scheduling /area monitoring /area ood /area remote-visualization /area user-management

Expected Behavior

Starting the VMD app works

Actual Behavior

Starting VMD app does not work.

In particular, the error log contains:

+ xfce4-terminal -e /anfhome/apps/vmd/bin/vmd -T 'VMD Terminal' --disable-server
ERROR: Collection default cannot be found

This error has already been reported in a few other Open OnDemand threads (but without resolution):

https://discourse.openondemand.org/t/matlab-window-problem/2294
https://discourse.openondemand.org/t/session-briefly-starts-then-immediately-crashes-user-authentication-problem/2268/2

Steps to Reproduce the Problem

Enable VMD app, install VMD, start app

Solution/workaround

The cause of the problem is likely the custom xfce-related setup in https://github.com/Azure/az-hop/blob/55ef6780cda7e0820381696f76b8dc5feb0fdd56/playbooks/roles/ood-applications/files/bc_vmd/template/script.sh.erb#L15-L25

My question here: rather than maintaining a custom way for each app to start the xfce desktop in the background, should we not be reusing the setup from the bc_desktop app (which works fine also on Alma)?

I "fixed" the problem by

xfwm4 --compositor=off --sm-client-disable &
xfce4-panel --sm-client-disable &



cc @matt-chan

ltalirz commented 1 year ago

P.S. Reduced number of changes:

However, it is necessary to source the xfce.sh script in the same process as the one that starts vmd.
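
One possible reason for this, offered as an assumption rather than a confirmed diagnosis: anything a script exports is only visible to later commands if the script is sourced into the shell that runs them. The sketch below is a minimal illustration of that shell behaviour. The temporary file stands in for xfce.sh, and `DEMO_SESSION_VAR` is a hypothetical variable name that is not part of az-hop or Open OnDemand.

```shell
# Hypothetical stand-in for xfce.sh: it only exports one variable.
setup_script="$(mktemp)"
cat > "$setup_script" <<'EOF'
export DEMO_SESSION_VAR="set-by-setup"
EOF

# Case 1: run the setup in a subshell. The export disappears when the
# subshell exits, so the launching shell never sees it.
unset DEMO_SESSION_VAR
( . "$setup_script" )
subshell_result="${DEMO_SESSION_VAR:-unset}"

# Case 2: source the setup in the current shell. The export stays
# visible to every command started afterwards from this shell.
. "$setup_script"
sourced_result="${DEMO_SESSION_VAR:-unset}"

echo "after subshell: $subshell_result"
echo "after source:   $sourced_result"

rm -f "$setup_script"
```

If the real xfce.sh sets up environment that the VMD terminal depends on, the same distinction would apply, but that has not been verified here.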

If I leave the xfce.sh from bc_desktop untouched and simply launch it in a subshell, VMD no longer starts properly. Log:

Setting VNC password...
Starting VNC server...

Desktop 'TurboVNC: viz-1:1 (a-ctalirz)' started on display viz-1:1

Log file is vnc.log
Successfully started VNC server on viz-1:5901...
Script starting...
Starting websocket server...
+ xfce4-terminal -e /anfhome/apps/vmd/bin/vmd -T 'VMD Terminal' --disable-server
Failed to connect to session manager: Failed to connect to the session manager: SESSION_MANAGER environment variable not defined
WebSocket server settings:
  - Listen on :61006
  - No SSL/TLS support (no cert file)
  - Backgrounding (daemon)
Scanning VNC log file for user authentications...
Generating connection YAML file...

(xfwm4:37177): xfwm4-WARNING **: 13:48:13.559: Unsupported GL renderer (llvmpipe (LLVM 14.0.6, 256 bits)).

** (xfce4-screensaver:37271): WARNING **: 13:48:14.124: screensaver already running in this session

** (xfdesktop:37243): WARNING **: 13:48:14.141: Failed to set the background '/usr/share/backgrounds/images/default.png': GDBus.Error:org.freedesktop.DBus.Error.InvalidArgs: No such interface 'org.freedesktop.DisplayManager.AccountsService'

** (wrapper-2.0:37252): WARNING **: 13:48:14.332: No outputs have backlight property

** (wrapper-2.0:37251): WARNING **: 13:48:14.361: Binding 'XF86AudioMicMute' failed!

(wrapper-2.0:37251): pulseaudio-plugin-WARNING **: 13:48:14.361: Could not have grabbed volume control keys. Is another volume control application (xfce4-volumed) running?

(wrapper-2.0:37251): libnotify-WARNING **: 13:48:14.365: Failed to connect to proxy

(wrapper-2.0:37251): Gtk-WARNING **: 13:48:14.402: Negative content width -3 (allocation 1, extents 2x2) while allocating gadget (node button, owner PulseaudioButton)

(wrapper-2.0:37252): Gtk-WARNING **: 13:48:14.414: Negative content width -3 (allocation 1, extents 2x2) while allocating gadget (node button, owner PowerManagerButton)

(wrapper-2.0:37275): Gtk-WARNING **: 13:48:14.507: Negative content width -1 (allocation 1, extents 1x1) while allocating gadget (node button, owner XfceArrowButton)
Setting VNC password...
Generating connection YAML file...

(wrapper-2.0:37251): pulseaudio-plugin-WARNING **: 13:48:20.959: Disconected from the PulseAudio server. Attempting to reconnect in 5 seconds.
/anfhome/a-ctalirz/ondemand/data/sys/dashboard/batch_connect/sys/bc_vmd/output/f51c5f0c-af0c-432a-8956-0cbbfb318f31/script.sh: line 41: 37068 Terminated              xfce4-terminal -e "$VMD_HOME_DIR/bin/vmd" -T "VMD Terminal" --disable-server
Cleaning up...
Killing Xvnc process ID 37043

xpillons commented 1 year ago

@ltalirz can you please suggest a PR?

xpillons commented 1 year ago

@ltalirz can you please share your whole script.sh.erb, as I'm not able to make it work with your updates. I can reproduce the problem if I don't change the existing one.

ltalirz commented 1 year ago

hey @xpillons , just opened https://github.com/Azure/az-hop/pull/1736