[Thin Templates Humble] VNC doesn't load by default

pawanw17 commented 5 months ago

It loads correctly after pressing connect.

pawanw17 commented 5 months ago

I saw this particular error; it could be useful:

Error: unable to open display :0
Error: Command 'glxinfo | grep direct' returned non-zero exit status 1.

jmplaza commented 5 months ago

Interesting, @pawanw17. That seems to be a problem setting up the X display. As far as I remember, two VNC DISPLAYs are set up inside Docker just to share them with the browser: one for the console and one for the gzview graphical output. Two VNC servers run there, and two VNC clients (running in the exercise web page) connect to them. These are VNC DISPLAYs, and either before or after creating them, the GPU acceleration capabilities available inside Docker are checked with "glxinfo | grep direct".
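The error posted earlier in this thread reads like the message of Python's `subprocess.CalledProcessError`, which suggests (an assumption, not confirmed here) that the check runs through `subprocess` with `shell=True`. A shell pipeline returns the exit status of its last command, so when `glxinfo` cannot open display `:0` it writes its error to stderr, prints nothing to stdout, and `grep` exits with 1 because nothing matched. A minimal sketch of a check that treats that case as "no direct rendering" instead of an unhandled error (the helper name `has_direct_rendering` is hypothetical):

```python
import subprocess


def has_direct_rendering(cmd: str = "glxinfo | grep direct") -> bool:
    """Return True only if the GL check reports direct rendering.

    Hypothetical helper: the default command mirrors the one in the error log.
    """
    try:
        out = subprocess.check_output(
            cmd, shell=True, stderr=subprocess.DEVNULL, text=True
        )
    except subprocess.CalledProcessError:
        # The pipeline's status is grep's: 1 means no "direct" line was found,
        # e.g. because glxinfo could not open the X display.
        return False
    # glxinfo prints "direct rendering: Yes" when hardware GL is available.
    return "direct rendering: Yes" in out
```

Returning `False` lets the caller fall back to a non-accelerated setup; if the display itself is missing, that still needs fixing separately.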

pawanw17 commented 5 months ago

Yes, that seems to be correct, but I think the issue is that the server starts after the client.

jmplaza commented 5 months ago

Yes, it sounds likely. That would explain why the viewers sometimes work fine and other times fail, even on the same computer. We should move towards a "monitored sequence of steps" when launching the visualization components. Monitoring each step should enforce the logical order. And regarding the VNC viewers, we could add some retry attempts in case of unsuccessful connections. What do you think, @pawanw17?
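The "monitored sequence of steps" idea can be sketched as a runner that launches each step and waits for a readiness probe before moving to the next one, so a VNC client can never start before its server. Python is used purely for illustration; the step names, the `run_monitored` function and its timeouts are hypothetical:

```python
import time
from typing import Callable, List, Tuple

# A step is (name, launch action, readiness probe). Hypothetical structure.
Step = Tuple[str, Callable[[], None], Callable[[], bool]]


def run_monitored(steps: List[Step], timeout: float = 10.0,
                  poll: float = 0.1) -> List[str]:
    """Launch steps strictly in order; each must report ready before the next starts."""
    completed = []
    for name, launch, is_ready in steps:
        launch()
        deadline = time.monotonic() + timeout
        # Block until the probe succeeds or the step times out.
        while not is_ready():
            if time.monotonic() > deadline:
                raise TimeoutError(f"step '{name}' not ready after {timeout}s")
            time.sleep(poll)
        completed.append(name)
    return completed
```

A step that never becomes ready raises instead of silently letting later steps run against it, which is exactly the failure mode seen with the viewers.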

pawanw17 commented 5 months ago

@jmplaza I agree; sometimes even on master I have to connect the VNC viewer manually. I think we should do both: enforce a logical order and retry the connection on failure, as you said.

pawanw17 commented 5 months ago

Another issue I am noticing is that GPU acceleration never seems to work on my computer (both on master and on thin templates), even though I have a discrete GPU: the VNC viewer always shows a black area.

jmplaza commented 5 months ago

Yes, GPU support currently works only in certain cases, and only on Linux hosts. @dpascualhe is studying support for Windows hosts. https://github.com/JdeRobot/RoboticsAcademy/issues/902

When the RADI is launched with GPU acceleration support, the RAM inside the RADI checks for available GPUs through /dev/dri inside the Docker container. Currently it works in most cases with NVIDIA cards, and even with integrated Intel GPUs. The RAM manually allocates the tasks (console, gzview, gazebo, user application) to several ad hoc shells, with or without GPU acceleration depending on the GPUs detected through /dev/dri. However, this allocation could be made more general to cover every host configuration (no GPU at all, only an integrated GPU, only an NVIDIA GPU, both integrated and NVIDIA...). We have to improve that too, but it deserves its own issue.
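One possible way to generalise the /dev/dri detection to the host combinations listed above is to read the PCI vendor ID that sysfs exposes for each DRM render node. This is a sketch of an assumed approach, not what the RAM currently does: the function names and the "prefer NVIDIA" policy are hypothetical, and it assumes /sys/class/drm is visible inside the container.

```python
import os
from typing import Dict, Optional

# PCI vendor IDs as exposed in sysfs (e.g. /sys/class/drm/renderD128/device/vendor).
PCI_VENDORS = {"0x10de": "nvidia", "0x8086": "intel", "0x1002": "amd"}


def detect_gpus(dri_dir: str = "/dev/dri",
                sysfs_drm: str = "/sys/class/drm") -> Dict[str, str]:
    """Map each DRM render node (renderD*) found in dri_dir to a vendor name."""
    gpus: Dict[str, str] = {}
    if not os.path.isdir(dri_dir):
        return gpus  # /dev/dri not passed to the container: software rendering only
    for node in sorted(os.listdir(dri_dir)):
        if not node.startswith("renderD"):
            continue  # skip card* primary nodes and the by-path directory
        vendor_file = os.path.join(sysfs_drm, node, "device", "vendor")
        try:
            with open(vendor_file) as f:
                vendor_id = f.read().strip()
        except OSError:
            vendor_id = ""  # GPU present but vendor not readable
        gpus[node] = PCI_VENDORS.get(vendor_id, "unknown")
    return gpus


def pick_render_node(gpus: Dict[str, str]) -> Optional[str]:
    """Prefer an NVIDIA node, otherwise any node; None means no acceleration."""
    for node, vendor in gpus.items():
        if vendor == "nvidia":
            return node
    return next(iter(gpus), None)
```

With a per-node vendor map, the "integrated + NVIDIA" case becomes an explicit policy choice rather than an accident of which device node appears first.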

OscarMrZ commented 5 months ago

> Yes, it sounds likely. That would explain why the viewers sometimes work fine and other times fail, even on the same computer. We should move towards a "monitored sequence of steps" when launching the visualization components. Monitoring each step should enforce the logical order. And regarding the VNC viewers, we could add some retry attempts in case of unsuccessful connections. What do you think, @pawanw17?

Totally agree. The problem is a race condition in which the server usually starts after the client. The visualization_ready message is sent when the on_prepare_visualization function ends, but that doesn't necessarily mean the VNC server is ready. Regarding this, I think there are two important points:

1. The mechanism in the frontend is already implemented: it receives the message indicating that the visualization is ready, but in general that message is a lie. Currently the React viewer for the console implements a time delay, trying to connect 1 second after receiving the message, which is usually insufficient. By increasing this delay and adding the same mechanism to the Gazebo viewer, I managed to make it work properly. However, I think this is a bad solution that would lead to the same problem on slower computers!
2. There is no proper monitoring when launching the visualization: currently the launch process just starts the console/Gazebo and sends the visualization_ready message, without checking whether they are actually up.

@jmplaza Regarding your suggestion of a retry mechanism: it should definitely be included, but only to handle real connection issues, not as a workaround for the startup order.
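The VNC viewers run in the browser, so a real retry would live in the frontend code; the Python below only illustrates the logic of a bounded retry with exponential backoff, so a transient network failure is retried while a server that never comes up still fails loudly. The function name and default values are hypothetical:

```python
import socket
import time


def connect_with_retry(host: str, port: int, attempts: int = 5,
                       base_delay: float = 0.5,
                       timeout: float = 2.0) -> socket.socket:
    """Open a TCP connection, retrying with exponential backoff on failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait 0.5 s, 1 s, 2 s, ... between attempts.
                time.sleep(base_delay * 2 ** attempt)
    raise ConnectionError(
        f"could not reach {host}:{port} after {attempts} attempts"
    ) from last_error
```

Because the number of attempts is bounded, this stays a safety net for genuine connection problems rather than a replacement for launching things in the right order.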

jmplaza commented 5 months ago

Thinking about it, in the long run maybe we should include these states (visualization-ready, visualization-connected...) in the RAM's FSM and in the corresponding JS library running in the browser (which lets the browser know the RAM state to some extent). With this, the browser would not try to connect to the VNC servers until the RAM notifies the "visualization-ready" state. We could also integrate into the system the response to events such as "vnc disconnection" or "vnc failure", for instance retrying the connection.
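A minimal sketch of how such visualization states could be modelled as an explicit transition table. Only "visualization-ready", "visualization-connected" and "vnc disconnection" come from the comment above; the other state and event names, and the `VizFSM` class, are hypothetical and are not the RAM's actual FSM:

```python
from enum import Enum, auto


class VizState(Enum):
    IDLE = auto()
    PREPARING = auto()
    VISUALIZATION_READY = auto()
    VISUALIZATION_CONNECTED = auto()
    DISCONNECTED = auto()


# (current state, event) -> next state; event names are illustrative.
TRANSITIONS = {
    (VizState.IDLE, "prepare_visualization"): VizState.PREPARING,
    (VizState.PREPARING, "vnc_servers_up"): VizState.VISUALIZATION_READY,
    (VizState.VISUALIZATION_READY, "vnc_connected"): VizState.VISUALIZATION_CONNECTED,
    (VizState.VISUALIZATION_CONNECTED, "vnc_disconnected"): VizState.DISCONNECTED,
    (VizState.DISCONNECTED, "retry"): VizState.VISUALIZATION_READY,
}


class VizFSM:
    def __init__(self) -> None:
        self.state = VizState.IDLE

    def handle(self, event: str) -> VizState:
        """Apply an event, rejecting anything not allowed in the current state."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event '{event}' not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state

    def may_connect(self) -> bool:
        """The browser should only open VNC connections once the servers are ready."""
        return self.state is VizState.VISUALIZATION_READY
```

Rejecting out-of-order events (for example a client connecting while still `PREPARING`) is what turns the race into a visible error instead of a black viewer.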

Currently, in the 3-step RAM, the visualization "states" or events are not explicitly modelled; the visualization is simply launched in two ways: (a) several VNC connections (console, Gazebo viewer, general X display) and (b) ad hoc visualization (such as the image frames in the FollowLine exercise, the city map in Global Navigation, the house map in Basic Vacuum Cleaner...) through the main websocket. Both (a) and (b) are bidirectional. Only the regular path of events is supported, and it sometimes fails, as you have seen.

While this could be the long-term solution, for now I would focus on monitoring the real state of the VNC connection from the browser, so that we are aware of any failure or disconnection, and on programming the regular response: a new connection attempt (on the browser side) or launching the VNC servers (on the RAM side).

BTW, we currently have 2 or 3 VNC servers with their corresponding DISPLAYs and connections. I think we could simplify that to a single VNC server and a single DISPLAY on which several applications show their graphical output, with each VNC client "subscribing" only to a certain subwindow of that DISPLAY. This would reduce the number of connections between the RADI and the browser, so we would need fewer ports when launching the RADI, and maybe less computing power (we have to check).

OscarMrZ commented 5 months ago

@jmplaza I think that would be great for future updates of the RAM! The visualization-ready state part is already implemented, but it is not working properly.

Fortunately, most of the structures for this monitoring are already implemented and almost working. The RAM receives the prepare_visualization message and executes all the visualization modules, spawning the necessary threads. After some checks, the launch process finishes and the visualization-ready message is sent. Only then does the browser try to connect to the VNC server.

I think the solution can be pretty straightforward: add a blocking check on the external port of the VNC server (there is already a blocking check for the Gazebo status). Only once that port responds are the servers really up, and the visualization process can carry on. I will upload a PR with the suggested solution.
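A minimal sketch of the blocking port check described above, written as a standalone illustration rather than the code from the PR: `wait_for_port`, `finish_visualization`, the `notify` callback and the timeouts are hypothetical names and values. Note that a successful TCP connect only shows the server is listening, not that the VNC handshake succeeds, but that is enough to remove the startup race.

```python
import socket
import time
from typing import Callable, Iterable


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.2) -> bool:
    """Block until a TCP server accepts connections on host:port, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening on the port
        except OSError:
            time.sleep(interval)  # not up yet: refused or timed out
    return False


def finish_visualization(ports: Iterable[int], notify: Callable[[str], None],
                         host: str = "127.0.0.1", timeout: float = 30.0) -> None:
    """Only announce visualization_ready once every VNC port is accepting connections."""
    for port in ports:
        if not wait_for_port(host, port, timeout=timeout):
            raise TimeoutError(f"VNC server on port {port} did not come up")
    notify("visualization_ready")
```

Unlike a fixed client-side delay, this waits exactly as long as each server needs, so it behaves the same on fast and slow machines.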

OscarMrZ commented 5 months ago

Solved in #125

JdeRobot / RoboticsApplicationManager

[Thin Templates Humble] VNC doesn't load by default #120