RemoteConnectionManager / RCM

RemoteConnectionManager code and binary components
GNU Lesser General Public License v3.0

list session crash #32

Open luigi-calori opened 4 years ago

luigi-calori commented 4 years ago

When a session exists on a login node that is no longer available or that hangs, the client either hangs itself:

2020-03-11 17:43:59,714 - INFO - Logging...

2020-03-11 17:43:59,815 - INFO - On rcm.galileo.cineca.it run: module load rcm; python $RCM_HOME/bin/server/rcm_new_server.py --command=config --build_platform='{"platform": "linux_64bit_ubuntu18", "version": "v0.1.1-79-gd868ad9", "checksum": "e29fd23030dcc0ecaccaacd169e89b7b", "client_info": {"screen_width": 1920, "screen_height": 1080}}'

or fails with an error like:

2020-03-11 15:39:00,260 - INFO - Welcome to RCM!

2020-03-11 15:39:05,768 - INFO - Logging...

2020-03-11 15:39:05,774 - INFO - On login.galileo.cineca.it run: module load rcm; python $RCM_HOME/bin/server/rcm_new_server.py --command=config --build_platform='{"platform": "linux_64bit_ubuntu18", "version": "v0.1.1-2-g1a329c0", "checksum": "139c861df6b53bd2e0040a8d35663573", "client_info": {"screen_width": 1920, "screen_height": 1080}}'

2020-03-11 15:39:11,602 - INFO - Logged as clatini0 to login.galileo.cineca.it

2020-03-11 15:39:11,603 - INFO - Checking if a new client version is available...

2020-03-11 15:39:11,784 - INFO - The client is up-to-date

2020-03-11 15:39:11,785 - INFO - On login.galileo.cineca.it run: module load rcm; python $RCM_HOME/bin/server/rcm_new_server.py --command=loginlist --subnet='130.186.17'

2020-03-11 15:39:15,201 - INFO - On login03.galileo.cineca.it run: module load rcm; python $RCM_HOME/bin/server/rcm_new_server.py --command=list --subnet='130.186.17'

2020-03-11 15:39:25,212 - ERROR - Failed to reload the display sessions

2020-03-11 15:39:25,214 - ERROR - timed out

2020-03-11 15:39:25,214 - ERROR - Exception occurred
Traceback (most recent call last):
  File "client/logic/manager.py", line 117, in prex
  File "site-packages/paramiko/client.py", line 343, in connect
  File "site-packages/paramiko/util.py", line 280, in retry_on_signal
  File "site-packages/paramiko/client.py", line 343, in
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "client/gui/thread.py", line 78, in run
  File "client/logic/manager.py", line 162, in list
  File "client/logic/rcm_protocol_client.py", line 52, in wrapper
  File "client/logic/manager.py", line 124, in prex
RuntimeError: timed out

lferraro commented 4 years ago

The same problem occurred when the HPC3 cluster was decommissioned. I had probably created a session on it (perhaps one that was already closed), and its files were still present in the same .rcm directory because $HOME is shared between the clusters.

2020-03-17 12:21:10,979 - INFO - Welcome to RCM!

2020-03-17 12:21:46,485 - INFO - Logging...

2020-03-17 12:21:46,523 - INFO - On login06-hpc4.eni.cineca.it run: module load rcm; python $RCM_HOME/bin/server/rcm_new_server.py --command=config --build_platform='{"platform": "win32_64bit", "version": "v0.1.1-2-g1a329c0", "checksum": "79fc7ac538d174ffeb31c3602bdda42a", "client_info": {"screen_width": 1920, "screen_height": 1080}}'

2020-03-17 12:21:47,192 - INFO - Logged as cibo13 to login06-hpc4.eni.cineca.it

2020-03-17 12:21:47,196 - INFO - Checking if a new client version is available...

2020-03-17 12:21:47,263 - INFO - The client is up-to-date

2020-03-17 12:21:47,264 - INFO - On login06-hpc4.eni.cineca.it run: module load rcm; python $RCM_HOME/bin/server/rcm_new_server.py --command=loginlist --subnet='130.186.14'

2020-03-17 12:21:48,026 - INFO - On login06-hpc3.eni.cineca.it run: module load rcm; python $RCM_HOME/bin/server/rcm_new_server.py --command=list --subnet='130.186.14'

2020-03-17 12:21:48,539 - ERROR - Failed to reload the display sessions

2020-03-17 12:21:48,540 - ERROR - Authentication failed.

2020-03-17 12:21:48,540 - ERROR - Exception occurred
Traceback (most recent call last):
  File "client\logic\manager.py", line 117, in prex
  File "site-packages\paramiko\client.py", line 437, in connect
  File "site-packages\paramiko\client.py", line 749, in _auth
  File "site-packages\paramiko\client.py", line 736, in _auth
  File "site-packages\paramiko\transport.py", line 1436, in auth_password
  File "site-packages\paramiko\auth_handler.py", line 236, in wait_for_response
paramiko.ssh_exception.AuthenticationException: Authentication failed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "client\logic\manager.py", line 162, in list
  File "client\logic\rcm_protocol_client.py", line 52, in wrapper
  File "client\logic\manager.py", line 124, in prex
RuntimeError: Authentication failed.

lferraro commented 4 years ago

WORKAROUND: until this bug is fixed, you can delete the information for the session that is no longer reachable (only that session, not everything). It is stored in the $HOME/.rcm folder (Unix) or in C:/Users//.rcm (Windows).
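Both tracebacks above end with a RuntimeError raised from prex while the client runs the list command on a single login node, and that one failure aborts the whole session reload. One possible direction for a fix, sketched here under assumptions, is to query each node independently and record failures instead of propagating them. The names list_sessions_tolerant, list_on_node and fake_list_on_node are hypothetical and do not come from the RCM code base; the real client would plug in its own per-node call.

```python
def list_sessions_tolerant(nodes, list_on_node):
    """Collect sessions from every node, skipping nodes that fail.

    `list_on_node` is a hypothetical callable standing in for whatever
    the client uses to run the remote list command on one node.
    """
    sessions = []
    failures = {}
    for node in nodes:
        try:
            sessions.extend(list_on_node(node))
        except (OSError, RuntimeError) as exc:
            # socket.timeout is an OSError subclass; the tracebacks show
            # prex re-raising connection errors as RuntimeError.
            failures[node] = str(exc)
    return sessions, failures


def fake_list_on_node(node):
    """Stand-in for the remote call: one node is unreachable."""
    if node == "login03":
        raise RuntimeError("timed out")
    return [f"{node}:session-1"]


if __name__ == "__main__":
    ok, failed = list_sessions_tolerant(["login01", "login03"], fake_list_on_node)
    print("sessions:", ok)
    print("unreachable nodes:", failed)
```

With this shape, the GUI could still show the sessions that were retrieved and report the unreachable nodes separately, rather than failing the whole reload.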