Closed ShabirK21 closed 10 months ago
When this happens it is typically RoboMaker that is not starting up (potentially due to a wrong configuration). Run dr-stop-training, dr-start-training and dr-logs-robomaker to have a better look at it.
09/07/2023 10:52:40 passing arg to libvncserver: -rfbport
09/07/2023 10:52:40 passing arg to libvncserver: 5900
09/07/2023 10:52:40 x11vnc version: 0.9.13 lastmod: 2011-08-10 pid: 62
09/07/2023 10:52:40
09/07/2023 10:52:40 wait_for_client: WAIT:0
09/07/2023 10:52:40
09/07/2023 10:52:40 initialize_screen: fb_depth/fb_bpp/fb_Bpl 24/32/2560
09/07/2023 10:52:40
09/07/2023 10:52:40 Listening for VNC connections on TCP port 5900
09/07/2023 10:52:40 Listening for VNC connections on TCP6 port 5900
09/07/2023 10:52:40 listen6: bind: Address already in use
09/07/2023 10:52:40 Not listening on IPv6 interface.
09/07/2023 10:52:40
The VNC desktop is: 06c24141a045:0
PORT=5900
JWM: warning: /etc/jwm/system.jwmrc[6]: invalid include: /etc/jwm/debian-menu
IP: 10.0.1.9 172.19.0.6 10.0.0.6 (06c24141a045)
10:52:42 INFO:[DeepRacerNodeMonitor]: NodeMonitor started running
10:52:42 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:43 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:44 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:45 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:46 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:47 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:48 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
10:52:49 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
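A quick way to see how often, and for how long, the NodeMonitor error repeats in a saved RoboMaker log is sketched below. This is only an illustration: the helper name `summarize_node_errors` and the regular expression are assumptions made for this example and are not part of DeepRacer-for-Cloud. It summarizes the symptom; it does not identify the cause.

```python
import re

# Hypothetical helper for illustration only; not part of DeepRacer-for-Cloud.
# Matches lines such as:
#   10:52:42 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead
ERROR_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}) ERROR:\[NodeMonitor\]: Rosnode threw exception\. "
    r"Master node could be dead"
)


def summarize_node_errors(log_text):
    """Return (count, first_timestamp, last_timestamp) for NodeMonitor errors."""
    stamps = ERROR_RE.findall(log_text)
    if not stamps:
        return 0, None, None
    return len(stamps), stamps[0], stamps[-1]


if __name__ == "__main__":
    sample = "\n".join(
        [
            "10:52:42 INFO:[DeepRacerNodeMonitor]: NodeMonitor started running",
            "10:52:42 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead",
            "10:52:43 ERROR:[NodeMonitor]: Rosnode threw exception. Master node could be dead",
        ]
    )
    count, first, last = summarize_node_errors(sample)
    print(f"{count} NodeMonitor errors between {first} and {last}")
```

If the count keeps growing for as long as the container runs, the ROS master most likely never came up, which matches the "RoboMaker is not starting" description above.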
I am not able to decipher what is going wrong here. Arch Linux is not supported, so that could be the reason. I suggest you reach out to the community at http://join.deepracing.io/ for support in setting things up and debugging.
system.env
DR_CLOUD=local
DR_AWS_APP_REGION=us-east-1
DR_UPLOAD_S3_PROFILE=default
DR_UPLOAD_S3_BUCKET=not-defined
DR_UPLOAD_S3_ROLE=to-be-defined
DR_LOCAL_S3_BUCKET=bucket
DR_LOCAL_S3_PROFILE=minio
DR_GUI_ENABLE=False
DR_KINESIS_STREAM_NAME=
DR_CAMERA_MAIN_ENABLE=True
DR_CAMERA_SUB_ENABLE=False
DR_CAMERA_KVS_ENABLE=True
DR_SAGEMAKER_IMAGE=5.1.0-gpu
DR_ROBOMAKER_IMAGE=5.1.0-cpu-avx2
DR_MINIO_IMAGE=latest
DR_ANALYSIS_IMAGE=cpu
DR_COACH_IMAGE=5.1.0
DR_WORKERS=1
DR_ROBOMAKER_MOUNT_LOGS=False
DR_CLOUD_WATCH_ENABLE=False
DR_DOCKER_STYLE=swarm
DR_HOST_X=False
DR_WEBVIEWER_PORT=8100
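When sharing a configuration like this for debugging, it can help to pull out just the settings that control which containers are started. The following is a minimal sketch, assuming the file holds one KEY=VALUE pair per line; `parse_env` is a hypothetical helper written for this example, not something shipped with DeepRacer-for-Cloud, and it makes no judgment about whether the values are correct.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict; blank lines and # comments are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    # A few of the values from the system.env posted above.
    sample = """DR_SAGEMAKER_IMAGE=5.1.0-gpu
DR_ROBOMAKER_IMAGE=5.1.0-cpu-avx2
DR_DOCKER_STYLE=swarm
DR_KINESIS_STREAM_NAME=
"""
    cfg = parse_env(sample)
    for key in ("DR_SAGEMAKER_IMAGE", "DR_ROBOMAKER_IMAGE", "DR_DOCKER_STYLE"):
        print(key, "=", cfg.get(key, "<unset>"))
```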
When I run "dr-start-training", SageMaker starts but I get stuck at "DoorMan: installing SIGINT, SIGTERM" and the training does not start. I tried restarting Docker and my laptop. I'm using Arch Linux for this.